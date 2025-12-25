SCANOSS Geo Provenance Dataset Description

The SCANOSS Geo Provenance Dataset supports software composition analysis by identifying the geographic and authorial origin of open source components within a codebase. It draws on commit history, code fingerprinting, and author attribution models to provide provenance insights for software supply chain transparency.

The dataset distinguishes two kinds of origin. Declared origins are those where authorship or source information is explicitly stated. Inferred origins are derived from analysis of code patterns, metadata, and historical context when no explicit information is available.

The workflow has three steps:
1. Scan code repositories with SCANOSS.
2. Extract author and location metadata from the code and its commit history.
3. Retrieve geographic provenance insights via the API.

With this information, organizations can validate the integrity of their software supply chain and identify code of unknown or high-risk provenance. Because it shows where code originates and who authored it, the dataset also helps address regulatory compliance requirements. Use cases include open source license compliance and managing open source software in AI-generated code.
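The distinction between declared and inferred origins, and the goal of surfacing code of unknown or high-risk provenance, can be illustrated with a small local model. This is a minimal sketch, not the SCANOSS API or its data schema. `OriginKind`, `ProvenanceRecord`, `resolve_origin`, and `flag_for_review` are hypothetical names. The rule that a declared origin takes precedence over an inferred one is an assumption, as is the idea that the "high-risk" set is a policy choice supplied by the user. The component identifiers and the placeholder country code `"XX"` are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Set


class OriginKind(Enum):
    DECLARED = "declared"  # authorship/source explicitly stated
    INFERRED = "inferred"  # derived from code patterns, metadata, history
    UNKNOWN = "unknown"    # no usable provenance signal


@dataclass
class ProvenanceRecord:
    """Hypothetical per-component record; field names are illustrative only."""
    component: str
    country: Optional[str]  # e.g. an ISO 3166-1 alpha-2 code, or None
    kind: OriginKind


def resolve_origin(component: str,
                   declared_country: Optional[str],
                   inferred_country: Optional[str]) -> ProvenanceRecord:
    """Assumption: prefer a declared origin, fall back to an inferred one."""
    if declared_country:
        return ProvenanceRecord(component, declared_country, OriginKind.DECLARED)
    if inferred_country:
        return ProvenanceRecord(component, inferred_country, OriginKind.INFERRED)
    return ProvenanceRecord(component, None, OriginKind.UNKNOWN)


def flag_for_review(records: Iterable[ProvenanceRecord],
                    high_risk_countries: Set[str]) -> List[str]:
    """Return components of unknown provenance or with a policy-defined high-risk origin."""
    return [
        r.component
        for r in records
        if r.kind is OriginKind.UNKNOWN or r.country in high_risk_countries
    ]


# Usage with made-up components; "XX" is a placeholder country code.
records = [
    resolve_origin("pkg:github/example/alpha", "DE", None),
    resolve_origin("pkg:github/example/beta", None, "XX"),
    resolve_origin("pkg:github/example/gamma", None, None),
]
print(flag_for_review(records, high_risk_countries={"XX"}))
# ['pkg:github/example/beta', 'pkg:github/example/gamma']
```

Keeping the origin kind on each record means a reviewer can tell whether a flagged location was stated by the author or inferred, which matters when deciding how much weight to give it.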