Automated data pipeline for CSRD-compliant Scope 3 GHG reporting.
Ingests supplier emission data from multiple sources, validates against GHG Protocol methodology, computes category totals, and outputs audit-ready reports.

| Raw data sources | Pipeline | Output |
|---|---|---|
| Supplier surveys | Ingestion & validation | Category totals |
| ERP / SAP exports | Emission factor mapping | Data quality score |
| Spend data | Scope 3 calculation | CSRD-ready report |
| Energy invoices | Aggregation by category | Audit trail |

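
The flow above (ingest, map emission factors, calculate, aggregate, keep an audit trail) can be sketched in a few lines. This is a minimal illustration, not the repository's code: the column names, the `run_pipeline` function, and the emission factor values are all hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict

# Hypothetical factors in kg CO2e per unit of activity (placeholder values, not real data).
EMISSION_FACTORS = {"electricity_kwh": 0.5, "road_freight_tkm": 0.25}

# Assumed input layout; the real sample file may use different columns.
SAMPLE_CSV = """supplier,category,activity_type,quantity
Acme,Cat. 3,electricity_kwh,1000
Birch,Cat. 4,road_freight_tkm,2000
Cedar,Cat. 4,unknown_activity,50
"""


def run_pipeline(csv_text):
    """Ingest rows, map factors, calculate emissions, and aggregate by category."""
    totals = defaultdict(float)
    audit_trail = []
    # start=2 because line 1 of the file is the header row.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        factor = EMISSION_FACTORS.get(row["activity_type"])
        if factor is None:
            # Rejected rows are recorded rather than silently dropped.
            audit_trail.append((line_no, row["supplier"], "rejected: no emission factor"))
            continue
        emissions = float(row["quantity"]) * factor
        totals[row["category"]] += emissions
        audit_trail.append((line_no, row["supplier"], f"ok: {emissions:.1f} kg CO2e"))
    return dict(totals), audit_trail


totals, audit_trail = run_pipeline(SAMPLE_CSV)
```

Recording one audit entry per input line, including rejections, is what lets a reviewer trace each category total back to its source rows.
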
| Category | Method | Status |
|---|---|---|
| Cat. 1 — Purchased goods | Spend-based + primary data | ✅ |
| Cat. 3 — Fuel & energy | Location-based | ✅ |
| Cat. 4 — Upstream transport | Distance-based | ✅ |
| Cat. 6 — Business travel | Distance-based | ✅ |
| Cat. 11 — Use of sold products | Product lifetime model | 🔄 WIP |
| Cat. 15 — Investments (PCAF) | Attribution factor model | 🔄 WIP |
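
As a rough sketch of the two methods marked done in the table: a spend-based estimate multiplies spend by a factor in kg CO2e per euro, and a distance-based transport estimate multiplies mass, distance, and a factor in kg CO2e per tonne-kilometre. The function names, the `PurchaseLine` type, and the rule of preferring supplier-reported figures are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PurchaseLine:
    """Hypothetical record for one Cat. 1 purchase."""
    supplier: str
    spend_eur: float
    primary_kg_co2e: Optional[float] = None  # supplier-reported figure, if available


def purchased_goods_emissions(line, spend_factor_kg_per_eur):
    """Prefer supplier-reported primary data; otherwise fall back to spend-based."""
    if line.primary_kg_co2e is not None:
        return line.primary_kg_co2e, "primary"
    return line.spend_eur * spend_factor_kg_per_eur, "spend-based"


def transport_emissions(mass_tonnes, distance_km, factor_kg_per_tkm):
    """Distance-based estimate: tonnes x km x kg CO2e per tonne-km."""
    return mass_tonnes * distance_km * factor_kg_per_tkm
```

Returning the method label alongside the value makes it straightforward to report how much of a category total rests on primary data.
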

- Python 3.11: pandas, pydantic, pytest
- BigQuery: data warehouse, with dbt models
- FastAPI: validation API endpoint
- Docker: containerized pipeline
- GCP Cloud Run: scheduled execution

esg-scope3-pipeline/
├── pipeline/
│ ├── ingestion/ # Source connectors (CSV, API, BigQuery)
│ ├── validation/ # Data quality checks (Pydantic models)
│ ├── calculation/ # Emission factor application
│ ├── aggregation/ # Category totals
│ └── output/ # Report generation
├── emission_factors/ # GHG Protocol & ADEME factor tables
├── dbt/ # dbt models for BigQuery transformations
├── tests/ # pytest test suite
├── api/ # FastAPI validation endpoint
└── notebooks/ # Exploratory analysis
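
The `validation/` step is described as using Pydantic models. To stay dependency-free, the sketch below swaps in a standard-library dataclass that enforces the same kind of rules; the field names and the `validate_rows` helper are hypothetical, and the rows are assumed to arrive as already-typed dictionaries.

```python
from dataclasses import dataclass

# Scope 3 categories covered in this README.
VALID_CATEGORIES = {1, 3, 4, 6, 11, 15}


@dataclass(frozen=True)
class SupplierRecord:
    """Hypothetical shape of one validated supplier row."""
    supplier_id: str
    scope3_category: int
    quantity: float
    unit: str

    def __post_init__(self):
        if not self.supplier_id.strip():
            raise ValueError("supplier_id must not be empty")
        if self.scope3_category not in VALID_CATEGORIES:
            raise ValueError(f"unsupported Scope 3 category: {self.scope3_category}")
        if self.quantity < 0:
            raise ValueError("quantity must be non-negative")


def validate_rows(rows):
    """Split raw dict rows into valid records and (row index, message) errors."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        try:
            valid.append(SupplierRecord(**row))
        except (TypeError, ValueError) as exc:
            # TypeError covers missing or unexpected keys.
            errors.append((i, str(exc)))
    return valid, errors
```
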
git clone https://github.com/Martsk23/esg-scope3-pipeline
cd esg-scope3-pipeline
pip install -r requirements.txt
# Run on sample data
python -m pipeline.main --input data/sample_suppliers.csv --output reports/
# Run tests
pytest tests/ -v

Follows the GHG Protocol Corporate Value Chain (Scope 3) Standard. Emission factors come from:
- ADEME Base Carbone (France)
- IPCC AR6 global warming potentials
- EXIOBASE for spend-based estimation
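
Global warming potentials enter the calculation when converting individual gases to CO2-equivalent: each gas mass is multiplied by its GWP and the results are summed. The sketch below is illustrative only. The `to_co2e` function and the gas keys are hypothetical, and the GWP values are the 100-year figures commonly cited from AR6, which should be checked against the published IPCC tables before use.

```python
# 100-year GWPs as commonly cited from IPCC AR6 (assumption: verify before use).
GWP_100 = {
    "CO2": 1.0,
    "CH4_fossil": 29.8,
    "N2O": 273.0,
}


def to_co2e(gas_masses_kg, gwp=GWP_100):
    """Convert a mapping of gas name to mass (kg) into total kg CO2e."""
    unknown = set(gas_masses_kg) - set(gwp)
    if unknown:
        raise KeyError(f"no GWP value for: {sorted(unknown)}")
    return sum(mass * gwp[gas] for gas, mass in gas_masses_kg.items())


total_co2e = to_co2e({"CO2": 100.0, "CH4_fossil": 2.0, "N2O": 1.0})
```
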

Data quality is scored with the PCAF data quality score framework (scores 1 to 5, where 1 is the highest quality).
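
One way to roll per-record scores up into a single figure is an emissions-weighted average, so that large emission sources dominate the result. This is a sketch under that assumption: weighting by emissions and the `weighted_quality_score` name are choices made for illustration, not necessarily what the pipeline does. On the PCAF scale a lower score indicates higher-quality data.

```python
def weighted_quality_score(records):
    """Emissions-weighted mean of data quality scores.

    records: iterable of (emissions_kg_co2e, score) pairs, score in 1..5.
    """
    total = 0.0
    weighted = 0.0
    for emissions, score in records:
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
        if emissions < 0:
            raise ValueError("emissions must be non-negative")
        total += emissions
        weighted += emissions * score
    if total == 0:
        raise ValueError("no emissions to weight by")
    return weighted / total
```

For example, 100 kg CO2e at score 1 and 300 kg CO2e at score 5 give (100 × 1 + 300 × 5) / 400 = 4.0.
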
Built on 18 months of ESG audit consulting at BM&A, working on CSRD readiness for CAC 40 and mid-cap clients. This pipeline automates the data collection and calculation layer that is typically done manually in Excel.