| 1 | +# Credits |
| 2 | +The `credits` data reports bulk credit transactions: issuances, retirements, and cancellations. |
| 3 | +We first download raw credit transaction data from each of the registries. |
| 4 | +We then apply custom, registry-specific transformations to the data, with the goal of mapping all registry data to a common schema. |
| 5 | + |
| 6 | +## Schema |
| 7 | + |
| 8 | +Credit transactions have the following schema: |
| 9 | + |
| 10 | +```json |
| 11 | +{ |
| 12 | +  "title": "Credit", |
| 13 | +  "properties": { |
| 14 | +    "id": { |
| 15 | +      "title": "Id", |
| 16 | +      "type": "integer" |
| 17 | +    }, |
| 18 | +    "project_id": { |
| 19 | +      "title": "Project ID", |
| 20 | +      "description": "Unique project identifier, by registry", |
| 21 | +      "type": "string" |
| 22 | +    }, |
| 23 | +    "quantity": { |
| 24 | +      "title": "Quantity", |
| 25 | +      "description": "Number of credits", |
| 26 | +      "type": "integer" |
| 27 | +    }, |
| 28 | +    "vintage": { |
| 29 | +      "title": "Vintage", |
| 30 | +      "description": "Vintage year of credits", |
| 31 | +      "type": "integer" |
| 32 | +    }, |
| 33 | +    "transaction_date": { |
| 34 | +      "title": "Transaction Date", |
| 35 | +      "description": "Date of transaction", |
| 36 | +      "type": "string", |
| 37 | +      "format": "date" |
| 38 | +    }, |
| 39 | +    "transaction_type": { |
| 40 | +      "title": "Transaction Type", |
| 41 | +      "description": "Type of transaction (i.e., issuance, retirement)", |
| 42 | +      "type": "string" |
| 43 | +    } |
| 44 | +  } |
| 45 | +} |
| 46 | +``` |
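
To make the schema concrete, here is a minimal sketch of a single credit transaction modeled as a Python dataclass. The `Credit` class, the `credit_from_record` helper, and the sample values are hypothetical and are not part of `offsets-db-data`; they only mirror the field names and types listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Credit:
    """Hypothetical in-memory mirror of the credit transaction schema."""

    id: int
    project_id: str
    quantity: int
    vintage: int
    transaction_date: date
    transaction_type: str


def credit_from_record(record: dict) -> Credit:
    """Build a Credit from a JSON-like dict, parsing the ISO 8601 date string."""
    return Credit(
        id=int(record["id"]),
        project_id=str(record["project_id"]),
        quantity=int(record["quantity"]),
        vintage=int(record["vintage"]),
        transaction_date=date.fromisoformat(record["transaction_date"]),
        transaction_type=str(record["transaction_type"]),
    )


# Illustrative values only; not taken from any registry.
example = credit_from_record(
    {
        "id": 1,
        "project_id": "VCS0000",
        "quantity": 1000,
        "vintage": 2020,
        "transaction_date": "2021-03-15",
        "transaction_type": "issuance",
    }
)
```
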
| 47 | +## Downloading raw data |
| 48 | +We download a fresh copy of project and transaction data on a daily basis. |
| 49 | +While downloading, we make no changes to the raw data provided by the registries. |
| 50 | +All data are permanently archived and made immediately available for download from a public S3 bucket (see Data Access TK). |
| 51 | + |
| 52 | +As with `projects` data, we have no plans to release the code that directly interacts with the registries. |
| 53 | +We decided to keep this part of OffsetsDB closed in an effort to limit the volume of download requests sent to the registries. |
| 54 | + |
| 55 | +## Transforming raw data |
| 56 | + |
| 57 | +Nearly all of the code in `offsets-db-data` implements registry-specific logic for transforming raw registry data into a common, shared schema. |
| 58 | +The logic for transforming the data of each registry is contained within a single file whose name identifies the registry. |
| 59 | +For example, the logic for transforming Verra data is contained within a file named `vcs.py`. |
| 60 | + |
| 61 | +Each registry-specific file contains at least two functions: `process_{registry_abbreviation}_credits` and `process_{registry_abbreviation}_projects`. |
| 62 | +Those functions, in turn, call a series of additional transformation functions that produce the normalized project and credit data that together form OffsetsDB. |
| 63 | +These transformation functions tend to be quite small and operate on one or two properties of the raw data. |
| 64 | +To continue with the Verra example, `vcs.py` contains functions with names like `set_vcs_vintage_year` and `generate_vcs_project_ids`. |
| 65 | +These functions contain the registry-specific logic needed to map Verra's raw data to a common schema. |
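
The pattern described above, a top-level `process_*` function that chains several small, single-purpose transformations, can be sketched roughly as follows. This is a simplified illustration, not the code in `vcs.py`: it uses plain lists of dicts instead of pandas DataFrames, and the `xyz` registry, the function bodies, and the raw column names `ID` and `Vintage End` are all assumptions made for the example.

```python
from datetime import date


def set_xyz_vintage_year(rows, date_column):
    """Add a `vintage` key holding the year of the ISO date in `date_column`."""
    return [
        {**row, "vintage": date.fromisoformat(row[date_column]).year}
        for row in rows
    ]


def generate_xyz_project_ids(rows, prefix):
    """Add a `project_id` key by prefixing the raw numeric ID (hypothetical rule)."""
    return [{**row, "project_id": f"{prefix}{row['ID']}"} for row in rows]


def process_xyz_credits(rows):
    """Chain the small transformations, then keep only the normalized keys."""
    rows = set_xyz_vintage_year(rows, date_column="Vintage End")
    rows = generate_xyz_project_ids(rows, prefix="XYZ")
    return [{"project_id": r["project_id"], "vintage": r["vintage"]} for r in rows]


# Made-up raw row for a made-up registry.
raw_rows = [{"ID": 12, "Vintage End": "2019-12-31"}]
processed = process_xyz_credits(raw_rows)
```

Keeping each step small makes it possible to inspect one piece of registry-specific logic at a time without running the whole pipeline.
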
| 66 | + |
| 67 | +### An example |
| 68 | +In practice, replicating the behavior of OffsetsDB should be simple. |
| 69 | +Here's an example of using `offsets_db_data` to transform the raw transaction data from Verra into a normalized, analysis-ready file: |
| 70 | + |
| 71 | +```python |
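| 72 | +import pandas as pd |
| 73 | +from offsets_db_data import vcs  # importing the module also registers its DataFrame methods |
| 74 | + |
| 75 | +archive_fname = 's3://carbonplan-offsets-db/raw/2023-12-05/verra/transactions.csv.gz' |
| 76 | +raw_credits = pd.read_csv(archive_fname)  # reading s3:// paths requires the optional s3fs package |
| 77 | +processed_credits = vcs.process_vcs_credits(raw_credits)  # returns credits in the common schema |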
| 78 | +``` |
| 79 | + |
| 80 | +Invoking a single transformation function, like `set_vcs_vintage_year`, is even more straightforward. |
| 81 | +Let's say you want to understand more about how OffsetsDB assigns Verra credits a vintage year. |
| 82 | +You can explore the behavior of this single transformation function by calling: |
| 83 | + |
| 84 | +```python |
| 85 | +raw_credits.set_vcs_vintage_year(date_column='Vintage End') |
| 86 | +``` |
| 87 | + |
| 88 | +It's worth noting that we've wrapped all transformation functions using the `pandas_flavor.register_dataframe_method` decorator. |
| 89 | +That means that after importing a registry module from `offsets_db_data`, the transformation functions of that module are directly callable as methods on any pandas DataFrame. |
| 90 | + |
| 91 | +## Initial Column Mapping |
| 92 | +The initial and perhaps most mundane transformation in OffsetsDB involves mapping properties in the raw data to a common schema. |
| 93 | +This step requires constructing a map from the names of properties as they appear in the raw data to the corresponding property names in OffsetsDB. |
| 94 | +For example, the Climate Action Reserve data refers to the property `project_id` as `Project ID`. |
| 95 | +The ART registry, however, refers to the same property as `Program ID`. |
| 96 | + |
| 97 | +These column mapping files are stored in `offsets_db_data/configs`. |
| 98 | +There are separate mapping files for `projects` data and `credits` data. |
| 99 | +Some properties either aren't included in the raw data or inferring their value requires special processing. |
| 100 | +In these cases, a `null` value is recorded in the column mapping files. |
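
As a rough sketch of how such a mapping might be applied, the example below renames raw properties to their OffsetsDB names and skips properties mapped to `null` (`None` in Python). The layout of `column_mapping`, the registry keys, and the `rename_columns` helper are assumptions for illustration; only the `Project ID` and `Program ID` column names come from the text above.

```python
# Assumed layout: registry -> {OffsetsDB property: raw column name or None}.
column_mapping = {
    "climate-action-reserve": {"project_id": "Project ID", "vintage": None},
    "art-trees": {"project_id": "Program ID", "vintage": None},
}


def rename_columns(record, mapping):
    """Return a new record keyed by OffsetsDB property names."""
    renamed = {}
    for target_name, raw_name in mapping.items():
        if raw_name is None:
            # No direct source column: left for a later, registry-specific step.
            continue
        renamed[target_name] = record.get(raw_name)
    return renamed


# Made-up raw records.
car_record = rename_columns(
    {"Project ID": "CAR0000"}, column_mapping["climate-action-reserve"]
)
art_record = rename_columns({"Program ID": "ART0000"}, column_mapping["art-trees"])
```
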
| 101 | + |
| 102 | +## Protocol Mapping \& Categorization |
| 103 | +Offset projects are developed by following a specific set of rules, known as a protocol. |
| 104 | +Unfortunately, there is no standardized way of referring to the exact protocol (or protocol version) used to develop an offset project. |
| 105 | +Even within the domain of a single registry, references to the exact protocol used to develop a project are often inconsistent. |
| 106 | + |
| 107 | +OffsetsDB addresses this problem by manually mapping every known protocol string to a single, standardized reference. |
| 108 | +Take, for example, the Clean Development Mechanism protocol AMS-III.D., "Methane recovery in animal manure management systems". |
| 109 | +Across all six registries included in OffsetsDB, we identified twenty-two unique strings referring to this single protocol. |
| 110 | +OffsetsDB maps these unique strings, which we refer to as "known strings", to a single reference, `ams-iii-d`. |
| 111 | + |
| 112 | +We also assign each of these unified protocol references a category. |
| 113 | +Those categories include: |
| 114 | + |
| 115 | +- agriculture: offsets derived from the management of farmlands |
| 116 | +- cookstoves: offsets derived from in-home cookstoves that are either more efficient or use cleaner fuels |
| 117 | +- forest: offsets derived from the management of forests |
| 118 | +- ghg-management: offsets derived from the destruction or elimination of greenhouse gases |
| 119 | +- land-use: offsets derived from changes in land-use (e.g., avoided conversion) |
| 120 | +- renewable-energy: offsets derived from expanding renewable energy capacity |
| 121 | + |
| 122 | +Data about protocol categories and "known strings" are stored in `offsets_db_data/configs/all-protocol-mapping.json`. |
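
The lookup this implies can be sketched as a reverse index from each known string to its unified reference. The structure of `protocol_mapping` below is an assumption and may not match the layout of `all-protocol-mapping.json`; the known strings are invented variants, and the category shown for `ams-iii-d` is a placeholder chosen for illustration rather than the value OffsetsDB actually assigns.

```python
# Assumed structure; consult all-protocol-mapping.json for the real layout.
protocol_mapping = {
    "ams-iii-d": {
        "category": "ghg-management",  # placeholder category, not verified
        "known-strings": [
            "AMS-III.D.",
            "AMS III.D",
            "AMS-III.D.: Methane recovery in animal manure management systems",
        ],
    },
}


def build_lookup(mapping):
    """Invert the mapping so every known string points at its protocol reference."""
    lookup = {}
    for protocol_id, entry in mapping.items():
        for known_string in entry["known-strings"]:
            lookup[known_string.strip()] = protocol_id
    return lookup


def normalize_protocol(raw_string, lookup):
    """Return the unified protocol reference, or None for an unrecognized string."""
    return lookup.get(raw_string.strip())


lookup = build_lookup(protocol_mapping)
protocol_id = normalize_protocol(" AMS III.D ", lookup)
category = protocol_mapping[protocol_id]["category"]
```

Returning `None` for unrecognized strings makes it easy to spot protocol strings that still need to be added to the mapping by hand.
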
| 123 | + |
| 124 | +## Registry specific transformations |
| 125 | +Some of the transformations involved in producing OffsetsDB require special knowledge or assumptions about the underlying data. |
| 126 | +This section of the documentation highlights some of those special cases. |
| 127 | +For additional context, consult specific function docstrings or reach out TK if something doesn't make sense. |
| 128 | + |
| 129 | +### American Carbon Registry |
| 130 | + |
| 131 | +Project status: When processing ACR projects, we combine two status properties present in the raw data: `Compliance Program Status (ARB or Ecology)` and `Voluntary Status`. |
| 132 | +For compliance projects, we report compliance program status. |
| 133 | +For non-compliance projects, we report voluntary status. |
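
A minimal sketch of that rule follows. The `harmonize_acr_status` name and the sample status values are hypothetical, and because this section does not say how a project is identified as a compliance project, that decision is passed in as a flag rather than inferred.

```python
def harmonize_acr_status(record, is_compliance):
    """Pick the status property that applies to this ACR project."""
    if is_compliance:
        return record["Compliance Program Status (ARB or Ecology)"]
    return record["Voluntary Status"]


# Made-up raw record; real status values may differ.
raw_project = {
    "Compliance Program Status (ARB or Ecology)": "Active",
    "Voluntary Status": "Registered",
}
compliance_status = harmonize_acr_status(raw_project, is_compliance=True)
voluntary_status = harmonize_acr_status(raw_project, is_compliance=False)
```
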
| 134 | + |
| 135 | +### Verra |
| 136 | +There are several unique aspects of Verra's crediting data that require special consideration. |
| 137 | +First, Verra is unique among the registries included in OffsetsDB in that it allows for "rolling" credit issuance. |
| 138 | +This allows projects to complete the paperwork and verification processes for credit issuance but delay the actual issuance event. |
| 139 | +This results in ambiguities around the precise timing of credit issuance events, because credits that are eligible to be issued but have not yet been issued are not publicly reported in the Verra crediting data. |
| 140 | +We handle this ambiguity by assuming that the first crediting event, be it an issuance, retirement, or cancellation, on a per-project, per-vintage basis results in issuance of 100 percent of credits eligible to be issued for that project-vintage. |
| 141 | +Second, Verra's data do not distinguish retirement events from cancellation events. |
| 142 | +We report all Verra retirements and cancellations as `retirement/cancellation`. |
| 143 | +Third, Verra allows for the simultaneous issuance of multiple vintages. |
| 144 | +We assign all credits from these multi-vintage issuances to the first reported vintage year. |
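
The multi-vintage rule can be sketched as follows. The shape of the raw event, with a `vintages` list in reported order, and the `assign_first_vintage` helper are assumptions for illustration; Verra's raw data may encode multi-vintage issuances differently.

```python
def assign_first_vintage(event):
    """Assign all credits in a multi-vintage issuance to the first reported vintage."""
    vintages = event["vintages"]
    return {
        "quantity": event["quantity"],
        # "First reported" means first in reported order, not the earliest year.
        "vintage": vintages[0],
        "transaction_type": "issuance",
    }


# Made-up event covering three vintages.
normalized_event = assign_first_vintage(
    {"quantity": 500, "vintages": [2016, 2015, 2017]}
)
```
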
| 145 | + |
| 146 | +### California Compliance Projects |
| 147 | +We treat the California Air Resources Board's [issuance table](https://ww2.arb.ca.gov/resources/documents/arb-offset-credit-issuance-table) as the source of truth for all credits issued and retired by any project developed under an ARB-approved protocol. |