Add Cercarbono & Isometric projects and credits processing #138
Open
andersy005 wants to merge 34 commits into main from add-Cercarbano
Changes from 19 commits
3a02bef Add Cercarbono project processing and update raw columns mapping (andersy005)
47829e2 Merge branch 'main' into add-Cercarbano (andersy005)
95d9a1a Update Cercarbono mappings in projects-raw-columns-mapping.json (andersy005)
dbbc2c6 Add method to generate project URLs for Cercarbono projects (andersy005)
8a06d33 Add processing method for Cercarbono transactions and update column m… (andersy005)
7a1dcc4 Update transaction date conversion to use ISO8601 format (andersy005)
a05fc97 Extract vintage year from vintage_of_credits in process_cercarbono_tr… (andersy005)
328d074 Add missing columns handling in process_cercarbono_transactions (andersy005)
2c2df90 Refactor process_cercarbono_projects to accept credits DataFrame and … (andersy005)
3a36872 Remove unnecessary parameter from process_vcs_projects calls in tests (andersy005)
3f6029a Add process_isometric_projects function to handle Isometric project data (andersy005)
121a275 Add isometric project mappings to projects-raw-columns-mapping.json (andersy005)
e8d93cc Add project URL handling and enhance isometric project processing (andersy005)
53b461e Rename process_cercarbono_transactions to process_cercarbono_credits … (andersy005)
a7a7540 Enhance process_isometric_credits function to include datetime conver… (andersy005)
156694b Add project ID and vintage year extraction to process_isometric_credi… (andersy005)
31b6cb4 Change integer columns to Float32 in project_schema and credit_withou… (andersy005)
cf5ca9c Uncomment methods to add retired and issued totals, and first issuanc… (andersy005)
783b1d2 Refactor process_isometric_credits function to handle transaction typ… (andersy005)
eaa2599 Add 'isometric' and 'cercarbono' to registry abbreviation mapping (andersy005)
04072d5 Update project_id mapping in cercarbono retirements and remove redund… (andersy005)
d97f43b Add project ID methods for Cercarbono and Isometric credits dataframe… (andersy005)
5efeea9 Fix project ID assignment order in process_cercarbono_projects and up… (andersy005)
0b6d441 Refactor process_cercarbono_credits to streamline data handling for i… (andersy005)
3966307 Merge branch 'main' into add-Cercarbano (andersy005)
ad1804a Enhance process_isometric_credits to support project ID mapping with … (andersy005)
d2bc9a2 Add harmonization option for beneficiary data in process functions (andersy005)
62153ae Refactor process_isometric_credits to improve flow and readability by… (andersy005)
75a1c69 Merge branch 'main' into add-Cercarbano (andersy005)
1afb6e8 Retrigger CI (andersy005)
0c1c6d9 Merge branch 'main' into add-Cercarbano (andersy005)
adcfc70 Refactor import statements for pandera to use pandas submodule (andersy005)
9816997 Add new project types and update isometric project type inference logic (andersy005)
99b03bf Add Cercarbono project type inference and update protocol mapping (andersy005)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,168 @@ | ||
| import pandas as pd | ||
| import pandas_flavor as pf | ||
| | ||
| from offsets_db_data.common import ( | ||
| BERKELEY_PROJECT_TYPE_UPATH, | ||
| CREDIT_SCHEMA_UPATH, | ||
| PROJECT_SCHEMA_UPATH, | ||
| load_column_mapping, | ||
| load_inverted_protocol_mapping, | ||
| load_registry_project_column_mapping, | ||
| load_type_category_mapping, | ||
| ) | ||
| from offsets_db_data.credits import ( | ||
| aggregate_issuance_transactions, # noqa: F401 | ||
| filter_and_merge_transactions, # noqa: F401 | ||
| merge_with_arb, # noqa: F401 | ||
| ) | ||
| from offsets_db_data.models import credit_without_id_schema, project_schema | ||
| from offsets_db_data.projects import ( | ||
| add_category, # noqa: F401 | ||
| add_first_issuance_and_retirement_dates, # noqa: F401 | ||
| add_is_compliance_flag, # noqa: F401 | ||
| add_retired_and_issued_totals, # noqa: F401 | ||
| harmonize_country_names, # noqa: F401 | ||
| harmonize_status_codes, # noqa: F401 | ||
| map_protocol, # noqa: F401 | ||
| ) | ||
| | ||
| | ||
| @pf.register_dataframe_method | ||
| def add_cercarbono_project_url(df: pd.DataFrame) -> pd.DataFrame: | ||
| """Add project URL column for Cercarbono projects. | ||
| | ||
| Parameters | ||
| ---------- | ||
| df : pd.DataFrame | ||
| Input dataframe containing Cercarbono project data. | ||
| | ||
| Returns | ||
| ------- | ||
| pd.DataFrame | ||
| Dataframe with added project URL column. | ||
| """ | ||
| base_url = 'https://www.ecoregistry.io/projects' | ||
| df['project_url'] = df['project_id'].apply(lambda x: f'{base_url}/{x}') | ||
| return df | ||
| | ||
| | ||
| @pf.register_dataframe_method | ||
| def process_cercarbono_credits( | ||
| projects: pd.DataFrame, | ||
| retirements: pd.DataFrame, | ||
| download_type: str = 'retirements', | ||
| registry_name: str = 'cercarbono', | ||
| ) -> pd.DataFrame: | ||
| """Process Cercarbono transactions dataframe to conform to offsets-db schema. | ||
| | ||
| Parameters | ||
| ---------- | ||
| projects : pd.DataFrame | ||
| Input dataframe containing Cercarbono project data. | ||
| retirements : pd.DataFrame | ||
| Input dataframe containing Cercarbono retirement data. | ||
| download_type : str, optional | ||
| Download type used to look up the column mapping, by default "retirements" | ||
| registry_name : str, optional | ||
| Name of the registry to be added to the dataframe, by default "cercarbono" | ||
| | ||
| Returns | ||
| ------- | ||
| pd.DataFrame | ||
| Processed dataframe conforming to offsets-db schema. | ||
| """ | ||
| # flatten the per-project `serials` lists into one list of issuance records | ||
| all_issuances = [] | ||
| for _, row in projects.iterrows(): | ||
| issuances = row['serials'] | ||
| for issuance in issuances: | ||
| issuance['project_id'] = row['code'] | ||
| issuance['name'] = row['name'] | ||
| # once per project (outer loop): collect that project's tagged issuances | ||
| all_issuances.extend(issuances) | ||
| | ||
| issuances = pd.json_normalize(all_issuances).rename( | ||
| columns={'issued_quantity': 'quantity', 'issuance_date': 'date'} | ||
| ) | ||
| # Extract vintage year from the last date in vintage_of_credits (format: "YYYY-MM-DD / YYYY-MM-DD") | ||
| # TODO: @badgley, please confirm this is the correct way to extract vintage year for issuances | ||
| issuances['vintage'] = ( | ||
| issuances['vintage_of_credits'].str.split(' / ').str[-1].str[:4].astype(int) | ||
| ) | ||
| issuances['transaction_type'] = 'issuance' | ||
| # copy so the caller's retirements dataframe is not modified (avoids a doubled prefix on repeated calls) | ||
| retirements = retirements.copy() | ||
| # add CDC- prefix to project IDs | ||
| retirements['project_id'] = retirements['project_id'].apply(lambda x: f'CDC-{x}') | ||
| retirements['transaction_type'] = 'retirement' | ||
| | ||
| column_mapping = load_column_mapping( | ||
| registry_name=registry_name, download_type=download_type, mapping_path=CREDIT_SCHEMA_UPATH | ||
| ) | ||
| | ||
| columns = {v: k for k, v in column_mapping.items()} | ||
| | ||
| df = pd.concat([issuances, retirements]).reset_index(drop=True).rename(columns=columns) | ||
| data = ( | ||
| df.set_registry(registry_name=registry_name) | ||
| .convert_to_datetime(columns=['transaction_date'], format='ISO8601') | ||
| .add_missing_columns(schema=credit_without_id_schema) | ||
| .validate(schema=credit_without_id_schema) | ||
| ) | ||
| return data | ||
| | ||
| | ||
| @pf.register_dataframe_method | ||
| def process_cercarbono_projects( | ||
| df: pd.DataFrame, | ||
| *, | ||
| credits: pd.DataFrame, | ||
| registry_name: str = 'cercarbono', | ||
| ) -> pd.DataFrame: | ||
| """Process Cercarbono projects dataframe to conform to offsets-db schema. | ||
| | ||
| Parameters | ||
| ---------- | ||
| df : pd.DataFrame | ||
| Input dataframe containing Cercarbono project data. | ||
| credits : pd.DataFrame | ||
| Processed credits dataframe used to add issued/retired totals and first issuance/retirement dates. | ||
| registry_name : str, optional | ||
| Name of the registry to be added to the dataframe, by default "cercarbono" | ||
| | ||
| Returns | ||
| ------- | ||
| pd.DataFrame | ||
| Processed dataframe conforming to offsets-db schema. | ||
| """ | ||
| | ||
| registry_project_column_mapping = load_registry_project_column_mapping( | ||
| registry_name=registry_name, file_path=PROJECT_SCHEMA_UPATH | ||
| ) | ||
| inverted_column_mapping = {value: key for key, value in registry_project_column_mapping.items()} | ||
| type_category_mapping = load_type_category_mapping() | ||
| inverted_protocol_mapping = load_inverted_protocol_mapping() | ||
| df = df.copy() | ||
| df['country'] = df.locations.map( | ||
| lambda x: x[0]['country'] | ||
| ) # extract country from locations by taking first entry | ||
| | ||
| data = ( | ||
| df.rename(columns=inverted_column_mapping) | ||
| .set_registry(registry_name=registry_name) | ||
| .add_cercarbono_project_url() | ||
| .harmonize_country_names() | ||
| .harmonize_status_codes() | ||
| .map_protocol(inverted_protocol_mapping=inverted_protocol_mapping) | ||
| .infer_project_type() | ||
| .override_project_types( | ||
| override_data_path=BERKELEY_PROJECT_TYPE_UPATH, source_str='berkeley' | ||
| ) | ||
| .add_category( | ||
| type_category_mapping=type_category_mapping | ||
| ) # must come after types; type -> category | ||
| .map_project_type_to_display_name(type_category_mapping=type_category_mapping) | ||
| .add_is_compliance_flag() | ||
| .add_retired_and_issued_totals(credits=credits) | ||
| .add_first_issuance_and_retirement_dates(credits=credits) | ||
| .add_missing_columns(schema=project_schema) | ||
| .convert_to_datetime(columns=['listed_at', 'first_issuance_at', 'first_retirement_at']) | ||
| .validate(schema=project_schema) | ||
| ) | ||
| | ||
| return data | ||
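
For review purposes, the issuance-flattening loop in `process_cercarbono_credits` may be easier to reason about outside pandas. The following is a minimal plain-Python sketch of the same idea, assuming each project record carries a `serials` list of issuance dicts, as the diff implies. The sample records, their values, and the helper name `flatten_issuances` are hypothetical and not part of the PR. Unlike the code in the diff, the sketch copies each record instead of mutating it in place.

```python
# Hypothetical sample shaped like the `projects` input: each project carries
# a `serials` list of issuance records. All values here are made up.
projects = [
    {
        'code': 'CDC-1',
        'name': 'Example project A',
        'serials': [
            {'issued_quantity': 100, 'issuance_date': '2021-03-01'},
            {'issued_quantity': 50, 'issuance_date': '2022-03-01'},
        ],
    },
    {'code': 'CDC-2', 'name': 'Example project B', 'serials': []},
]


def flatten_issuances(projects):
    """Return one flat record per issuance, tagged with its parent project."""
    flat = []
    for project in projects:
        for serial in project['serials']:
            record = dict(serial)  # shallow copy so the input is not mutated
            record['project_id'] = project['code']
            record['name'] = project['name']
            record['transaction_type'] = 'issuance'
            flat.append(record)
    return flat


issuances = flatten_issuances(projects)
for record in issuances:
    print(record['project_id'], record['issued_quantity'], record['issuance_date'])
```

Projects with an empty `serials` list simply contribute no rows, which matches what `extend` does with an empty list in the diff.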
cc @badgley for feedback
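
To make the open TODO about vintage years concrete, here is a small sketch of the two obvious readings of a `vintage_of_credits` range, assuming the "YYYY-MM-DD / YYYY-MM-DD" format noted in the diff comment. The helper `vintage_year` and its `which` parameter are hypothetical and not part of the PR, and the sample string is made up. The diff currently takes the year of the end date.

```python
def vintage_year(vintage_of_credits: str, which: str = 'end') -> int:
    """Extract a year from a 'YYYY-MM-DD / YYYY-MM-DD' range string.

    which='end' mirrors the diff (last date in the range);
    which='start' is the alternative reading (first date in the range).
    """
    parts = vintage_of_credits.split(' / ')
    chosen = parts[-1] if which == 'end' else parts[0]
    return int(chosen[:4])


sample = '2019-01-01 / 2020-12-31'  # made-up range spanning two calendar years
end_year = vintage_year(sample)
start_year = vintage_year(sample, which='start')
print(start_year, end_year)
```

The two readings differ only when a range crosses a calendar-year boundary, so checking how often that happens in the real data would show how much the choice matters.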