Update/calculate hourly airqualitydata using bigqdata #4482

NicholasTurner23 · 2025-02-25T11:58:55Z

Description

Just some cleanup

Related Issues

JIRA cards:
- OPS-355

Summary by CodeRabbit

New Features
- Enhanced calibration now processes data based on city-level grouping for more localized outcomes.
- A new configuration option allows the selection between different calibration models, offering greater flexibility in data processing.

coderabbitai · 2025-02-25T11:59:07Z

Walkthrough

The changes update the calibration flow in the AirQo ETL utilities to support dynamic model selection based on the grouping criterion. The calibrate_data function now directly accesses the timestamp property, employs a new dictionary for mapping the groupby value to the appropriate model (either CityModels or CountryModels), and uses updated variable names. Additionally, a new environment variable (CALIBRATEBY) is added, the CityModel class is renamed to CityModels, and the DAGs now pass "city" as the grouping parameter rather than "country".

Changes

| File(s) | Change Summary |
| --- | --- |
| `src/workflows/airqo_etl_utils/airqo_utils.py` | Updated `calibrate_data`: simplified timestamp access, introduced `calibrate_by` dictionary for dynamic model selection, renamed variable `country` to `groupedby`. |
| `src/workflows/airqo_etl_utils/config.py` | Added new environment variable `CALIBRATEBY` to the `Config` class. |
| `src/workflows/airqo_etl_utils/constants.py` | Renamed `CityModel` to `CityModels` without altering its internals. |
| `src/workflows/airqo_etl_utils/utils.py` | Modified `get_calibration_model_path` to support both `CountryModels` and `CityModels`; adjusted type imports and conditionals accordingly. |
| `src/workflows/dags/airqo_measurements.py`, `src/workflows/dags/airqo_mobile_measurements.py` | Changed the `groupby` parameter in `calibrate_data` calls from `"country"` to `"city"`, shifting data grouping granularity. |

Sequence Diagram(s)

sequenceDiagram
    participant DAG as DAG/Workflow
    participant Utils as AirQoDataUtils
    participant Calibrator as calibrate_data
    participant Selector as Model Selector
    participant ModelPath as Model Path Resolver

    DAG->>Utils: Call calibrate_data(data, groupby: "city")
    Utils->>Calibrator: Pass calibration parameters
    Calibrator->>Selector: Lookup model using calibrate_by[groupby]
    Selector-->>Calibrator: Return appropriate model (CityModels/CountryModels)
    Calibrator->>ModelPath: Retrieve calibration model file path (pollutant)
    ModelPath-->>Calibrator: Return model path
    Calibrator->>Calibrator: Process grouped data & apply predictions
    Calibrator-->>Utils: Return calibrated data
    Utils-->>DAG: Deliver calibrated measurements

Possibly related PRs

Update/calculate hourly airqualitydata using bigqdata #4402: Modifications in the calibrate_data function and model handling logic directly align with the current changes.
Update fix/consolidated data cleanup #4203: Updates to both calibrate_data and get_calibration_model_path methods for dynamic model selection are closely related.
Update/calculate hourly airqualitydata using bigqdata #4478: Adjustments to the groupby parameter handling in the calibration process mirror the changes implemented here.

Suggested reviewers

Baalmart

Poem

In the realm of code where functions gleam,
Calibration shifts like a well-tuned dream.
City or country, the models align,
With a simple tweak, the logic refines.
Environment set, and workflows sing— 🚀
A small change that makes our code take wing!
Happy coding in a world so slick!

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)

JIRA integration encountered authorization issues. Please disconnect and reconnect the integration in the CodeRabbit UI.

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

src/workflows/airqo_etl_utils/airqo_utils.py (1)
840-840: Function call needs to be updated with the new groupby parameter.

The calibrate_data function is called with "country" as the groupby parameter, but according to the AI summary, the DAGs now pass "city". This call should be updated for consistency.

🏁 Script executed:
#!/bin/bash
# Check if DAGs are passing "city" as the groupby parameter to calibrate_data
rg -A 2 $'AirQoDataUtils.calibrate_data' src/workflows/dags/

Length of output: 804

Action required: update the groupby argument in the calibrate_data call

The DAGs consistently pass "city" for the groupby parameter, but at line 840 of src/workflows/airqo_etl_utils/airqo_utils.py the function is still called with "country". Please update this call to use "city" so that all workflows group the data the same way.

File: src/workflows/airqo_etl_utils/airqo_utils.py
Line: 840 (change groupby="country" to groupby="city")

🧹 Nitpick comments (1)

src/workflows/airqo_etl_utils/utils.py (1)
327-329: Simplify multiple isinstance checks.

The code can be made more readable by combining the multiple isinstance checks into a single call that takes a tuple of types.

-if isinstance(calibrateby, CountryModels) or isinstance(
-    calibrateby, CityModels
-):
+if isinstance(calibrateby, (CountryModels, CityModels)):

🧰 Tools

🪛 Ruff (0.8.2)

327-329: Multiple isinstance calls for calibrateby, merge into a single call

Merge isinstance calls for calibrateby

(SIM101)
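
The SIM101 suggestion above can be checked with a small sketch. The enum members here are hypothetical placeholders (the real contents of CountryModels and CityModels are not shown in this review), and the two helper functions exist only to compare the two styles:

```python
from enum import Enum


class CountryModels(Enum):
    # Hypothetical member; the real enum's contents are not shown in this PR.
    UGANDA = "uganda"


class CityModels(Enum):
    # Hypothetical member, for illustration only.
    KAMPALA = "kampala"


def is_known_model_chained(calibrateby) -> bool:
    """Original style: two chained isinstance calls."""
    return isinstance(calibrateby, CountryModels) or isinstance(
        calibrateby, CityModels
    )


def is_known_model_merged(calibrateby) -> bool:
    """Suggested style: one isinstance call with a tuple of types."""
    return isinstance(calibrateby, (CountryModels, CityModels))


# Compare both styles on enum members and on values that are not enum members.
samples = [CountryModels.UGANDA, CityModels.KAMPALA, "kampala", None]
results = [(is_known_model_chained(s), is_known_model_merged(s)) for s in samples]
```

Both helpers should agree on every input, so the merged form is a behavior-preserving cleanup.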

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 328861f and 154595c.

📒 Files selected for processing (6)

src/workflows/airqo_etl_utils/airqo_utils.py (4 hunks)
src/workflows/airqo_etl_utils/config.py (1 hunks)
src/workflows/airqo_etl_utils/constants.py (1 hunks)
src/workflows/airqo_etl_utils/utils.py (2 hunks)
src/workflows/dags/airqo_measurements.py (2 hunks)
src/workflows/dags/airqo_mobile_measurements.py (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: Analyze (python)
GitHub Check: Analyze (javascript)

🔇 Additional comments (17)

src/workflows/airqo_etl_utils/config.py (1)

629-629: New configuration variable for calibration.

This addition of the CALIBRATEBY environment variable enhances the configuration flexibility for the calibration process. It aligns with the changes in the calibration workflow that now supports dynamic model selection.
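
A minimal sketch of how such a setting might be read and consumed. The `"country"` default and the `resolve_groupby` helper are assumptions for illustration; the PR only confirms that a `CALIBRATEBY` environment variable was added to `Config`:

```python
import os


class Config:
    # Assumption: the value comes from the process environment. The "country"
    # default is illustrative and is not confirmed by the PR.
    CALIBRATEBY = os.getenv("CALIBRATEBY", "country")


def resolve_groupby(explicit=None):
    """Hypothetical helper: prefer an explicit argument, else use the config."""
    return explicit if explicit is not None else Config.CALIBRATEBY
```

With this shape, a DAG could keep passing `groupby="city"` explicitly while other callers fall back to the configured value.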

src/workflows/airqo_etl_utils/constants.py (1)

261-261: Class name update for consistency.

The renaming from CityModel to CityModels improves naming consistency, aligning with the plural form already used in CountryModels. This keeps the naming convention uniform across the two model classes.

src/workflows/dags/airqo_mobile_measurements.py (1)

63-63: Calibration grouping parameter update.

❓ Verification inconclusive

The change from grouping by "country" to grouping by "city" provides finer-grained calibration control. This aligns with the renamed CityModels class and the new CALIBRATEBY configuration option, enabling more localized calibration models.

🏁 Script executed:
#!/bin/bash
# Verify if there are any other usages of country-based calibration that might need updating
rg "calibrate_data\(.*groupby=\"country\"" --type py

Length of output: 52

Calibration grouping parameter change: manual verification recommended

The update from grouping by "country" to "city" in the call to AirQoDataUtils.calibrate_data aligns with the new CityModels naming and the CALIBRATEBY configuration.

A search for calibrate_data(…) calls with groupby="country" returned no instances, suggesting that the transition has been applied consistently within the Python files. However, the pattern only matches calls written on a single line with groupby passed as a keyword argument, so please perform a manual check or run additional searches to confirm there are no overlooked instances.

Affected code:
File: src/workflows/dags/airqo_mobile_measurements.py
Line 63:
return AirQoDataUtils.calibrate_data(data=data, groupby="city")

Please ensure downstream components expecting the old grouping do not break.

src/workflows/dags/airqo_measurements.py (2)

108-108: Calibration grouping parameter update.

Changed from grouping by "country" to grouping by "city" for more precise calibration in the historical hourly measurements workflow. This provides finer-grained control over the calibration models used.

419-419: Calibration grouping parameter update.

Changed from grouping by "country" to grouping by "city" for more precise calibration in the realtime measurements workflow. This ensures consistency with the historical hourly calibration approach.

src/workflows/airqo_etl_utils/utils.py (2)

18-18: Adding CityModels import aligns with the new functionality.

The addition of CityModels to the imports is consistent with the changes made to support dynamic model selection based on grouping criterion.

22-22: Simplified import statement is more focused.

Removing Optional from the typing import keeps the imports concise as it appears not to be used in this file.

src/workflows/airqo_etl_utils/airqo_utils.py (10)

15-15: Import name change from CityModel to CityModels.

The import statement now includes CityModels instead of CityModel, which aligns with the class name change mentioned in the AI summary. This ensures consistency across the codebase.

626-626: Direct attribute access improves code clarity.

The change to directly access the hour attribute from the timestamp object (data["timestamp"].dt.hour) instead of using __getattribute__ is a good improvement. This makes the code more readable and follows pandas best practices.
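
The equivalence behind this comment holds for any Python object. The sketch below uses a standard-library `datetime` in place of the pandas `.dt` accessor, purely so it runs without third-party packages; the timestamp value is arbitrary:

```python
from datetime import datetime

# Arbitrary example timestamp; not taken from the project's data.
stamp = datetime(2025, 2, 25, 11, 58, 55)

# Calling the dunder method directly works but obscures intent.
hour_via_dunder = stamp.__getattribute__("hour")

# Plain attribute access is the idiomatic form when the name is known.
hour_via_attribute = stamp.hour

# getattr is the usual choice when the attribute name is only known at runtime.
hour_via_getattr = getattr(stamp, "hour")
```

All three expressions read the same attribute, so the direct form changes readability and not behavior.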

640-643: Well-structured dictionary for dynamic model selection.

The new calibrate_by dictionary provides a clean way to map the groupby parameter to the appropriate model type. This makes the code more maintainable and aligns with the PR's goal of supporting dynamic model selection.

645-647: Robust model selection with fallback to default.

This change enables dynamic model selection based on the groupby parameter with a proper fallback to CountryModels if the provided parameter doesn't match any known keys in the dictionary. This makes the code more resilient to unexpected inputs.
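
The lookup-with-fallback described in the two comments above can be sketched as follows. The enum members and the `select_model` helper are hypothetical, and the dictionary keys are assumed to be the `"country"` and `"city"` strings that the DAGs pass:

```python
from enum import Enum


class CountryModels(Enum):
    # Hypothetical member, for illustration only.
    UGANDA = "uganda"


class CityModels(Enum):
    # Hypothetical member, for illustration only.
    KAMPALA = "kampala"


# Assumed keys: the groupby strings passed by the DAGs.
calibrate_by = {
    "country": CountryModels,
    "city": CityModels,
}


def select_model(groupby: str):
    """Return the model enum for groupby, defaulting to CountryModels."""
    return calibrate_by.get(groupby, CountryModels)
```

One consequence of the fallback is that a misspelled groupby value silently selects CountryModels; raising on unknown keys would be the stricter alternative.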

658-659: Model path retrieval now uses the dynamic model approach.

Updated to use the dynamically selected model (via the model variable) rather than hardcoding CountryModels. This makes the code more flexible and consistent with the design changes.

663-664: Consistent use of dynamic model selection.

Similar to the previous change, this update ensures that the default lasso model path is retrieved using the dynamically selected model type, maintaining consistency throughout the function.

666-666: Dynamic model enumeration based on selected model type.

Retrieving available models from the dynamically selected model type rather than hardcoding to CountryModels enhances flexibility and aligns with the goal of supporting multiple grouping criteria.

668-668: Variable name change enhances code readability.

Changing the variable name from country to groupedby better reflects the more general purpose of the variable, which can now represent either a country or a city based on the grouping criterion. This makes the code more self-documenting.

670-670: Consistent handling of dynamic model selection.

The condition for selecting a custom model now checks if the groupedby value exists in the available models for the dynamically selected model type, rather than hardcoding to CountryModels. This ensures consistent behavior across different grouping criteria.

676-677: Consistent use of lowercase for model path.

The use of groupedby.lower() for model path retrieval ensures that the function handles variations in case correctly, improving robustness of the code.

Also applies to: 683-684
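
A hedged sketch of the per-group selection that the last few comments describe: check whether the group has its own model, normalize case, and otherwise use a default. The enum members, the `DEFAULT` entry, and the `pick_model_key` helper are all hypothetical; the real function resolves a model file path through `get_calibration_model_path`, which is not reproduced here:

```python
from enum import Enum


class CityModels(Enum):
    # Hypothetical members; DEFAULT stands in for the fallback model.
    KAMPALA = "kampala"
    NAIROBI = "nairobi"
    DEFAULT = "default"


def pick_model_key(groupedby, model=CityModels) -> str:
    """Return the lowercase key of the model to use for one group."""
    available = {member.value for member in model}
    # Groups with missing names (for example None or NaN) use the default.
    key = groupedby.lower() if isinstance(groupedby, str) else ""
    return key if key in available else model.DEFAULT.value
```

This sketch lowercases before the membership test, so "Kampala" and "kampala" resolve to the same model; whether the original code normalizes at the same point is not visible in the review.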

NicholasTurner23 added 2 commits February 25, 2025 14:56

Make calibration model selection more dynamic

98a0b42

Updates from airqstaging staging

154595c

NicholasTurner23 requested a review from Baalmart February 25, 2025 12:01

coderabbitai bot reviewed Feb 25, 2025

Baalmart merged commit ff219d9 into airqo-platform:staging Feb 25, 2025
46 checks passed

Baalmart mentioned this pull request Feb 25, 2025

move to production #4483

Merged

1 task

coderabbitai bot mentioned this pull request Feb 25, 2025

Update/calculate hourly airqualitydata using bigqdata #4484

Merged
