-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update/calculate hourly airqualitydata using bigqdata #4482
Update/calculate hourly airqualitydata using bigqdata #4482
Conversation
📝 WalkthroughWalkthroughThe changes update the calibration flow in the AirQo ETL utilities to support dynamic model selection based on the grouping criterion. The Changes
Sequence Diagram(s)sequenceDiagram
participant DAG as DAG/Workflow
participant Utils as AirQoDataUtils
participant Calibrator as calibrate_data
participant Selector as Model Selector
participant ModelPath as Model Path Resolver
DAG->>Utils: Call calibrate_data(data, groupby: "city")
Utils->>Calibrator: Pass calibration parameters
Calibrator->>Selector: Lookup model using calibrate_by[groupby]
Selector-->>Calibrator: Return appropriate model (CityModels/CountryModels)
Calibrator->>ModelPath: Retrieve calibration model file path (pollutant)
ModelPath-->>Calibrator: Return model path
Calibrator->>Calibrator: Process grouped data & apply predictions
Calibrator-->>Utils: Return calibrated data
Utils-->>DAG: Deliver calibrated measurements
Possibly related PRs
Suggested reviewers
Poem
Warning Review ran into problems🔥 ProblemsErrors were encountered while retrieving linked issues. Errors (1)
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
src/workflows/airqo_etl_utils/airqo_utils.py (1)
840-840
: 💡 Verification agent🧩 Analysis chain
Function call needs to be updated with the new groupby parameter.
The
calibrate_data
function is called with "country" as thegroupby
parameter, but according to the AI summary, DAGs now pass "city" as the grouping parameter. This call should be updated for consistency.
🏁 Script executed:
#!/bin/bash # Check if DAGs are passing "city" as the groupby parameter to calibrate_data rg -A 2 $'AirQoDataUtils.calibrate_data' src/workflows/dags/Length of output: 804
Action Required: Update the groupby argument in the calibrate_data call
The DAGs are consistently passing
"city"
for thegroupby
parameter, but insrc/workflows/airqo_etl_utils/airqo_utils.py
(line 840) the function is still being called with"country"
. Please update this function call to use"city"
to maintain consistency across the workflows.
- File:
src/workflows/airqo_etl_utils/airqo_utils.py
- Line: 840 – change from
groupby="country"
togroupby="city"
🧹 Nitpick comments (1)
src/workflows/airqo_etl_utils/utils.py (1)
327-329
: Simplify multiple isinstance checks.The code can be made more readable and efficient by combining the multiple
isinstance
checks into a single operation.-if isinstance(calibrateby, CountryModels) or isinstance( - calibrateby, CityModels -): +if isinstance(calibrateby, (CountryModels, CityModels)):🧰 Tools
🪛 Ruff (0.8.2)
327-329: Multiple
isinstance
calls forcalibrateby
, merge into a single callMerge
isinstance
calls forcalibrateby
(SIM101)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/workflows/airqo_etl_utils/airqo_utils.py
(4 hunks)src/workflows/airqo_etl_utils/config.py
(1 hunks)src/workflows/airqo_etl_utils/constants.py
(1 hunks)src/workflows/airqo_etl_utils/utils.py
(2 hunks)src/workflows/dags/airqo_measurements.py
(2 hunks)src/workflows/dags/airqo_mobile_measurements.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/workflows/airqo_etl_utils/utils.py
327-329: Multiple isinstance
calls for calibrateby
, merge into a single call
Merge isinstance
calls for calibrateby
(SIM101)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (17)
src/workflows/airqo_etl_utils/config.py (1)
629-629
: New configuration variable for calibration.This addition of the
CALIBRATEBY
environment variable enhances the configuration flexibility for the calibration process. It aligns with the changes in the calibration workflow that now supports dynamic model selection.src/workflows/airqo_etl_utils/constants.py (1)
261-261
: Class name update for consistency.The renaming from
CityModel
toCityModels
improves naming consistency, particularly aligning with the plural form used inCountryModels
. This change makes the API more intuitive and maintains a uniform naming convention.src/workflows/dags/airqo_mobile_measurements.py (1)
63-63
:❓ Verification inconclusive
Calibration grouping parameter update.
The change from grouping by "country" to grouping by "city" provides finer-grained calibration control. This aligns with the renamed
CityModels
class and the newCALIBRATEBY
configuration option, enabling more localized calibration models.
🏁 Script executed:
#!/bin/bash # Verify if there are any other usages of country-based calibration that might need updating rg "calibrate_data\(.*groupby=\"country\"" --type pyLength of output: 52
Calibration Grouping Parameter Change – Manual Verification Recommended
- The update from grouping by
"country"
to"city"
in the call toAirQoDataUtils.calibrate_data
aligns with the newCityModels
naming and theCALIBRATEBY
configuration.- A search for usages of
calibrate_data(…)
withgroupby="country"
returned no instances, suggesting that the transition has been applied consistently within the Python files. However, since the initial search output was empty, please perform a manual check or run additional searches to confirm there are no overlooked instances.Affected Code:
- File:
src/workflows/dags/airqo_mobile_measurements.py
- Line 63:
return AirQoDataUtils.calibrate_data(data=data, groupby="city")Please ensure downstream components expecting the old grouping do not break.
src/workflows/dags/airqo_measurements.py (2)
108-108
: Calibration grouping parameter update.Changed from grouping by "country" to grouping by "city" for more precise calibration in the historical hourly measurements workflow. This provides finer-grained control over the calibration models used.
419-419
: Calibration grouping parameter update.Changed from grouping by "country" to grouping by "city" for more precise calibration in the realtime measurements workflow. This ensures consistency with the historical hourly calibration approach.
src/workflows/airqo_etl_utils/utils.py (2)
18-18
: Adding CityModels import aligns with the new functionality.The addition of
CityModels
to the imports is consistent with the changes made to support dynamic model selection based on grouping criterion.
22-22
: Simplified import statement is more focused.Removing
Optional
from the typing import keeps the imports concise as it appears not to be used in this file.src/workflows/airqo_etl_utils/airqo_utils.py (10)
15-15
: Import name change from CityModel to CityModels.The import statement now includes
CityModels
instead ofCityModel
, which aligns with the class name change mentioned in the AI summary. This ensures consistency across the codebase.
626-626
: Direct attribute access improves code clarity.The change to directly access the hour attribute from the timestamp object (
data["timestamp"].dt.hour
) instead of using__getattribute__
is a good improvement. This makes the code more readable and follows pandas best practices.
640-643
: Well-structured dictionary for dynamic model selection.The new
calibrate_by
dictionary provides a clean way to map thegroupby
parameter to the appropriate model type. This makes the code more maintainable and aligns with the PR's goal of supporting dynamic model selection.
645-647
: Robust model selection with fallback to default.This change enables dynamic model selection based on the
groupby
parameter with a proper fallback toCountryModels
if the provided parameter doesn't match any known keys in the dictionary. This makes the code more resilient to unexpected inputs.
658-659
: Model path retrieval now uses the dynamic model approach.Updated to use the dynamically selected model (via the
model
variable) rather than hardcodingCountryModels
. This makes the code more flexible and consistent with the design changes.
663-664
: Consistent use of dynamic model selection.Similar to the previous change, this update ensures that the default lasso model path is retrieved using the dynamically selected model type, maintaining consistency throughout the function.
666-666
: Dynamic model enumeration based on selected model type.Retrieving available models from the dynamically selected model type rather than hardcoding to
CountryModels
enhances flexibility and aligns with the goal of supporting multiple grouping criteria.
668-668
: Variable name change enhances code readability.Changing the variable name from
country
togroupedby
better reflects the more general purpose of the variable, which can now represent either a country or a city based on the grouping criterion. This makes the code more self-documenting.
670-670
: Consistent handling of dynamic model selection.The condition for selecting a custom model now checks if the
groupedby
value exists in the available models for the dynamically selected model type, rather than hardcoding toCountryModels
. This ensures consistent behavior across different grouping criteria.
676-677
: Consistent use of lowercase for model path.The use of
groupedby.lower()
for model path retrieval ensures that the function handles variations in case correctly, improving robustness of the code.Also applies to: 683-684
Description
Just some cleanup
Related Issues
Summary by CodeRabbit