-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update/calculate hourly airqualitydata using bigqdata #4478
Update/calculate hourly airqualitydata using bigqdata #4478
Conversation
📝 WalkthroughWalkthroughThe changes enhance the flexibility and clarity of calibration and forecasting methods across multiple modules. The Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant AirQoDataUtils
participant CountryModels
Caller->>AirQoDataUtils: Call calibrate_data(data, groupby="country")
AirQoDataUtils->>data: Group data using provided `groupby`
AirQoDataUtils->>CountryModels: Retrieve model list (using lowercase key)
AirQoDataUtils-->>Caller: Return calibrated DataFrame
Possibly related PRs
Suggested reviewers
Poem
Warning Review ran into problems🔥 ProblemsErrors were encountered while retrieving linked issues. Errors (1)
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
src/workflows/airqo_etl_utils/utils.py (3)
11-18
: Remove unusedFrequency
import.The
Frequency
enum is imported but never used here, triggering a lint warning. Consider removing it from the import list to keep imports clean.🧰 Tools
🪛 Ruff (0.8.2)
16-16:
.constants.Frequency
imported but unusedRemove unused import:
.constants.Frequency
(F401)
21-21
: Eliminate the unusedOptional
.
Optional
is currently not referenced anywhere in this file. Removing it helps maintain a tidy import list and avoid future confusion.🧰 Tools
🪛 Ruff (0.8.2)
21-21:
typing.Optional
imported but unusedRemove unused import:
typing.Optional
(F401)
308-310
: Enhance docstring parameter explanation.Although it references a generic
city
orcountry
, it may be helpful to add examples indicating howcalibrateby
can be either aCountryModels
enum member or a simple string.src/workflows/airqo_etl_utils/airqo_utils.py (2)
10-17
: Remove unusedCityModel
import.
CityModel
import appears redundant now thatCountryModels
is used. Please remove it to address the lint warning.🧰 Tools
🪛 Ruff (0.8.2)
14-14:
.constants.CityModel
imported but unusedRemove unused import:
.constants.CityModel
(F401)
665-665
: Case-insensitive matching detail.While
.lower()
is spot on, consider.casefold()
for more robust locale handling if needed. This is a minor nuance.src/workflows/airqo_etl_utils/ml_utils.py (1)
335-353
: Clear 3D coordinate transformations.The docstring clarifies the need for radians, which is accurate. Consider adding a quick check or conversion for cases where lat/long might still be in degrees.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
src/workflows/airqo_etl_utils/airqo_utils.py
(5 hunks)src/workflows/airqo_etl_utils/constants.py
(1 hunks)src/workflows/airqo_etl_utils/ml_utils.py
(13 hunks)src/workflows/airqo_etl_utils/utils.py
(2 hunks)src/workflows/dags/airqo_measurements.py
(2 hunks)src/workflows/dags/airqo_mobile_measurements.py
(1 hunks)src/workflows/dags/forecast_prediction_jobs.py
(7 hunks)src/workflows/dags/weather_measurements.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/workflows/dags/weather_measurements.py
🧰 Additional context used
🪛 Ruff (0.8.2)
src/workflows/airqo_etl_utils/airqo_utils.py
14-14: .constants.CityModel
imported but unused
Remove unused import: .constants.CityModel
(F401)
src/workflows/airqo_etl_utils/utils.py
16-16: .constants.Frequency
imported but unused
Remove unused import: .constants.Frequency
(F401)
21-21: typing.Optional
imported but unused
Remove unused import: typing.Optional
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (24)
src/workflows/airqo_etl_utils/constants.py (1)
281-300
: Well-structured addition of CountryModels enumThe CountryModels enum follows the consistent pattern established by other enums in this file, with proper string representation methods and clear documentation. This addition enables a more flexible geographic model selection approach.
src/workflows/dags/airqo_measurements.py (2)
107-108
: Parameter addition enhances calibration flexibilityAdding the
groupby="country"
parameter to the calibration method aligns with the PR objective to utilize country-specific models, allowing for more targeted data calibration processes.
419-419
: Consistent implementation of country-based groupingThe same
groupby="country"
parameter is appropriately applied to the realtime measurements calibration process, ensuring consistent behavior across different DAG workflows.src/workflows/dags/forecast_prediction_jobs.py (3)
13-13
: Good addition of Frequency importAdding this import prepares the file for the string-to-enum conversion below.
43-45
: Improved code maintainability with enum usageReplacing string literals with the Frequency enum values enhances code quality by centralizing frequency definitions, preventing typos, and simplifying future changes to frequency handling.
49-49
: Systematic replacement of string literals with enum constantsThis comprehensive replacement of hardcoded frequency strings with centralized enum constants across all relevant functions demonstrates excellent attention to consistency and maintainability.
Also applies to: 53-53, 57-57, 66-70, 79-79, 97-97, 101-101, 105-105, 109-109, 117-122, 131-131
src/workflows/dags/airqo_mobile_measurements.py (1)
63-63
: Enhanced calibration with country-based groupingAdding the
groupby="country"
parameter to mobile measurements calibration ensures consistency with the approach used in other workflows and enables country-specific model selection.src/workflows/airqo_etl_utils/utils.py (1)
326-328
: Approved usage ofCountryModels
.This check and distinct return logic nicely accommodate both enum-based and string-based calibration targets. The code is concise and clear.
src/workflows/airqo_etl_utils/airqo_utils.py (10)
574-574
: Parameter extension for flexible calibration.Introducing
groupby
enhances reusability, allowing calibrations by various location types. This aligns with the broader country-based model logic.
603-603
: Potential KeyError for unexpected group fields.Accessing
sites[["site_id", groupby]]
could fail ifgroupby
is not set correctly. Consider validating or providing a fallback.
645-645
: Consistent grouping by dynamic field.Switching to dynamic grouping fosters scalable logic for different calibration levels (e.g., city, country). Looks good.
650-650
: Robust default PM2.5 model fallback.Relying on
CountryModels.DEFAULT
ensures continuity when a custom model is unavailable. This increases reliability.
657-657
: Logical default PM10 model usage.Same structured fallback for PM10 calibration prevents potential model-not-found errors.
661-661
: Maintaining consistent model keys.Gathering
CountryModels
values into a list enables uniform checking of available calibration keys. This is clean and future-proof.
663-663
: Clear iteration over grouped DataFrame.Iterating
(country, group)
fromgrouped_df
is straightforward and maintainable. Appropriately splits calibration tasks by location.
671-671
: Prioritizing custom PM2.5 model.Fetching the PM2.5 model by explicit country name further personalizes the calibration flow.
678-678
: Custom PM10 model consistency.Same approach to PM10 ensures uniform calibration logic across pollutants.
893-893
: Country-based grouping in action.Passing
"country"
for thegroupby
parameter ensures the calibration logic aligns with the newly introduced approach.src/workflows/airqo_etl_utils/ml_utils.py (6)
99-123
: Improved docstring clarity inpreprocess_data
.Defining the mandatory columns and explaining the daily vs. hourly logic is helpful. You might also clarify how to handle large missing data segments beyond simple interpolation.
162-190
: Comprehensive lag & roll documentation.The function thoroughly covers both daily and hourly scenarios, including rolling stats. This structured approach provides convenient feature engineering. Keep watching memory impact when dealing with large DataFrames.
241-294
: Robust time-features extraction.Allowing both daily and hourly branches with
freq.str
is tidy. The added error handling for invalid frequencies is good practice.
296-333
: Cyclic features for daily/hourly data.Capturing cyclical seasonality with sine/cosine transformations is a solid approach. Ensure these transformations align with your domain boundaries (e.g., year, month).
703-703
: Model retrieval withfrequency.str
.Using
f"{frequency.str}_forecast_model.pkl"
ensures consistent naming logic. Good synergy with the GCS reading pattern.
757-759
: Frequency-based forecast collection.The branching on
frequency.str
is straightforward, ensuring the correct MongoDB stores for either daily or hourly forecasts.
Description
Related Issues
Summary by CodeRabbit
New Features
Refactor
Documentation