Update/calculate hourly airqualitydata using bigqdata #4534

NicholasTurner23
Contributor

@NicholasTurner23 NicholasTurner23 commented Mar 7, 2025

Description

Just some cleanup.

Summary by CodeRabbit

  • New Features

    • Enhanced configuration now supports additional air quality data sources for more flexible metrics mapping.
  • Bug Fixes

    • Improved data retrieval by ensuring consistent outputs even when no data is available.
    • Updated query filters to better capture complete and reliable information.
  • Refactor

    • Optimized internal processing with dynamic parameter handling and robust location data parsing for increased system reliability.
  • Documentation

    • Expanded method descriptions to provide clearer guidance on expected inputs and outputs.

Contributor

coderabbitai bot commented Mar 7, 2025

Walkthrough

This pull request implements enhancements across multiple ETL utility modules. It removes deprecated methods, updates method signatures with explicit type hints and Optional parameters, and improves internal logic with configurable mappings, refined date formatting, enhanced coordinate parsing, and revised SQL filtering conditions. Docstrings have been expanded to clearly define method behavior, and configuration constants have been updated to support multiple data sources.

Changes

| File(s) | Change Summary |
|---|---|
| src/workflows/airqo_etl_utils/airnow_utils.py, src/workflows/airqo_etl_utils/airqo_utils.py, src/workflows/airqo_etl_utils/bigquery_api.py, src/workflows/airqo_etl_utils/weather_data_utils.py, src/workflows/airqo_etl_utils/data_summary_utils.py, src/workflows/airqo_etl_utils/datautils.py | Updated method signatures with Optional type hints and refined docstrings; removed deprecated parameter_column_name; updated API key usage; replaced hardcoded parameter mappings with configuration mapping; improved date formatting, coordinate parsing, and SQL query conditions. |
| src/workflows/airqo_etl_utils/config.py | Added new constant AIRBEAM_BAM_FIELD_MAPPING and updated device_config_mapping to support additional data source mappings for BAM devices. |
| src/workflows/dags/data_summary.py | Added a comment noting that the data_summary() function is not currently used and will be removed once the analytics dashboard is implemented. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller as ETL Job
    participant AirnowUtils as AirnowDataUtils
    participant Config as Config Mapping

    Caller->>AirnowUtils: query_bam_data(date, parameters)
    AirnowUtils->>AirnowUtils: Format date using variable date_format
    AirnowUtils->>Config: Retrieve parameter mapping based on device category
    Config-->>AirnowUtils: Return mapping details
    AirnowUtils->>Caller: Return DataFrame (or empty DataFrame if no data)

Suggested reviewers

  • Baalmart

Poem

In our code realm, changes now gleam,
Configs and types refine the dream.
Mappings and queries elegantly flow,
Docstrings and constants put on a show.
Cheers to our updates—steady and keen!
🎉 Happy coding, where every byte sings!

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • JIRA integration encountered authorization issues. Please disconnect and reconnect the integration in the CodeRabbit UI.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e0a0ee and 146be9c.

📒 Files selected for processing (8)
  • src/workflows/airqo_etl_utils/airnow_utils.py (4 hunks)
  • src/workflows/airqo_etl_utils/airqo_utils.py (3 hunks)
  • src/workflows/airqo_etl_utils/bigquery_api.py (2 hunks)
  • src/workflows/airqo_etl_utils/config.py (2 hunks)
  • src/workflows/airqo_etl_utils/data_summary_utils.py (1 hunks)
  • src/workflows/airqo_etl_utils/datautils.py (3 hunks)
  • src/workflows/airqo_etl_utils/weather_data_utils.py (1 hunks)
  • src/workflows/dags/data_summary.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (20)
src/workflows/dags/data_summary.py (1)

50-50: Good documentation for future reference

The TODO comment clearly indicates that this module is scheduled for removal once the data health analytics dashboard is implemented, providing valuable context for future developers.

src/workflows/airqo_etl_utils/config.py (2)

308-308: Good addition of air quality parameter mapping

The new AIRBEAM_BAM_FIELD_MAPPING constant correctly maps AirBeam metrics to standardized parameter names, which improves data consistency across different data sources.


367-371: Well-structured configuration update for multi-source support

The device configuration mapping has been effectively expanded to support both AirQo and AirBeam data sources within the BAM device type. This enhancement improves the system's flexibility for handling multiple data sources with a consistent interface.
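
As a rough illustration of the multi-source layout described above, the sketch below nests per-source field mappings under a device category. Only the names AIRBEAM_BAM_FIELD_MAPPING and device_config_mapping come from this PR; the dictionary contents, the key names ("bam", "airqo", "airbeam"), AIRQO_BAM_FIELD_MAPPING, and the helper get_field_mapping are assumptions made for illustration and may not match config.py.

```python
from typing import Dict

# Hypothetical contents: the real mapping lives in config.py and may differ.
AIRBEAM_BAM_FIELD_MAPPING: Dict[str, str] = {
    "pm2.5": "pm2_5",
    "pm10": "pm10",
    "no2": "no2",
}

# Hypothetical second source, included only to show two sources side by side.
AIRQO_BAM_FIELD_MAPPING: Dict[str, str] = {
    "ConcHR(ug/m3)": "hourly_conc",
}

# Device category -> data source -> field mapping (assumed structure).
device_config_mapping: Dict[str, Dict[str, Dict[str, str]]] = {
    "bam": {
        "airqo": AIRQO_BAM_FIELD_MAPPING,
        "airbeam": AIRBEAM_BAM_FIELD_MAPPING,
    },
}


def get_field_mapping(category: str, source: str) -> Dict[str, str]:
    """Return the field mapping for a category and source, or {} if either is unknown."""
    return device_config_mapping.get(category, {}).get(source, {})
```

With a structure like this, supporting another source is a matter of adding one dictionary entry rather than another branch in the parsing code.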

src/workflows/airqo_etl_utils/data_summary_utils.py (1)

7-32: Excellent docstring addition with comprehensive details

The added docstring significantly improves code readability and maintainability by clearly documenting:

  • The function's purpose and behavior
  • Expected input data structure with required columns
  • Detailed description of return values and their format

This documentation will be valuable for future developers working with this method.

src/workflows/airqo_etl_utils/weather_data_utils.py (1)

181-183: Improved type safety with explicit return type

The addition of an explicit return type annotation (-> List[Dict[str, Any]]) enhances code readability and type safety. This change aligns with modern Python best practices and helps IDE tools provide better code completion and error detection.
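
For readers unfamiliar with the annotation, here is a minimal sketch of a function annotated with -> List[Dict[str, Any]]. The function name fetch_station_records and the record keys are hypothetical stand-ins, not the method changed in this PR.

```python
from typing import Any, Dict, List


def fetch_station_records(station_codes: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: build one record (a plain dict) per station code."""
    return [{"station_code": code, "observations": []} for code in station_codes]
```

A type checker such as mypy can then flag callers that treat the result as, say, a DataFrame rather than a list of dictionaries.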

src/workflows/airqo_etl_utils/bigquery_api.py (2)

1050-1055: Improved method signature with better type hinting and default value

The update to the generate_missing_data_query method enhances both usability and type safety by making the network parameter optional with a sensible default value. This change aligns with modern Python practices by leveraging the Optional type hint.


1077-1081: Enhanced query precision with stricter data validation

The SQL query has been improved in two significant ways:

  1. Using TIMESTAMP_TRUNC(timestamp, DAY) instead of the previous DATE(timestamp) keeps the compared value as a TIMESTAMP truncated to day granularity; like DATE(), it evaluates in UTC unless a time zone argument is supplied
  2. Adding explicit checks for non-null sensor values (s1_pm2_5, s2_pm2_5, etc.) ensures higher data quality for analysis

These changes will result in more reliable data extraction and better handling of edge cases.
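
The two changes above can be sketched roughly as follows. This is not the query from bigquery_api.py: the function name sketch_missing_data_query, the selected columns, the default network value, and the network filter are all assumptions. The sketch only shows an Optional parameter with a default feeding a filter that combines TIMESTAMP_TRUNC with non-null checks.

```python
from typing import Optional


def sketch_missing_data_query(
    date: str, table: str, network: Optional[str] = "airqo"
) -> str:
    """Build an illustrative BigQuery SQL string for one day of non-null readings.

    Hypothetical sketch: column names and defaults are assumptions. Values are
    interpolated directly for brevity; production code should prefer BigQuery
    query parameters over string formatting.
    """
    # Only add the network condition when a network was actually supplied.
    network_filter = f"AND network = '{network}'" if network else ""
    return f"""
        SELECT device_id, timestamp, s1_pm2_5, s2_pm2_5
        FROM `{table}`
        WHERE TIMESTAMP_TRUNC(timestamp, DAY) = TIMESTAMP('{date}')
          AND s1_pm2_5 IS NOT NULL
          AND s2_pm2_5 IS NOT NULL
          {network_filter}
    """
```

Passing network=None drops the network condition entirely, which is the kind of behavior the Optional hint is meant to signal to callers.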

src/workflows/airqo_etl_utils/datautils.py (4)

5-5: Added ast module for secure string evaluation

The addition of the ast module enables safer parsing of string representations of Python literals.


1039-1050: Improved docstring with clearer parameter description

The docstring now provides a more precise description of the expected input format for the coordinates parameter and details about the return value structure.


1051-1059: Enhanced coordinate parsing with safer evaluation approach

The implementation has been significantly improved by using ast.literal_eval() instead of manual string manipulation. This approach:

  1. Provides better security by safely evaluating string literals without the risks of eval()
  2. Handles a wider variety of input formats
  3. Includes explicit error handling for both ValueError and SyntaxError

This is a more robust and maintainable solution for coordinate parsing.
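
A minimal sketch of this parsing approach is shown below. The function name parse_coordinates, the (latitude, longitude) return shape, and the choice to return None on failure are assumptions; the method in datautils.py may differ in all three.

```python
import ast
from typing import Optional, Tuple


def parse_coordinates(coordinates: str) -> Optional[Tuple[float, float]]:
    """Parse a string such as "(0.3476, 32.5825)" into a pair of floats.

    Returns None when the string is not a two-element literal of numbers.
    """
    try:
        # literal_eval only accepts Python literals, so arbitrary code is rejected.
        parsed = ast.literal_eval(coordinates)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(parsed, (list, tuple)) or len(parsed) != 2:
        return None
    try:
        return float(parsed[0]), float(parsed[1])
    except (TypeError, ValueError):
        return None
```

Because literal_eval accepts both tuple and list literals, inputs like "[0.3476, 32.5825]" parse the same way as "(0.3476, 32.5825)", while a malformed string yields None instead of raising.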


1104-1104: Updated method call to use the new implementation

The method call has been correctly updated to use the new implementation.

src/workflows/airqo_etl_utils/airqo_utils.py (3)

5-5: Added import for Optional type hint

The addition of Optional from the typing module enables more precise type annotations.


741-744: Enhanced method signature with explicit type hints

The method signature for extract_devices_with_uncalibrated_data has been improved with explicit type annotations:

  • start_date is now explicitly typed as str
  • table and network parameters are marked as Optional, making their potential absence clear
  • Default values remain unchanged, preserving backward compatibility

These type hints improve code readability and enable better static analysis and IDE support.


794-795: Simplified timestamp handling

The timestamp handling has been simplified to directly use row.timestamp instead of a more complex approach, making the code more straightforward and less error-prone.

src/workflows/airqo_etl_utils/airnow_utils.py (6)

10-10: Updated import with clear aliasing

The import has been improved by aliasing configuration as Config, which enhances readability and consistency.


21-38: Added comprehensive docstring for API method

A detailed docstring has been added to the query_bam_data method, which:

  1. Clearly explains the method's purpose
  2. Provides parameter descriptions with expected formats
  3. Documents the return value
  4. Includes a usage example

This significantly improves code documentation and makes the API more accessible to new developers.


40-44: Improved date formatting with variable

The date format string has been extracted to a variable date_format instead of being hardcoded in multiple places. This improves maintainability by centralizing the format definition.
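
The refactor described here amounts to something like the sketch below. The format string "%Y-%m-%dT%H:%M" and the function name format_query_window are assumptions; the PR only states that the format now lives in a variable named date_format.

```python
from datetime import datetime
from typing import Tuple


def format_query_window(start: datetime, end: datetime) -> Tuple[str, str]:
    """Format both ends of a query window with a single shared format string."""
    date_format = "%Y-%m-%dT%H:%M"  # assumed format; defined once, used twice
    return start.strftime(date_format), end.strftime(date_format)
```

If the upstream API ever changes its expected format, only the one assignment needs to change.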


53-53: Enhanced return statement with consistent type

The return statement now ensures that a DataFrame is always returned, even when no data is available. This makes the function more predictable and easier to use by eliminating the need to check for None returns.
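
The pattern can be sketched as below, assuming pandas is installed. The helper name records_to_frame and its input shape are hypothetical; the point is only that both the "no data" and "data" paths return the same type.

```python
from typing import Any, Dict, List, Optional

import pandas as pd


def records_to_frame(records: Optional[List[Dict[str, Any]]]) -> pd.DataFrame:
    """Return a DataFrame for the records, or an empty DataFrame when there are none."""
    if not records:
        # Covers both None and an empty list, so callers never receive None.
        return pd.DataFrame()
    return pd.DataFrame(records)
```

Callers can then branch on `frame.empty` instead of guarding against None before every DataFrame operation.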


85-85: Updated configuration reference

The API key reference has been updated to use the renamed Config.US_EMBASSY_API_KEY for consistency.


148-152: Refactored parameter mapping to use configuration-based approach

The code has been refactored to use a configuration-based approach for parameter mapping instead of a method-based approach. This:

  1. Makes the code more declarative and easier to maintain
  2. Centralizes parameter configurations in one place
  3. Enables more flexible configuration options

This aligns with modern software engineering practices by favoring configuration over hard-coded logic.


@Baalmart Baalmart merged commit 2116d62 into airqo-platform:staging Mar 7, 2025
46 checks passed
@Baalmart Baalmart mentioned this pull request Mar 7, 2025
1 task