Generate and Convert Jaffle Shop CSVs to Parquet Format for GCS #57
…rmat and instructions to upload to GCS bucket
Pull request overview
This pull request establishes the initial infrastructure for Jaffle Shop data integration, introducing a Python script to convert CSV files to Parquet format, comprehensive project configuration, and step-by-step documentation. The changes prepare the environment for efficient data processing and integration with Snowflake and GCP.
Key Changes:
- Added CSV-to-Parquet conversion script with automated processing for seven Jaffle Shop data tables
- Configured project dependencies and development tooling via pyproject.toml with Python 3.12+ support
- Documented complete workflow from CSV generation through GCP upload
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| integration/pyproject.toml | Establishes project metadata, dependencies (pandas, pyarrow, Snowflake connectors), and development tool configurations for the integration package |
| integration/jaffle-shop-data/convert_jaffle_csv_to_parquet.py | Implements automated CSV-to-Parquet conversion for seven Jaffle Shop datasets with basic error handling |
| integration/jaffle-shop-data/GENERATE_JAFFLE_SHOP_PARQUET.md | Provides user documentation covering prerequisites, CSV generation, conversion steps, and GCP upload instructions |
    )

    JAFFLE_PARQUET_DATA_PATH = JAFFLE_CSV_DATA_PATH / "parquet"
    Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True)
The call to Path.mkdir() is unidiomatic. Calling the method on the Path class and passing the instance as the first argument does work, but the method should be called on the path instance itself. Change Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True) to JAFFLE_PARQUET_DATA_PATH.mkdir(exist_ok=True), or to JAFFLE_PARQUET_DATA_PATH.mkdir(parents=True, exist_ok=True) so that missing parent directories are also created.
Suggested change:

    - Path.mkdir(JAFFLE_PARQUET_DATA_PATH, exist_ok=True)
    + JAFFLE_PARQUET_DATA_PATH.mkdir(parents=True, exist_ok=True)
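The instance-method form from the suggestion can be sketched as follows. This is only an illustration: the helper name `ensure_parquet_dir` and the temporary directory layout are assumptions, not part of the PR. It shows that `parents=True` creates a missing CSV directory along the way and that `exist_ok=True` makes a repeated call harmless.

```python
import tempfile
from pathlib import Path


def ensure_parquet_dir(csv_dir: Path) -> Path:
    """Create (if needed) and return the 'parquet' subdirectory of csv_dir.

    Hypothetical helper for illustration; the PR uses module-level constants.
    """
    parquet_dir = csv_dir / "parquet"
    # parents=True creates csv_dir too if it is missing;
    # exist_ok=True avoids FileExistsError when the directory already exists.
    parquet_dir.mkdir(parents=True, exist_ok=True)
    return parquet_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "jaffle-data"  # does not exist yet
        out = ensure_parquet_dir(base)
        ensure_parquet_dir(base)  # second call does not raise
        print(out.is_dir())
```

Without `parents=True`, the same call would raise `FileNotFoundError` whenever the CSV directory itself had not been created yet.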
This pull request introduces the initial setup for the Jaffle Shop data integration, focusing on enabling CSV-to-Parquet conversion and preparing the environment for efficient data processing and integration with Snowflake and GCP. Key changes include a new conversion script, a comprehensive project configuration, and usage documentation.
Data conversion and workflow setup
- Added the convert_jaffle_csv_to_parquet.py script to automate conversion of Jaffle Shop CSV files into Parquet format, improving data storage and query efficiency for downstream use in Snowflake.
- Added the GENERATE_JAFFLE_SHOP_PARQUET.md documentation to guide users through generating CSV data, converting it to Parquet, and uploading Parquet files to GCP, including prerequisites and step-by-step instructions.
Project configuration and dependencies
- Added pyproject.toml for project metadata, dependency management (including pandas, pyarrow, fastparquet, Snowflake connectors), development tools, and configuration for code quality and testing.
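For orientation, the conversion step might look roughly like the sketch below. It is not the script from this PR: the function names `parquet_path_for` and `convert_all`, the `*.csv` glob, and the example file name `orders.csv` are assumptions. It relies only on `pandas.read_csv` and `DataFrame.to_parquet`, which need pyarrow or fastparquet installed.

```python
from pathlib import Path


def parquet_path_for(csv_path: Path, out_dir: Path) -> Path:
    """Map a CSV path (e.g. a hypothetical orders.csv) to <out_dir>/orders.parquet."""
    return out_dir / csv_path.with_suffix(".parquet").name


def convert_all(csv_dir: Path) -> list[Path]:
    """Convert every *.csv file in csv_dir to Parquet under csv_dir/parquet."""
    # Imported here so the path helper above stays usable without pandas installed.
    import pandas as pd

    out_dir = csv_dir / "parquet"
    out_dir.mkdir(parents=True, exist_ok=True)

    written: list[Path] = []
    for csv_path in sorted(csv_dir.glob("*.csv")):
        target = parquet_path_for(csv_path, out_dir)
        # index=False keeps the pandas row index out of the Parquet file.
        pd.read_csv(csv_path).to_parquet(target, index=False)
        written.append(target)
    return written
```

Globbing for CSV files rather than hard-coding table names is a simplification made here; the actual script is described as handling seven specific datasets.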