
Add Ingestion Module for GCS/S3 Resources to Snowflake #54


Description


Load the jaffle-shop Parquet files from cloud storage (GCS or S3) into Snowflake.

API:

from data import load_from_gcs, load_from_s3

results = load_from_gcs(session, schema_name="RAW")
results = load_from_s3(session, bucket="s3://bucket/path/", schema_name="RAW")
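
For orientation, here is a minimal sketch of how the S3 entry point could be wired up, assuming a Snowpark Session. The _run_sql helper, the placeholder names, the table list, and the return shape are illustrative assumptions, not part of this issue:

# Hypothetical sketch of data/ingestion.py; helper and table names are assumptions.
from pathlib import Path

from snowflake.snowpark import Session

SQL_DIR = Path(__file__).parent / "sql" / "ingestion"

def _run_sql(session: Session, template: str, **params) -> list:
    """Render a template from data/sql/ingestion/ and execute it."""
    query = (SQL_DIR / template).read_text().format(**params)
    return session.sql(query).collect()

def load_from_s3(session: Session, bucket: str, schema_name: str = "RAW") -> dict:
    """External stage over the bucket, then one table + COPY INTO per file."""
    _run_sql(session, "create_parquet_file_format.sql", schema_name=schema_name)
    _run_sql(session, "create_stage_s3_public.sql", schema_name=schema_name, url=bucket)
    results = {}
    for table in ("customers", "orders", "items"):  # illustrative jaffle-shop subset
        _run_sql(session, "create_table_from_parquet.sql",
                 schema_name=schema_name, table_name=table)
        results[table] = _run_sql(session, "copy_into_table.sql",
                                  schema_name=schema_name, table_name=table)
    return results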

Files:

  • data/ingestion.py
  • data/sql/ingestion/*.sql

Behavior:

  1. GCS: Download → internal stage → COPY INTO (see the sketch after this list)
  2. S3: External stage → COPY INTO
  3. Schema inferred from Parquet (INFER_SCHEMA)
  4. Idempotent (safe to re-run)
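
The GCS path (step 1) is the only one that touches local disk. Here is a minimal sketch of staging a single file, assuming the objects are public and fetched over HTTPS (the issue does not specify the download mechanism) and uploaded with Snowpark's session.file.put:

# Hypothetical sketch of the GCS path: download, then PUT onto an internal stage.
import tempfile
import urllib.request
from pathlib import Path

from snowflake.snowpark import Session

def stage_gcs_file(session: Session, url: str, stage: str = "@raw_stage") -> None:
    """Download one Parquet file and upload it to an internal stage."""
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, local)  # public GCS object over HTTPS
        # Parquet is already compressed, so skip gzip on upload.
        session.file.put(str(local), stage, auto_compress=False, overwrite=True)

Idempotency (step 4) then mostly falls out of Snowflake itself: CREATE ... IF NOT EXISTS is a no-op on re-run, and COPY INTO skips files it has already loaded via the stage's load history.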

SQL Templates:

File                             Purpose
create_parquet_file_format.sql   Parquet format definition
create_internal_stage.sql        Internal stage for GCS downloads
create_stage_s3_public.sql       External S3 stage
create_table_from_parquet.sql    Table creation with INFER_SCHEMA
copy_into_table.sql              COPY INTO with MATCH_BY_COLUMN_NAME
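
For concreteness, the two most involved templates might look like the following, shown here as Python string constants; the {schema_name}-style placeholders are assumed names, while INFER_SCHEMA and MATCH_BY_COLUMN_NAME are standard Snowflake features:

# Hypothetical contents of create_table_from_parquet.sql and copy_into_table.sql;
# placeholder names are assumptions.
CREATE_TABLE_FROM_PARQUET = """
CREATE TABLE IF NOT EXISTS {schema_name}.{table_name}
USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(INFER_SCHEMA(
        LOCATION => '@{stage_name}/{table_name}/',
        FILE_FORMAT => '{file_format}'
    ))
);
"""

COPY_INTO_TABLE = """
COPY INTO {schema_name}.{table_name}
FROM '@{stage_name}/{table_name}/'
FILE_FORMAT = (FORMAT_NAME = '{file_format}')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
"""

MATCH_BY_COLUMN_NAME maps Parquet columns to table columns by name rather than position, which pairs naturally with tables created via INFER_SCHEMA.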
