Postgres ETL Project

Overview

This project implements an ETL (Extract, Transform, Load) pipeline using Python and PostgreSQL for a music streaming app, Sparkify. It processes song and user activity data, transforming it into a set of dimensional tables for easier querying and analysis.

Project Structure

etl.py: Reads the JSON user activity logs and the JSON song metadata, then loads the data into PostgreSQL tables.
create_tables.py: Drops the existing database and tables, then recreates them.
sql_queries.py: Contains all the SQL queries used in the ETL process.
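
As an illustration of the drop-and-recreate step, here is a minimal sketch. It is not taken from create_tables.py: the function name reset_tables, the two query lists, and their column definitions are hypothetical, and the only thing assumed about cur is that it is a DB-API cursor with an execute method (as a psycopg2 cursor is).

```python
# Hypothetical sketch of a drop-and-recreate step. The statements and names
# below are illustrative assumptions, not the contents of sql_queries.py.
DROP_QUERIES = [
    "DROP TABLE IF EXISTS songplays",
    "DROP TABLE IF EXISTS users",
]
CREATE_QUERIES = [
    "CREATE TABLE IF NOT EXISTS users (user_id INT PRIMARY KEY)",
    "CREATE TABLE IF NOT EXISTS songplays (songplay_id INT PRIMARY KEY)",
]


def reset_tables(cur, drop_queries, create_queries):
    """Run every DROP statement, then every CREATE statement, on a DB-API cursor."""
    for query in list(drop_queries) + list(create_queries):
        cur.execute(query)
```

With psycopg2, cur would typically come from psycopg2.connect(...).cursor(), and the connection's commit() would be called afterwards. Dropping before creating means the script can be rerun to start from empty tables.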

Database Schema

The database uses a star schema optimized for song play analysis queries. It consists of one fact table and four dimension tables:

Fact Table

songplays - records from the log data associated with song plays

Dimension Tables

users - users in the app
songs - songs in the music database
artists - artists in the music database
time - timestamps of records in songplays broken down into specific units
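
To make the star schema concrete, the tables above could be declared roughly as follows. This is a sketch only: the table names come from this README, but every column name and type is an assumption chosen for illustration and may differ from what sql_queries.py actually defines.

```python
# Sketch of the star schema as PostgreSQL DDL strings.
# Table names follow the README; all columns and types are assumptions.
STAR_SCHEMA = {
    "songplays": """
        CREATE TABLE IF NOT EXISTS songplays (
            songplay_id SERIAL PRIMARY KEY,
            start_time  TIMESTAMP NOT NULL,
            user_id     INT NOT NULL,
            song_id     VARCHAR,
            artist_id   VARCHAR
        )""",
    "users": """
        CREATE TABLE IF NOT EXISTS users (
            user_id    INT PRIMARY KEY,
            first_name VARCHAR,
            last_name  VARCHAR
        )""",
    "songs": """
        CREATE TABLE IF NOT EXISTS songs (
            song_id   VARCHAR PRIMARY KEY,
            title     VARCHAR NOT NULL,
            artist_id VARCHAR
        )""",
    "artists": """
        CREATE TABLE IF NOT EXISTS artists (
            artist_id VARCHAR PRIMARY KEY,
            name      VARCHAR NOT NULL
        )""",
    "time": """
        CREATE TABLE IF NOT EXISTS time (
            start_time TIMESTAMP PRIMARY KEY,
            hour INT, day INT, week INT, month INT, year INT, weekday INT
        )""",
}

# The fact table is the only one that references the others by key.
FACT_TABLE = "songplays"
DIMENSION_TABLES = [name for name in STAR_SCHEMA if name != FACT_TABLE]
```

In this layout the fact table holds only keys and the event timestamp, and descriptive attributes live in the dimension tables, so an analysis query joins songplays to whichever dimensions it needs.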

Requirements

Python 3.x
PostgreSQL
psycopg2
pandas

Dataset

The project uses two datasets:

Song Dataset: JSON files containing metadata about songs and artists.
Log Dataset: JSON files containing user activity logs from the music streaming app.
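
Both datasets are collections of JSON files, so the pipeline first has to locate them. The sketch below shows one way to do that with the standard library. The function names are hypothetical, and two things are assumed rather than known from this README: that the files sit in nested folders under a data directory, and that each file stores one JSON object per line.

```python
import json
from pathlib import Path


def find_json_files(root):
    """Return every *.json file under root, sorted for a stable processing order."""
    return sorted(Path(root).rglob("*.json"))


def read_json_lines(path):
    """Parse a file that holds one JSON object per line (an assumed layout)."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

If a file instead holds a single JSON document, json.load on the open file would be the simpler choice.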

ETL Pipeline

Process song data to populate the songs and artists tables.
Process log data to populate the time and users tables.
Use data from both song and log datasets to populate the songplays fact table.
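
The last two steps can be sketched in plain Python. This is an illustration under stated assumptions, not the code in etl.py: it assumes log timestamps are in milliseconds since the Unix epoch, that a song play is matched to a song by title, artist name, and duration, and that the event keys (ts, user_id, song, artist, length) look as shown. All function and field names here are hypothetical.

```python
from datetime import datetime, timezone


def time_row(ts_ms):
    """Break a millisecond epoch timestamp into the units stored in the time table."""
    start = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return {
        "start_time": start,
        "hour": start.hour,
        "day": start.day,
        "week": start.isocalendar()[1],  # ISO week number
        "month": start.month,
        "year": start.year,
        "weekday": start.weekday(),  # Monday is 0, Sunday is 6
    }


def songplay_row(event, song_index):
    """Build one fact row, looking up song and artist IDs from the song data.

    song_index maps (title, artist name, duration) to (song_id, artist_id).
    Events with no matching song keep None for both IDs.
    """
    key = (event["song"], event["artist"], event["length"])
    song_id, artist_id = song_index.get(key, (None, None))
    return {
        "start_time": time_row(event["ts"])["start_time"],
        "user_id": event["user_id"],
        "song_id": song_id,
        "artist_id": artist_id,
    }
```

Keeping unmatched events with empty IDs, rather than dropping them, preserves the full play history in the fact table even when the song dataset does not cover every track in the logs.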

