This project showcases my ability to design, implement, and rigorously test sophisticated analytical data models using dbt (Data Build Tool) Core. I engineered a solution to transform the classicmodels relational dataset, hosted on PostgreSQL in AWS RDS, into two distinct, analytics-ready structures: a best-practice Star Schema and a denormalized One Big Table (OBT). My goal was to master dbt's powerful transformation capabilities and apply data modeling principles in a cloud environment.
- dbt Core Mastery & End-to-End Workflow:
- Successfully configured and managed a dbt Core project from initial setup and source connection (AWS RDS PostgreSQL) through to model development, materialization, and comprehensive testing.
- Demonstrated a full dbt lifecycle including
dbt runfor model execution,dbt testfor data validation, anddbt depsfor package management.
- Advanced Data Modeling Implementation:
- Star Schema Design: Architected and built a robust Star Schema, including:
fact_orderstable at the order line item granularity.- Dimension tables (
dim_customers,dim_employees,dim_offices,dim_products,dim_dates). - Systematically generated surrogate keys using the
dbt_utils.generate_surrogate_keymacro, ensuring stable dimensional relationships. - Created a comprehensive
dim_datestable using thedbt-datepackage for flexible time-based analysis.
- One Big Table (OBT) Construction: Developed a denormalized
orders_obtby strategically pre-joining multiple source tables, optimized for specific analytical use cases.
- Star Schema Design: Architected and built a robust Star Schema, including:
- Sophisticated Transformation Logic with SQL & Jinja:
- Authored modular and maintainable dbt models using advanced SQL combined with Jinja templating for dynamic queries, reusable logic (e.g., macros,
ref,sourcefunctions), and efficient development.
- Authored modular and maintainable dbt models using advanced SQL combined with Jinja templating for dynamic queries, reusable logic (e.g., macros,
- Rigorous Data Quality Assurance:
- Implemented a comprehensive testing strategy by defining data quality tests in
schema.ymlfiles. - Leveraged built-in dbt tests (
unique,not_null,accepted_values,relationships) and dbt-utils macros (e.g.,dbt_utils.unique_combination_of_columns) to validate primary keys, foreign key relationships, column constraints, and business rules.
- Implemented a comprehensive testing strategy by defining data quality tests in
- In-Database Transformation Orchestration:
- Effectively utilized dbt to orchestrate all data transformations directly within the AWS RDS PostgreSQL database, leveraging its processing power and ensuring data remained within a secure cloud environment.
- Cloud Data Warehousing on AWS:
- Gained practical experience working with PostgreSQL on AWS RDS as a target data warehouse for dbt transformations.
- Analysis of Modeling Trade-offs:
- Developed a practical understanding of the structural differences, benefits, and drawbacks between Star Schema and OBT modeling approaches, enabling informed decisions for future projects.
I designed this dimensional model for optimal business intelligence and reporting performance. It features:
- Central Fact Table:
fact_orders(capturing order line item details). - Conforming Dimensions:
dim_customers,dim_employees,dim_offices,dim_products, anddim_dates. - Key Techniques: Employed dbt's
reffunction for inter-model dependencies,dbt_utils.generate_surrogate_keyfor robust primary keys in dimensions, anddbt-date.get_date_dimensionfor a comprehensive date dimension.
To cater to specific analytical queries that benefit from pre-joined data, I constructed the orders_obt.
- Denormalized Structure: This table combines data from
orders,orderdetails,products,productlines,customers, andemployees. - Purpose: Designed to simplify query writing for certain use cases and potentially improve performance by reducing complex joins at query time.
Ensuring data reliability was paramount. I leveraged dbt's testing framework extensively:
- Schema Tests (
schema.yml): Defined tests directly alongside model schema definitions.- Uniqueness: Validated primary keys in all dimension and fact tables (e.g.,
customer_skindim_customers). - Non-Null Constraints: Ensured critical attributes (e.g., foreign keys in fact tables, key business identifiers) were always populated.
- Referential Integrity: Used
relationshipstests to confirm that foreign keys in fact tables correctly referenced primary keys in dimension tables. - Custom Logic (via
dbt_utils): Employeddbt_utils.unique_combination_of_columnsfor composite key validation in the OBT.
- Uniqueness: Validated primary keys in all dimension and fact tables (e.g.,
- Execution: All tests were executed via
dbt test, providing immediate feedback on data integrity after model runs.
- Data Transformation & Modeling: dbt Core
- Programming Languages & Templating: SQL, Jinja, YAML (for dbt configurations)
- Database: PostgreSQL (hosted on AWS RDS)
- Cloud Platform: Amazon Web Services (AWS)
- dbt Packages:
dbt-utils,dbt-date - Version Control: Git & GitHub
- Development Environment: Standard Command Line (for dbt execution)