
Commit b05396c

Merge branch 'current' into test-vale
2 parents 08d82db + 3e2251e commit b05396c

114 files changed (+819, -289 lines changed)


website/blog/2021-02-05-dbt-project-checklist.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -173,8 +173,8 @@ This post is the checklist I created to guide our internal work, and I’m shari
 
 Useful Links
 
-* [FAQs for documentation](/docs/collaborate/documentation#faqs)
-* [Doc blocks](/docs/collaborate/documentation#using-docs-blocks)
+* [FAQs for documentation](/docs/build/documentation#faqs)
+* [Doc blocks](/docs/build/documentation#using-docs-blocks)
 
 ## ✅ dbt Cloud specifics
```

website/blog/2021-12-05-how-to-build-a-mature-dbt-project-from-scratch.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -87,7 +87,7 @@ The most important thing we’re introducing when your project is an infant is t
 
 * Introduce modularity with [{{ ref() }}](/reference/dbt-jinja-functions/ref) and [{{ source() }}](/reference/dbt-jinja-functions/source)
 
-* [Document](/docs/collaborate/documentation) and [test](/docs/build/data-tests) your first models
+* [Document](/docs/build/documentation) and [test](/docs/build/data-tests) your first models
 
 ![image alt text](/img/blog/building-a-mature-dbt-project-from-scratch/image_3.png)
```

website/blog/2022-09-28-analyst-to-ae.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -133,7 +133,7 @@ It’s much easier to keep to a naming guide when the writer has a deep understa
 
 If we want to know how certain logic was built technically, then we can reference the SQL code in dbt docs. If we want to know *why* a certain logic was built into that specific model, then that’s where we’d turn to the documentation.
 
-- Example of not-so-helpful documentation ([dbt docs can](https://docs.getdbt.com/docs/collaborate/documentation) build this dynamically):
+- Example of not-so-helpful documentation ([dbt docs can](https://docs.getdbt.com/docs/build/documentation) build this dynamically):
 - `Case when Zone = 1 and Level like 'A%' then 'True' else 'False' end as GroupB`
 - Example of better, more descriptive documentation (add to your dbt markdown file or column descriptions):
 - Group B is defined as Users in Zone 1 with a Level beginning with the letter 'A'. These users are accessing our new add-on product that began in Beta in August 2022. It's recommended to filter them out of the main Active Users metric.
```

website/blog/2023-02-14-passing-the-dbt-certification-exam.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -25,7 +25,7 @@ In this article, two Montreal Analytics consultants, Jade and Callie, discuss th
 
 **J:** To prepare for the exam, I built up a practice dbt project. All consultants do this as part of Montreal Analytics onboarding process, and this project allowed me to practice implementing sources and tests, refactoring SQL models, and debugging plenty of error messages. Additionally, I reviewed the [Certification Study Guide](https://www.getdbt.com/assets/uploads/dbt_certificate_study_guide.pdf) and attended group learning sessions.
 
-**C:** To prepare for the exam I reviewed the official dbt Certification Study Guide and the [official dbt docs](https://docs.getdbt.com/), and attended group study and learning sessions that were hosted by Montreal Analytics for all employees interested in taking the exam. As a group, we prioritized subjects that we felt less familiar with; for the first cohort of test takers this was mainly newer topics that haven’t yet become integral to a typical dbt project, such as [doc blocks](https://docs.getdbt.com/docs/collaborate/documentation#using-docs-blocks) and [configurations versus properties](https://docs.getdbt.com/reference/configs-and-properties). These sessions mainly covered the highlights and common “gotchas” that are experienced using these techniques. The sessions were moderated by a team member who had already successfully completed the dbt Certification, but operated in a very collaborative environment, so everyone could provide additional information, ask questions to the group, and provide feedback to other members of our certification taking group.
+**C:** To prepare for the exam I reviewed the official dbt Certification Study Guide and the [official dbt docs](https://docs.getdbt.com/), and attended group study and learning sessions that were hosted by Montreal Analytics for all employees interested in taking the exam. As a group, we prioritized subjects that we felt less familiar with; for the first cohort of test takers this was mainly newer topics that haven’t yet become integral to a typical dbt project, such as [doc blocks](https://docs.getdbt.com/docs/build/documentation#using-docs-blocks) and [configurations versus properties](https://docs.getdbt.com/reference/configs-and-properties). These sessions mainly covered the highlights and common “gotchas” that are experienced using these techniques. The sessions were moderated by a team member who had already successfully completed the dbt Certification, but operated in a very collaborative environment, so everyone could provide additional information, ask questions to the group, and provide feedback to other members of our certification taking group.
 
 I felt comfortable with the breadth of my dbt knowledge and had familiarity with most topics. However in my day-to-day implementation, I am often reliant on documentation or copying and pasting specific configurations in order to get the correct settings. Therefore, my focus was on memorizing important criteria for *how to use* certain features, particularly on the order/nesting of how the key YAML files are configured (dbt_project.yml, table.yml, source.yml).
 
@@ -75,4 +75,4 @@ Now, the first thing you must do when you’ve passed a test is to get yourself
 Standards and best practices are very important, but a test is a measure at a single point in time of a rapidly evolving industry. It’s also a measure of my test-taking abilities, my stress levels, and other things unrelated to my skill in data modeling; I wouldn’t be a good analyst if I didn’t recognize the faults of a measurement. I’m glad to have this check mark completed, but I will continue to stay up to date with changes, learn new data skills and techniques, and find ways to continue being a holistically helpful teammate to my colleagues and clients.
 
 
-You can learn more about the dbt Certification [here](https://www.getdbt.com/blog/dbt-certification-program/).
+You can learn more about the dbt Certification [here](https://www.getdbt.com/blog/dbt-certification-program/).
```

website/blog/2023-05-04-generating-dynamic-docs.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -215,7 +215,7 @@ Which in turn can be copy-pasted into a new `.yml` file. In our example, we writ
 
 ## Create docs blocks for the new columns
 
-[Docs blocks](https://docs.getdbt.com/docs/collaborate/documentation#using-docs-blocks) can be utilized to write more DRY and robust documentation. To use docs blocks, update your folder structure to contain a `.md` file. Your file structure should now look like this:
+[Docs blocks](https://docs.getdbt.com/docs/build/documentation#using-docs-blocks) can be utilized to write more DRY and robust documentation. To use docs blocks, update your folder structure to contain a `.md` file. Your file structure should now look like this:
 
 ```
 models/core/activity_based_interest
````
website/blog/2024-06-12-putting-your-dag-on-the-internet.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
---
title: Putting Your DAG on the internet
description: "Use dbt and Snowflake's external access integrations to allow Snowflake Python models to access the internet."
slug: dag-on-the-internet

authors: [ernesto_ongaro, sebastian_stan, filip_byrén]

tags: [analytics craft, APIs, data ecosystem]
hide_table_of_contents: false

date: 2024-06-14
is_featured: true
---

**New in dbt: allow Snowflake Python models to access the internet**

With dbt 1.8, dbt released support for Snowflake’s [external access integrations](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview), further enabling the use of dbt + AI to enrich your data. This allows querying of external APIs within dbt Python models, a functionality that was required by dbt Cloud customer [EQT AB](https://eqtgroup.com/). Learn why they needed it and how they helped build the feature and get it shipped!
<!--truncate-->

## Why did EQT require this functionality?

by Filip Bryén, VP and Software Architect (EQT), and Sebastian Stan, Data Engineer (EQT)

_EQT AB is a global investment organization and, as a long-term customer of dbt Cloud, presented at dbt’s Coalesce [2020](https://www.getdbt.com/coalesce-2020/seven-use-cases-for-dbt) and [2023](https://www.youtube.com/watch?v=-9hIUziITtU)._

_Motherbrain Labs is EQT’s bespoke AI team, primarily focused on accelerating our portfolio companies' roadmaps through hands-on data and AI work. Due to the high demand for our time, we are constantly exploring mechanisms for simplifying our processes and increasing our own throughput. Integration of workflow components directly in dbt has been a major efficiency gain and has helped us rapidly deliver across a global portfolio._

Motherbrain Labs is focused on creating measurable AI impact in our portfolio. We work hand-in-hand with leadership from our deal teams and our portfolio companies, but our starting approach is always the same: identify which data matters.

While we have access to reams of proprietary information, we believe the greatest effect happens when we combine that information with external datasets like geolocation, demographics, or competitor traction.

These valuable datasets often come from third-party vendors who operate on a pay-per-use model: a single charge for every piece of information we want. To avoid overspending, we focus on enriching only the specific subset of data that is relevant to an individual company's strategic question.

In response to this recurring need, we have partnered with Snowflake and dbt to introduce new functionality that facilitates communication with external endpoints and manages secrets within dbt. This new integration enables us to incorporate enrichment processes directly into our DAGs, similar to how current Python models are utilized within dbt environments. We’ve found that this augmented approach allows us to reduce complexity and enable external communications before materialization.
## An example with Carbon Intensity: How does it work?

In this section, we will demonstrate how to integrate an external API to retrieve the current Carbon Intensity of the UK power grid. The goal is to illustrate how the feature works, and perhaps explore how scheduling data transformations at different times can potentially reduce their carbon footprint, making them a greener choice. We will be leveraging the API from the [UK National Grid ESO](https://www.nationalgrideso.com/) to achieve this.

To start, we need to set up a network rule (Snowflake instructions [here](https://docs.snowflake.com/en/user-guide/network-rules)) to allow access to the external API. Specifically, we'll create an egress rule that permits Snowflake to communicate with api.carbonintensity.org.uk.

Next, to access network locations outside of Snowflake, you need to define an external access integration and reference it within a dbt Python model. You can find an overview of Snowflake's external network access [here](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview).

This API is open, but if an API does require a key, handle it the same way you would any other secret. More information on API authentication in Snowflake is available [here](https://docs.snowflake.com/en/user-guide/api-authentication).

For simplicity’s sake, we will show how to create the network rule and the integration using [pre-hooks](/reference/resource-configs/pre-hook-post-hook) in a model configuration yml file:
```yaml
models:
  - name: external_access_sample
    config:
      pre_hook:
        - "create or replace network rule test_network_rule type = host_port mode = egress value_list= ('api.carbonintensity.org.uk:443');"
        - "create or replace external access integration test_external_access_integration allowed_network_rules = (test_network_rule) enabled = true;"
```
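The Carbon Intensity API doesn’t need a key, but if yours does, one possible extension of the same pre-hook pattern is to store the key as a Snowflake secret and let the integration authenticate with it. The secret name and value below are illustrative assumptions, not part of the original example:

```yaml
models:
  - name: external_access_sample
    config:
      pre_hook:
        # Illustrative only: in practice, create the secret outside of dbt instead of
        # hard-coding its value in your project
        - "create or replace secret my_api_key_secret type = generic_string secret_string = '<your-api-key>';"
        - "create or replace network rule test_network_rule type = host_port mode = egress value_list= ('api.carbonintensity.org.uk:443');"
        - "create or replace external access integration test_external_access_integration allowed_network_rules = (test_network_rule) allowed_authentication_secrets = (my_api_key_secret) enabled = true;"
```

Inside the Python model, the secret could then be read with Snowflake's `_snowflake.get_generic_secret_string()` helper, provided the secret is bound to the model; check the Snowflake and dbt-snowflake documentation for how secrets are exposed to Python models in your version.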
Then we can simply use the new `external_access_integrations` configuration parameter to reference our external access integration within a Python model (called `external_access_sample.py`):

```python
import snowflake.snowpark as snowpark

def model(dbt, session: snowpark.Session):
    dbt.config(
        materialized="table",
        external_access_integrations=["test_external_access_integration"],
        packages=["httpx==0.26.0"]
    )
    import httpx
    # Query the live Carbon Intensity endpoint and store the raw JSON response
    return session.create_dataframe(
        [{"carbon_intensity": httpx.get(url="https://api.carbonintensity.org.uk/intensity").text}]
    )
```
The result is a model with some JSON I can parse, for example, in a SQL model to extract some information:

```sql
{{
    config(
        materialized='incremental',
        unique_key='dbt_invocation_id'
    )
}}

with raw as (
    select parse_json(carbon_intensity) as carbon_intensity_json
    from {{ ref('external_access_sample') }}
)

select
    '{{ invocation_id }}' as dbt_invocation_id,
    value:from::TIMESTAMP_NTZ as start_time,
    value:to::TIMESTAMP_NTZ as end_time,
    value:intensity.actual::NUMBER as actual_intensity,
    value:intensity.forecast::NUMBER as forecast_intensity,
    value:intensity.index::STRING as intensity_index
from raw,
lateral flatten(input => raw.carbon_intensity_json:data)
```
The result is a model that keeps track of dbt invocations and the current UK carbon intensity levels.

<Lightbox src="/img/blog/2024-06-12-putting-your-dag-on-the-internet/image1.png" title="Preview in dbt Cloud IDE of output" />
## dbt best practices

This is a very new area for Snowflake and dbt. Something special about SQL and dbt is that they’re very resistant to external entropy, but the second we rely on API calls, Python packages, and other external dependencies, we open ourselves up to a lot more of it. APIs will change or break, and your models could fail.

Traditionally dbt is the T in ELT (dbt overview [here](https://docs.getdbt.com/terms/elt)), and this functionality unlocks brand new EL capabilities for which best practices do not yet exist. What’s clear is that EL workloads should be separated from T workloads, perhaps in a different modeling layer. Note also that unless you use incremental models, your historical data can easily be deleted. dbt has seen a lot of use cases for this, including the AI example outlined in this external [engineering blog post](https://klimmy.hashnode.dev/enhancing-your-dbt-project-with-large-language-models).
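One hedged way to keep EL workloads separate from T workloads is to tag the models that call external APIs and schedule them on their own. The project name and the `external_api` tag below are illustrative assumptions, not part of the original example:

```yaml
# dbt_project.yml (sketch): tag every model in an "external" folder so that
# API-calling models can be scheduled separately from pure transformations
models:
  my_project:
    external:
      +tags: ["external_api"]
```

You could then run these models in a dedicated job with `dbt run --select tag:external_api` and keep them out of the main transformation run with `dbt run --exclude tag:external_api`.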
**A few words about the power of Commercial Open Source Software**

In order to get this functionality shipped quickly, EQT opened a pull request, Snowflake helped with some problems we had with CI, and a member of dbt Labs helped write the tests and merge the code in!
dbt now features this functionality in dbt 1.8+ or with the “Keep on latest version” option of dbt Cloud (overview [here](/docs/dbt-versions/upgrade-dbt-version-in-cloud#keep-on-latest-version)).
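If you want to make sure everyone working in the project runs a version that supports this feature, one option (a sketch, not part of the original post) is to pin a minimum version in `dbt_project.yml`:

```yaml
# dbt_project.yml (sketch): require a dbt version that supports Snowflake
# external access integrations in Python models
require-dbt-version: ">=1.8.0"
```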
dbt Labs staff and community members would love to chat more about it in the [#db-snowflake](https://getdbt.slack.com/archives/CJN7XRF1B) Slack channel.

website/blog/authors.yml

Lines changed: 27 additions & 0 deletions
```diff
@@ -614,3 +614,30 @@ anders_swanson:
   links:
     - icon: fa-linkedin
       url: https://www.linkedin.com/in/andersswanson
+
+ernesto_ongaro:
+  image_url: /img/blog/authors/ernesto-ongaro.png
+  job_title: Senior Solutions Architect
+  name: Ernesto Ongaro
+  organization: dbt Labs
+  links:
+    - icon: fa-linkedin
+      url: https://www.linkedin.com/in/eongaro
+
+sebastian_stan:
+  image_url: /img/blog/authors/sebastian-eqt.png
+  job_title: Data Engineer
+  name: Sebastian Stan
+  organization: EQT Group
+  links:
+    - icon: fa-linkedin
+      url: https://www.linkedin.com/in/sebastian-lindblom/
+
+filip_byrén:
+  image_url: /img/blog/authors/filip-eqt.png
+  job_title: VP and Software Architect
+  name: Filip Byrén
+  organization: EQT Group
+  links:
+    - icon: fa-linkedin
+      url: https://www.linkedin.com/in/filip-byr%C3%A9n/
```

website/docs/best-practices/how-we-structure/5-semantic-layer-marts.md

Lines changed: 35 additions & 1 deletion
```diff
@@ -3,7 +3,7 @@ title: "Marts for the Semantic Layer"
 id: "5-semantic-layer-marts"
 ---
 
-The Semantic Layer alters some fundamental principles of how you organize your project. Using dbt without the Semantic Layer necessitates creating the most useful combinations of your building block components into wide, denormalized marts. On the other hand, the Semantic Layer leverages MetricFlow to denormalize every possible combination of components we've encoded dynamically. As such we're better served to bring more normalized models through from the logical layer into the Semantic Layer to maximize flexibility. This section will assume familiarity with the best practices laid out in the [How we build our metrics](/best-practices/how-we-build-our-metrics/semantic-layer-1-intro) guide, so check that out first for a more hands-on introduction to the Semantic Layer.
+The [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl) alters some fundamental principles of how you organize your project. Using dbt without the Semantic Layer necessitates creating the most useful combinations of your building block components into wide, denormalized marts. On the other hand, the Semantic Layer leverages MetricFlow to denormalize every possible combination of components we've encoded dynamically. As such we're better served to bring more normalized models through from the logical layer into the Semantic Layer to maximize flexibility. This section will assume familiarity with the best practices laid out in the [How we build our metrics](/best-practices/how-we-build-our-metrics/semantic-layer-1-intro) guide, so check that out first for a more hands-on introduction to the Semantic Layer.
 
 ## Semantic Layer: Files and folders
```
@@ -36,6 +36,40 @@ models
## Semantic Layer: Where and why?

- 📂 **Directory structure**: Add your semantic models to `models/semantic_models` with directories corresponding to the models/marts files. This type of organization makes it easier to search and find what you can join. It also supports better maintenance and reduces repeated code.

<File name='models/marts/sem_orders.yml'>

```yaml
semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: order_date
    description: |
      Order fact table. This table’s grain is one row per order.
    model: ref('fct_orders')
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
```

</File>
## Naming convention

- 🏷️ **Semantic model names**: Use the `sem_` prefix for semantic model names, such as `sem_cloud_user_account_activity`. This follows the same pattern as other naming conventions like `fct_` for fact tables and `dim_` for dimension tables (see the sketch at the end of this section).
- 🧩 **Entity names**: Don't use prefixes for entity names within the semantic model. This keeps the names clear and focused on their specific purpose without unnecessary prefixes.

This guidance helps you make sure your dbt project is organized, maintainable, and scalable, allowing you to take full advantage of the capabilities offered by the dbt Semantic Layer.
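A minimal sketch applying these conventions (the `sem_orders` name is assumed for illustration and isn't part of the example above):

```yaml
semantic_models:
  - name: sem_orders            # sem_ prefix on the semantic model name
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: order_date
    entities:
      - name: order_id          # entity names stay unprefixed
        type: primary
      - name: customer_id
        type: foreign
```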
## When to make a mart

- ❓ If we can go directly to staging models and it's better to serve normalized models to the Semantic Layer, then when, where, and why would we make a mart?
