Render ephemeral type DBT model & no freshness DBT source with test as EmptyOperator #1625
Open
okayhooni wants to merge 2 commits into astronomer:main
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1625      +/-   ##
==========================================
- Coverage   97.43%   97.41%   -0.02%
==========================================
  Files          80       80
  Lines        4950     4957       +7
==========================================
+ Hits         4823     4829       +6
- Misses        127      128       +1
This PR is stale because it has been open for 30 days with no activity.
Contributor
Hi @okayhooni, thanks a lot for this PR. We're really sorry we missed reviewing it earlier. We plan to review it now; would it be possible to resolve the conflicts on this PR, please?
Contributor
@okayhooni, were you able to add unit/integration tests for these changes?
tatiana added a commit that referenced this pull request on Jan 26, 2026
#2279)

Avoid consumer tasks that hang indefinitely when using `ExecutionMode.WATCHER` when the associated dbt models are either ephemeral or consist of empty SQL models that are not run by dbt.

## Context

There are circumstances in which `dbt ls` and `dbt build` return a different number of nodes, even when run with the same selectors.

In the following example (`tests/sample/dbt_project_with_empty_model`), we can observe that `dbt ls` returned two models, while `dbt build` returned a single one:

```
$ dbt ls
10:48:32 Running with dbt=1.11.2
10:48:32 Registered adapter: postgres=1.10.0
10:48:32 Unable to do partial parsing because saved manifest not found. Starting full parse.
10:48:32 Found 2 models, 464 macros
micro_dbt_project.add_row
micro_dbt_project.empty_model
$ dbt build
10:50:21 Running with dbt=1.11.2
10:50:21 Registered adapter: postgres=1.10.0
10:50:21 Found 2 models, 464 macros
10:50:21
10:50:21 Concurrency: 4 threads (target='dev')
10:50:21
10:50:21 1 of 1 START sql view model public.add_row ..................................... [RUN]
10:50:21 1 of 1 OK created sql view model public.add_row ................................ [CREATE VIEW in 0.06s]
10:50:21
10:50:21 Finished running 1 view model in 0 hours 0 minutes and 0.20 seconds (0.20s).
10:50:21
10:50:21 Completed successfully
10:50:21
10:50:21 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 NO-OP=0 TOTAL=1
```

So far, we have observed this happening in two scenarios:

1. Ephemeral nodes (#2266)
2. If the dbt model is not executable (e.g. it is an empty SQL file), neither `dbt build` nor `dbt run` will display it in their info logs.

Until Cosmos 1.12.1, Cosmos assumed these two commands would return the same number of nodes, and we implemented `LoadMode.MANIFEST` under the same assumption.

In the case of `ExecutionMode.LOCAL`, this was not a big issue, because dbt does not run anything when we select the particular model it excludes:

```
$ dbt build --select empty_model
10:53:03 Running with dbt=1.11.2
10:53:03 Registered adapter: postgres=1.10.0
10:53:03 Found 2 models, 464 macros
10:53:03 Nothing to do. Try checking your model configs and model specification args
```

The downside in the case of `ExecutionMode.LOCAL` is that we waste Airflow resources by potentially parsing a dbt project that would not need to be parsed in those particular tasks. The PR #1625 aims to address this.

However, in the case of `ExecutionMode.WATCHER`, this became a big problem: consumer tasks representing ephemeral nodes or empty models would hang indefinitely after the producer task completed successfully. The producer task was not aware of them and would not populate XCom, whereas the consumer tasks would keep checking for updates.

Closes: #2266
Closes: astronomer/oss-integrations-private#315
Closes: https://astronomer.zendesk.com/agent/tickets/87180

## About the solution

Ideally, we would know upfront which nodes `dbt build` decides to execute, and we would not render the remaining ones as Airflow tasks. However, I do not believe this is a simple problem, since there may be other circumstances in which `dbt build` skips nodes, and any custom logic we implement in Cosmos will be affected by changes that dbt Core/Fusion implement upstream.

Therefore, for now, the safest solution seems to be:

- Continue adding those nodes to Cosmos
- Mark them as successful, logging a specific message, if they were not actually run by the dbt command. This is identified by checking the `run_results.json` file.
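
The `run_results.json` check described above can be sketched roughly as follows. This is a minimal illustration, not the code Cosmos actually ships: the function name `find_unexecuted_nodes` and the inline sample data are hypothetical, and the sketch assumes only that `run_results.json` has a top-level `results` list whose entries carry a `unique_id`.

```python
import json
import tempfile
from pathlib import Path


def find_unexecuted_nodes(expected_ids, run_results_path):
    """Return the expected node ids that have no entry in run_results.json.

    Hypothetical helper: `expected_ids` stands for the nodes rendered as
    Airflow tasks (e.g. derived from `dbt ls`).
    """
    data = json.loads(Path(run_results_path).read_text())
    executed = {result["unique_id"] for result in data.get("results", [])}
    return sorted(set(expected_ids) - executed)


# Hypothetical sample data: only add_row was executed by dbt.
sample_run_results = {
    "results": [
        {"unique_id": "model.micro_dbt_project.add_row", "status": "success"}
    ]
}
expected = [
    "model.micro_dbt_project.add_row",
    "model.micro_dbt_project.empty_model",
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "run_results.json"
    path.write_text(json.dumps(sample_run_results))
    skipped = find_unexecuted_nodes(expected, path)

# Nodes absent from run_results.json would be marked successful with a log message.
for unique_id in skipped:
    print(f"{unique_id} was not run by dbt; marking the task as successful")
```

Comparing sets of ids keeps the check independent of why dbt skipped a node (ephemeral, empty SQL, or some future reason), which matches the motivation given above for not reimplementing dbt's selection logic.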
Description
dbt models of the `ephemeral` materialization type, which serve only as CTEs, should not be rendered as `DbtRunOperator` tasks, as they unnecessarily occupy Airflow worker slots, even if only for a short period.
Breaking Change?
No
Checklist