Feature/model too many joins #413
The integration tests seem to be failing because, now that I've added this new model, some test models, and the associated seed, some of the other equality tests fail. I assume I need to update the contents of those seed files to include what I've added. Is that the correct approach?
Yes, this is a bit annoying, but we need to update a couple of seed files when adding new models in the This doesn't happen every day, so we kept it like this for now, but we know that it is not fun to do 😄
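Since the failing equality tests compare a model's output against a seed, one way to find out which seed rows need updating is to diff the two tables in both directions. This is a minimal Python sketch, assuming both tables can be loaded as lists of row tuples; comparing them as multisets is roughly what an equality test checks, but the function name and the rows are hypothetical and this is not how dbt implements the test.

```python
from collections import Counter


def row_differences(model_rows, seed_rows):
    """Compare two tables as multisets of rows, in both directions.

    Returns (rows only in the model output, rows only in the seed).
    Both lists are empty when the two tables match exactly.
    """
    model_counts = Counter(model_rows)
    seed_counts = Counter(seed_rows)
    # Counter subtraction keeps only positive counts, so duplicates are respected
    only_in_model = list((model_counts - seed_counts).elements())
    only_in_seed = list((seed_counts - model_counts).elements())
    return only_in_model, only_in_seed


# Hypothetical rows: (resource_name, parent_count)
model_output = [("fct_orders", 8), ("fct_sales", 7), ("int_users", 2)]
expected_seed = [("fct_orders", 8), ("int_users", 2)]

extra, missing = row_differences(model_output, expected_seed)
print(extra)    # [('fct_sales', 7)]: rows the seed still needs
print(missing)  # []: rows the seed has but the model no longer produces
```

Rows that show up in the first list are the ones to add to the seed; rows in the second list usually mean the seed is out of date or the model changed behavior.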
Hey! How would you feel about naming the model
@b-per Thanks for clarifying, I'll align all the seeds so the equality tests no longer fail. I think I saw a couple of discrepancies that weren't just about my new model (when I tested locally in VSCode), so I'll double-check those as well. @graciegoheen That's a good suggestion. Now that you mention it, I'm not sure why I didn't name the model that to begin with, seeing as I named both the branch and the PR "too many joins".
Renamed the model from
OK, so I spent a bit of time looking at the test results that need to be changed, but unfortunately, the new model under For example, with the new model, there is now no violation of We should try to get your local version working, but I am thinking that the test might need to be updated as well. I am thinking that we might want to make the number of joins a variable that defaults to 7 but that we could change to 3 in our
Also, one of the seeds is the following:
This is not going to work with CI, as it runs on Linux and the file separator here is the Windows one. We should maybe just stop checking that specific column. This makes me think that some of the discrepancies you see when running locally might be due to you running on Windows, while CI runs on Linux and we run those on macOS at dbt Labs.
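The suggestion above is simply to stop checking that column. As an alternative, here is a minimal Python sketch (outside of dbt, not part of dbt-project-evaluator) of normalizing a file path so a value captured on Windows compares equal to the same value captured on Linux or macOS. The function name and the example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def normalize_path(raw_path: str) -> str:
    """Return the path with forward slashes, whatever OS produced it.

    PureWindowsPath accepts both backslash and forward slash as separators
    on any OS, so as_posix() yields the same string for Windows-style and
    POSIX-style input.
    """
    return PureWindowsPath(raw_path).as_posix()


# Hypothetical values for the same model file, captured on Windows and on Linux
windows_value = "models\\marts\\fct_orders.sql"
linux_value = "models/marts/fct_orders.sql"

print(normalize_path(windows_value))                                 # models/marts/fct_orders.sql
print(normalize_path(windows_value) == normalize_path(linux_value))  # True
```

One caveat of this approach: a POSIX path that legitimately contains a backslash in a file name would be altered, which is rarely a concern for dbt model paths.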
…, gitignored integration_tests/package-lock.yml
Thank you for all the suggestions. Here is a list of what I've done and the info you asked for.
I did the
I changed the threshold for
I corrected the
Contents of
dbt-core 1.7.5
Duckdb
The fact that I haven't gotten the integration tests to pass is bugging me, so I decided to try something. I created a new branch in my repository to get a clean baseline test before making any changes. I installed the dbt-duckdb package, ran Without any modifications to any models or seeds, I am getting errors on 10 tests. I was able to resolve or explain all but one. Is it possible that some integration_tests seeds in

These three have the single forward slash vs. double backslash issue, but the models and paths are the same, so I imagine they would pass CircleCI just fine if I leave the seeds alone:
These tests had additional rows in the table that weren't in the seed. When I added those rows to the seeds, the tests pass just fine:
This coverage test had different values in the seed vs. the table built by the model. When I changed the values in the seed, the test passes just fine:
This coverage test had different values in the seed vs. the table built by the model. When I changed the values in the seed, there is still a weird error:
After updating the
Notice the
We saw the same rounding issue here #427 (comment)
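The thread doesn't show the exact values involved, so this is a minimal Python sketch assuming the mismatch comes from a coverage percentage being stored with a different number of decimals (or a tiny floating-point difference) than the value in the seed. An exact equality check fails on such differences, while a comparison with a small absolute tolerance does not. The function names, the two-decimal rounding, the seed value, and the 0.01 tolerance are all hypothetical choices for illustration.

```python
import math


def coverage_pct(documented: int, total: int, decimals: int = 2) -> float:
    """Percentage of models that are documented, rounded for reporting."""
    if total == 0:
        return 0.0
    return round(100.0 * documented / total, decimals)


def values_match(actual: float, expected: float, tolerance: float = 0.01) -> bool:
    # Compare with an absolute tolerance instead of exact equality, so tiny
    # floating-point or rounding differences do not fail the check
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=tolerance)


seed_value = 33.333  # hypothetical value stored with more decimals in a seed

print(coverage_pct(1, 3))                            # 33.33
print(coverage_pct(1, 3) == seed_value)              # False: exact comparison fails
print(values_match(coverage_pct(1, 3), seed_value))  # True: within the 0.01 tolerance
```

Whether a tolerance is acceptable depends on how precise the coverage numbers need to be; the alternative is to round both sides to the same number of decimals before comparing.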
Good to know it wasn't just me.
I saw a couple of PRs were merged and my branch was out of date. I synced my branch, and it now fails the CircleCI tests. It was passing previously.
@BradCr thanks so much for the contribution! Sorry this took so long!
@dave-connors-3 No problem, Dave. This was such a fun experience. I'm looking forward to contributing again. Thank you also @b-per and @graciegoheen for all the help and guidance!
This is a:
Link to Issue
Closes #394
Description & motivation
Per the dbt best practices docs, models should bring together a reasonable number (typically 4 to 6) of entities or concepts (staging models, or perhaps other intermediate models) that will be joined with another similarly purposed intermediate model to generate a mart. Joining too many models directly in a mart increases its complexity. Instead, we can join two intermediate models that each house a piece of the complexity, giving us increased readability, flexibility, testing surface area, and insight into our components.
This new model, fct_number_of_joins, will identify models that join from seven (7) or more other models and should be refactored into intermediate models, each taking some of the joins.
Integration Test Screenshot
The equality test for the new fct_number_of_joins model passed in the integration tests.
Screenshot of the fct_number_of_joins model being built:
I updated the documentation to explain the new model. I had two concerns about the screenshots: when I attached them, they were linked to my repository (I'm not sure whether that gets updated automatically or whether a dbt maintainer would have to deal with it), and they are taken from the new Cloud Lineage diagram, so their look and feel differs from the existing screenshots.
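As a rough illustration of the rule described above (flag models that join from seven or more other models), here is a minimal Python sketch that counts each model's distinct direct parents from a list of (parent, child) edges and flags those at or above a threshold. Counting direct parents is an assumption standing in for "number of joins" (a model could also combine parents with a UNION), the edge format and model names are hypothetical, and this is not the SQL that fct_number_of_joins actually runs. The threshold defaults to 7 to match the description and is a parameter so it can be lowered, for example to 3, as suggested in the thread for the integration tests.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def models_with_too_many_joins(
    edges: Iterable[Tuple[str, str]], threshold: int = 7
) -> Dict[str, int]:
    """Return {model: distinct_parent_count} for models at or above the threshold.

    Each edge is (parent, child), meaning the child model selects from the parent.
    Direct parents are used as a stand-in for the number of joins.
    """
    parents: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)  # a set ignores duplicate edges
    return {
        child: len(direct_parents)
        for child, direct_parents in sorted(parents.items())
        if len(direct_parents) >= threshold
    }


# Hypothetical DAG: fct_orders selects from 7 staging models, int_users from 3
edges = [(f"stg_source_{i}", "fct_orders") for i in range(7)]
edges += [("stg_users", "int_users"), ("stg_accounts", "int_users"), ("stg_orgs", "int_users")]

print(models_with_too_many_joins(edges))               # {'fct_orders': 7}
print(models_with_too_many_joins(edges, threshold=3))  # {'fct_orders': 7, 'int_users': 3}
```

Making the threshold a parameter rather than a hard-coded 7 keeps small test projects, whose models have few parents, able to exercise the check.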
Checklist