Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Rules! for model governance (dbt Core v1.5) #332

Closed
jtcohen6 opened this issue Apr 23, 2023 · 10 comments
Closed

New Rules! for model governance (dbt Core v1.5) #332

jtcohen6 opened this issue Apr 23, 2023 · 10 comments
Labels
enhancement New feature or request Stale

Comments

@jtcohen6
Copy link

Describe the feature

https://dbt-labs.github.io/dbt-project-evaluator/0.5/rules/

I'd like to propose some New Rules, related to the model governance features that are new in v1.5.

  • Public Models Have Contracts: Every model with access: public also has contract: {enforced: true}
  • Public Models Are Documented: Every model with access: public has a top-level description and description for every column (this rule, but more)
  • Column Naming Convention: For columns in models with explicitly defined contracts, we can assert that columns' names and data types match our opinionated recommendations.
  • Exposures Public Parents: Exposures depend only on models with access: public (similar to this rule)
  • At Most Three Versions: There should be, at most, three versions of a given model: one latest, one old, one prerelease
  • Group Coverage: Calculate percentage of resources in groups. Groupable resources are everything except sources + exposures. Raise a warning about resources that aren't in groups, with an owner. (This will be a heavier lift at first; we should build tooling to make it easier!)
  • Model Access Ratio: Calculate ratios of model access: private, protected, public. My absolutely arbitrary guess at an ideal breakdown would be somewhere in the ballpark of 50/40/10%. Every group should have at least one private model.

Related to deprecation_date (dbt-labs/dbt-core#7433), coming in v1.6:

  • All old model versions (< latest_version) must have a deprecation_date
  • All models with a deprecation_date earlier than today should be removed

Very open to thoughts & feedback on all of the above — and more ideas, of course :)

Who will this benefit?

Developers & maintainers of large & complex projects, who want to start taking advantage of v1.5 features for model governance

Are you interested in contributing this feature?

I can help with refining/brainstorming, &/or possibly contribute with some of my hacktime :)

@jtcohen6 jtcohen6 added the enhancement New feature or request label Apr 23, 2023
@b-per b-per pinned this issue Apr 24, 2023
@graciegoheen
Copy link
Collaborator

Love all of these ideas and am excited to address new best practices for model contracts. Do you think these best practices fall into one of our pre-existing categories (modeling, testing, documentation, structure, performance) or does this call for a new category?

I think some of these fall neatly into modeling (such as Exposures Public Parents) - but for the others, I'm thinking about creating a new category for "governance".

@b-per
Copy link
Collaborator

b-per commented Apr 26, 2023

In my mind, most of them would fall on a new Governance category. Mostly because contracts/governance is quite opt-in and it would be great for people to activate/deactivate Governance rules if they want or not.

The one that I am less sure about is "Column Naming Convention". While it only really works when governance is in place and all columns are listed, this feels more about modelling that Governance to me.

@dave-connors-3
Copy link
Collaborator

loving this list of new no nos:

we're gonna tackle this, likely in this priority order:

Definitely:

  1. Public Models Have Contracts: for sure!
  2. Public Models Are Documented: of course! - column level checks will be a slightly new construct here, but doing a model-level check for all columns have descriptions should be doable!
  3. Exposures Public Parents: deal! organzing all of these into a Governance folder will be key here, since those who don't adopt mesh stuff might get some noise on existing exposures

Maybe:

  1. Group Coverage: We're on board here, but given how little we know about how people will use this, not sure we have a clear opinionated stance on whether you should group everything
  2. At Most Three Versions: Similar to above -- likely a good check, but until we see folks using versions, may be hard to properly capture all the use cases in a single check
  3. Column Naming Convention: this one seems hard as hell! column naming is something that we have found to be the most flexible, and while there's value in checking for a standard within a project, our recommendations are definitely not the most common ones. The signal to noise ratio might be a bit low, and this would invite a lot of requests for overrides/customization
  4. Model Access Ratio: We agree that all the have are gut checks on what the right ratios would be 😅 we may create a similar "governance coverage" model like the test and documentation coverage models we have, and this could be a good candidate!

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 25, 2023
@b-per
Copy link
Collaborator

b-per commented Oct 25, 2023

@dave-connors-3 , you were looking at those at some points. Are you OK with closing this issue with the changes you did or do you think there is more work to be done?

@b-per b-per removed the Stale label Oct 25, 2023
@sadahry
Copy link
Contributor

sadahry commented Dec 20, 2023

How about fct_public_models_without_exposures ?

In our teams, all public models have external access, and writing the exposures are sometimes missed.
This new governance is useful to avoid the missing.

IMO, public models without exposures are rare,
it should be into dbt_project_evaluator_exceptions.csv as exception or disable the rule. (maybe wrong)

Please tell us how do you think about it ..

@b-per
Copy link
Collaborator

b-per commented Dec 20, 2023

Hi @sadahry , thanks for adding this suggestion!

I don't know if your rule would apply to a majority of use cases.

In my mind, public models are about the ability to ref() those models from different dbt groups and/or projects, which is something different from "public" in the sense that those models are consumed outside of dbt by other people and tools.

Internally we define exposures on models that are not public and have public models without exposures.

What I would recommend instead here is for you to create your own custom model and test on top of the models the package is creating (you can check the existing models and tests for inspiration) and you can then run the project evaluator package with dbt build --select package:dbt_project_evaluator+ (with the + after) to pick up the default models/tests and your custom ones

@sadahry
Copy link
Contributor

sadahry commented Dec 20, 2023

Understood.
I'll create a custom model. Thanks!

Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 18, 2024
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Stale
Projects
None yet
Development

No branches or pull requests

5 participants