
Custom SQL Model Materialization Schedule #3236

Open
ravenac95 opened this issue Mar 11, 2025 · 2 comments

Comments

@ravenac95
Member

Describe the feature you'd like to request

SQLMesh, while great, has a different view on models than we do. I think we may need our own custom scheduling so that models can be eventually consistent. This would let us spend less and continuously provide mostly complete data.

This may also make using sqlmesh (or at least our favorite parts of sqlmesh) easier for more people, and less of the "all-or-nothing" system that it is right now, while simultaneously being more informative than the current state of things.

Describe the solution you'd like

This is not a fully baked idea, but here are some early thoughts.

Essentially, this means handling the scheduling of materializations in a very different way, so that we support both the models in the repo and models contributed through some other system, likely via pyoso and stored in an application database.

Each model should have an execution status (a code sketch of all of these status dimensions follows the lists below):

  • CURRENTLY_MATERIALIZING - you're likely looking at some mix of old data + new data
  • MATERIALIZED (and some date) - materialized within the last 24 hours

Each model can also have a dependency status:

  • UP_TO_DATE - all dependencies are MATERIALIZED
  • PENDING - some dependencies are updating so this is likely out of date

Additionally there should be a maintenance status:

  • ACTIVE - this is actively maintained
  • ARCHIVED - you're looking at a model whose data is still available but should no longer be relied on and may be deleted

At least internally we should be able to have a visibility status:

  • PUBLIC - Publicly visible
  • PRIVATE - Private only
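
Below is a minimal sketch of these status dimensions as Python enums, plus the rule that a model is UP_TO_DATE only when all of its dependencies are MATERIALIZED. All of the names here (ModelStatusRecord, derive_dependency_status, is_fresh) are hypothetical and only illustrate the idea; nothing in this sketch is an existing OSO or SQLMesh API.

```python
# Hypothetical sketch of the proposed status dimensions. None of these
# names come from OSO or SQLMesh; they only illustrate the idea.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class ExecutionStatus(Enum):
    CURRENTLY_MATERIALIZING = "currently_materializing"
    MATERIALIZED = "materialized"


class DependencyStatus(Enum):
    UP_TO_DATE = "up_to_date"
    PENDING = "pending"


class MaintenanceStatus(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"


class Visibility(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


@dataclass
class ModelStatusRecord:
    name: str
    execution: ExecutionStatus
    last_materialized_at: datetime | None
    maintenance: MaintenanceStatus
    visibility: Visibility


def derive_dependency_status(deps: list[ModelStatusRecord]) -> DependencyStatus:
    """A model is UP_TO_DATE only if every dependency is MATERIALIZED."""
    if all(d.execution is ExecutionStatus.MATERIALIZED for d in deps):
        return DependencyStatus.UP_TO_DATE
    return DependencyStatus.PENDING


def is_fresh(record: ModelStatusRecord, max_age: timedelta = timedelta(hours=24)) -> bool:
    """MATERIALIZED within the last 24 hours counts as fresh."""
    return (
        record.execution is ExecutionStatus.MATERIALIZED
        and record.last_materialized_at is not None
        and datetime.now(timezone.utc) - record.last_materialized_at < max_age
    )
```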

Incremental models should all be forward-only. Any breaking change should result in a new version of that model and force a version bump (via CI for models in code, via automatic versioning for any pyoso contributions).

Backfills to incremental models can be done asynchronously.
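
Here is one rough way a CI check for the forward-only rule could look: compare the previously published version of a model against the proposed one and fail if a breaking change isn't accompanied by a version bump. Everything below (ModelVersion, check_version_bump, the fingerprint idea) is a hypothetical sketch, not SQLMesh behavior, and treating any fingerprint change as breaking is a deliberate simplification.

```python
# Hypothetical CI-style check for forward-only incremental models: if a
# model's query/schema fingerprint changed, require a version bump so the
# old table is left intact and a new versioned table is built instead.
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str          # e.g. "int_events"
    version: int       # e.g. 2, materialized as "int_events_v2"
    fingerprint: str   # hash of the query + schema (simplification)


def check_version_bump(previous: ModelVersion, proposed: ModelVersion) -> list[str]:
    """Return a list of CI errors; an empty list means the change is acceptable."""
    errors: list[str] = []
    breaking_change = previous.fingerprint != proposed.fingerprint
    if breaking_change and proposed.version <= previous.version:
        errors.append(
            f"{proposed.name}: breaking change detected but version was not "
            f"bumped (still v{proposed.version}); incremental models are "
            "forward-only, so publish a new version instead."
        )
    return errors
```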

If we treat all models like this, then we maintain high availability: some things may change over time, but recalculations happen in a more "eventual" way that saves money. It might end up being more complicated, so we need to think through a few things here, but I wanted to get some ideas written down.

Describe alternatives you've considered

We've currently only used sqlmesh + dbt.


linear bot commented Mar 11, 2025

github-project-automation bot moved this to Backlog in OSO Mar 11, 2025
@ravenac95
Member Author

Another thing we should do is allow some models to categorize their changes by partition. For example, consider the superchain dataset: say a model is only interested in changes for BASE. We shouldn't trigger unnecessary recalculations for that model if changes only occur on optimism.
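
As a rough sketch of what this could look like: a downstream model declares which partition values it cares about, and the scheduler only queues a recalculation when an upstream change touches one of those partitions. The names below (PartitionInterest, UpstreamChange, should_recalculate) are made up for illustration and are not part of sqlmesh or pyoso.

```python
# Hypothetical partition-scoped change detection for scheduling.
from dataclasses import dataclass, field


@dataclass
class PartitionInterest:
    # e.g. column "chain" with values {"base"} for a BASE-only model;
    # an empty set means "any change triggers recalculation".
    column: str
    values: set[str] = field(default_factory=set)


@dataclass
class UpstreamChange:
    model: str
    column: str               # partition column that changed, e.g. "chain"
    changed_values: set[str]  # e.g. {"optimism"}


def should_recalculate(interest: PartitionInterest, change: UpstreamChange) -> bool:
    """Skip recalculation when the change only touched partitions the model ignores."""
    if change.column != interest.column or not interest.values:
        return True  # can't scope the change, so play it safe and recompute
    return bool(interest.values & change.changed_values)


# Example: a BASE-only model is untouched by an optimism-only change.
base_only = PartitionInterest(column="chain", values={"base"})
optimism_update = UpstreamChange(model="superchain_events", column="chain", changed_values={"optimism"})
assert should_recalculate(base_only, optimism_update) is False
```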
