@antiguru
Member

@antiguru antiguru commented Nov 5, 2025

Implement a mechanism to replace materialized views with new definitions. This is a work-in-progress with many rough edges, and not a full implementation.

The PR adds the following syntax:

  • To replace a materialized view, the user creates a replacement:
    CREATE REPLACEMENT mv_replacement FOR MATERIALIZED VIEW mv AS SELECT ...
  • The user then observes hydration progress. At some point, they decide it wasn't a good change after all:
    DROP REPLACEMENT mv_replacement
  • Or, they figure it's a good change and apply it:
    ALTER MATERIALIZED VIEW mv APPLY REPLACEMENT mv_replacement

This change implements the design outlined in #34106. Part of https://github.com/MaterializeInc/database-issues/issues/9903.

Caveats

The implementation will come with some properties worth pointing out:

  • Data at times before the write frontier is sealed and can no longer be changed. This means that constant materialized views cannot be changed at all, and refresh-every MVs can only change from the current write frontier onward, which is the next refresh time.

Aligning time

We need to take care to cut over cleanly from one materialized view definition to another. This is mostly a problem if the old materialized view is behind:

  • We can wait for the old MV to reach the as-of of the new materialized view. It is reasonable to assume that the inputs of the new materialized view aren't readable at the write frontier of the old MV, for example when the cluster has no replicas. This is what this PR implements.
  • We can jump forward in time, but this introduces an interval where we surface incorrect data. This might be fine if the user explicitly asks for it, but we should not choose this behavior by default.
    • We could move this feature behind an option (WITH (FORWARD TIME = true)), but I won't implement it as part of this PR.
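
A minimal sketch of the cut-over decision described above, written in Python for brevity. The names `CutOver` and `plan_cut_over` and the use of plain integers as timestamps are assumptions for illustration and do not mirror the actual coordinator code. The comparison assumes the old MV has "reached" the as-of once its write frontier is at or beyond it.

```python
from enum import Enum


class CutOver(Enum):
    """Hypothetical outcomes of trying to apply a replacement."""

    APPLY_NOW = "apply_now"        # old MV has reached the replacement's as-of
    WAIT = "wait"                  # old MV is behind; wait for it to catch up
    FORWARD_TIME = "forward_time"  # old MV is behind; user opted into jumping


def plan_cut_over(
    old_write_frontier: int,
    replacement_as_of: int,
    forward_time: bool = False,
) -> CutOver:
    """Decide how to cut over, given the old MV's write frontier and the
    as-of of the replacement dataflow (both modeled as integers here)."""
    if old_write_frontier >= replacement_as_of:
        return CutOver.APPLY_NOW
    # The old MV is behind. Jumping forward would surface incorrect data for
    # the skipped interval, so it is only chosen on explicit request.
    if forward_time:
        return CutOver.FORWARD_TIME
    return CutOver.WAIT
```

With these assumptions, `plan_cut_over(3, 5)` waits by default, and only `plan_cut_over(3, 5, forward_time=True)` jumps forward.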

@antiguru antiguru force-pushed the alter_mv_2 branch 11 times, most recently from e8fb3aa to 9db9d12 Compare November 12, 2025 15:48

class ReplacementMaterializedView(Object):
    def create(self) -> str:
        return f'> CREATE MATERIALIZED VIEW {self.name} AS SELECT {"* FROM " + self.references.name if self.references else "'foo' AS a, 'bar' AS b"}'
Contributor

@def- def- Nov 12, 2025

These nested f'{''}' are not valid in Python<=3.11, which is why the new linter fails here: https://buildkite.com/materialize/test/builds/111624#019a78c6-14eb-408e-8e44-30112c4233fb
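
A sketch of one way to stay compatible with older Python versions: build the `SELECT` body outside the f-string so that no quotes are nested inside a replacement field. `SketchObject` and `create_statement` are hypothetical stand-ins for the test framework's classes, not the actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchObject:
    """Hypothetical stand-in for the test framework's object type."""

    name: str
    references: Optional["SketchObject"] = None


def create_statement(obj: SketchObject) -> str:
    # Compute the SELECT body first; the f-string below then only
    # interpolates plain names, which parses on Python <= 3.11 as well.
    if obj.references:
        select = f"* FROM {obj.references.name}"
    else:
        select = "'foo' AS a, 'bar' AS b"
    return f"> CREATE MATERIALIZED VIEW {obj.name} AS SELECT {select}"
```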

Contributor

Thanks a lot for adding a race condition test btw!

@antiguru antiguru force-pushed the alter_mv_2 branch 3 times, most recently from fb6f074 to 250ba0a Compare November 12, 2025 16:30
@antiguru antiguru marked this pull request as ready for review November 12, 2025 16:30
@antiguru antiguru requested review from a team as code owners November 12, 2025 16:30
@antiguru antiguru requested a review from aljoscha November 12, 2025 16:30
@def-
Contributor

def- commented Nov 12, 2025

I'd also suggest adding an action in parallel-workload (misc/python/materialize/parallel_workload/action.py) to replace materialized views. We should also do platform-checks and explicit testdrive tests. Tell me if I should take over any of that, happy to help! (but I'm out tomorrow)

When applying a replacement, we need to ensure that the new schema is compatible with the existing schema.
We define compatibility as follows:
1. The schema must be the same as the original schema,
2. Or, the schema must be a superset of the original schema (i.e., it can add new columns but cannot remove existing ones).
Contributor

It might be fine to remove nullable columns. Readers with the old schema can just fill in NULL for the missing columns then.

Practically speaking, we'll probably start with whatever persist schema evolution supports, which is adding nullable columns at the end. Though this feature might be a reason for wanting to extend the persist schema evolution support.
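
As a rough illustration of that narrower rule (only nullable columns appended at the end), here is a sketch in Python. `Column` and `is_compatible` are hypothetical names, and the check is an assumption about what "adding nullable columns at the end" means. It is not persist's actual schema evolution logic.

```python
from typing import NamedTuple, Sequence


class Column(NamedTuple):
    """Hypothetical, simplified column description."""

    name: str
    typ: str
    nullable: bool


def is_compatible(old: Sequence[Column], new: Sequence[Column]) -> bool:
    """Return True if `new` equals `old` or extends it with nullable
    columns appended at the end."""
    if len(new) < len(old):
        return False  # removing columns is not allowed under this rule
    # Existing columns must be preserved unchanged and in the same order.
    if list(new[: len(old)]) != list(old):
        return False
    # Every appended column must be nullable.
    return all(col.nullable for col in new[len(old):])
```

Allowing nullable columns to be removed, as suggested above, would need a looser check than this one.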

Contributor

Thanks for writing a design doc! Since this is adding a significant amount of new SQL surface, it would be good to fully specify that here as well. The new SQL syntax is partially described, but the MVP also mentions SHOW REPLACEMENTS, which isn't. There are likely also changes to the builtin tables required, and possibly RBAC.

Contributor

Also, should we split the design doc from the MVP implementation? Both can be reviewed and evolved independently, and having a separate PR for the design makes it more discoverable.

Member Author

Extracted the design to #34106.

Comment on lines 72 to 76
* Provide better introspection data for replacements, such as the ability to see the differences between the current and replacement definitions.
* Surface metadata about the amount of staged changes (records, bytes) between the current and replacement definitions.
* Introspect the actual changes.
For example, which rows would be added or removed.
* Automate applying a replacement once the new definition is hydrated.
Contributor

Istm that to make this feature useful for users, we'll have to at least provide part of this. Specifically, users need a way to know (a) when the replacement is hydrated and caught up and (b) how many resources it roughly requires compared to the old version.

This might be as simple as telling users to check mz_frontiers and mz_arrangement_sizes, though somewhat scary because both of these relations are unstable.

@ggevay
Contributor

ggevay commented Nov 13, 2025

(The build in CI has failed.)

@ggevay
Contributor

ggevay commented Nov 13, 2025

I have an overarching question about the implementation: A lot of code is duplicated between normal materialized views and "replacement materialized views". Would it be feasible to share more code between the two?

@antiguru
Member Author

> I have an overarching question about the implementation: A lot of code is duplicated between normal materialized views and "replacement materialized views". Would it be feasible to share more code between the two?

I share your concern! I think "yes" and "no". Some code is very similar, but works on different types. For example, sequencing has the same stages minus the explain path (which I deleted just because I didn't want to implement it in the MVP), but works on different types. Other places are almost the same, and could have a common portion factored out.

I did this with plan_create_materialized_view, which delegates all of the non-specific planning to plan_create_materialized_view_inner, and that function is shared by the MV and replacement planning. replace_materialized_view.rs could share some code between the two.

Combining common code would have the benefit that the two variants would not accidentally diverge if we change one but forget about the other. For this reason I think we should try to extract commonalities where reasonable, but I'd suggest doing this in a follow-up change. (I'm not a fan of follow-up changes, but it seems like the easier option given the size of this change? Not 100% sure.)

@ggevay
Contributor

ggevay commented Nov 13, 2025

> but works on different types

I'm wondering if we could maybe even share the types. For example, even CreateMaterializedViewPlan and CreateReplacementMaterializedViewPlan look similar. Also, plan::MaterializedView and plan::ReplacementMaterializedView look similar. Maybe we could unify these and mark whether it's a replacement by something like a boolean flag.

/// Validates that the given materialized view can be created.
///
/// Shared with replacement materialized views.
pub(super) fn create_materialized_view_validate_inner(
Member Author

@ggevay I extracted code duplicated between the two variants into functions. It certainly unifies some of the complexity (but not all).

}

// Timestamp selection
let id_bundle = dataflow_import_id_bundle(global_lir_plan.df_desc(), cluster_id);
Member Author

We could unify this portion too, but I think it just happens to look the same today, and that might not stay true forever.


self.catalog_transact_with_side_effects(Some(ctx), ops, move |coord, ctx| {
    Box::pin(async move {
        let output_desc = global_lir_plan.desc().clone();
Member Author

The side effects are very similar, but here again, I'd like to keep it explicit and not unify with creating materialized views.

@antiguru antiguru force-pushed the alter_mv_2 branch 4 times, most recently from c39df05 to 03e7af9 Compare November 14, 2025 13:22
Signed-off-by: Moritz Hoffmann <[email protected]>
Implements the skeleton to support replacing materialized views.
Specifically, the change introduces the following SQL syntax:

```
CREATE REPLACEMENT <replacement_name>
  FOR MATERIALIZED VIEW <view_name>
  AS ...
```
This creates a new dataflow targeting the same shard. The dataflow
selects an as-of of its inputs. The dataflow is read-only.

```
ALTER MATERIALIZED VIEW <view_name>
  APPLY REPLACEMENT <replacement_name>
```
Replaces the old materialized view with `<replacement_name>`. Enables
writes for the replacement.

The change adds convenience SQL syntax (`SHOW REPLACEMENTS`, `SHOW
CREATE REPLACEMENT`), and relations to query the state of replacements.

The syntax to create replacements is guarded by a feature flag that is
off by default: `enable_replacement_materialized_views`.

Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>