-
Notifications
You must be signed in to change notification settings - Fork 482
Replace materialized views #34032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Replace materialized views #34032
Conversation
e8fb3aa to
9db9d12
Compare
test/race-condition/mzcompose.py
Outdated
|
|
||
| class ReplacementMaterializedView(Object): | ||
| def create(self) -> str: | ||
| return f'> CREATE MATERIALIZED VIEW {self.name} AS SELECT {"* FROM " + self.references.name if self.references else "'foo' AS a, 'bar' AS b"}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These nested f'{''}' are not valid in Python<=3.11, which is why the new linter fails here: https://buildkite.com/materialize/test/builds/111624#019a78c6-14eb-408e-8e44-30112c4233fb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for adding a race condition test btw!
fb6f074 to
250ba0a
Compare
|
I'd also suggest adding an action in parallel-workload (misc/python/materialize/parallel_workload/action.py) to replace materialized views. We should also do platform-checks and explicit testdrive tests. Tell me if I should take over any of that, happy to help! (but I'm out tomorrow) |
| When applying a replacement, we need to ensure that the new schema is compatible with the existing schema. | ||
| We define compatibility as follows: | ||
| 1. The schema must be the same as the original schema, | ||
| 2. Or, the schema must be a superset of the original schema (i.e., it can add new columns but cannot remove existing ones). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be fine to remove nullable columns. Readers with the old schema can just fill in NULL for the missing columns then.
Practically speaking, we'll probably start with whatever persist schema evolution supports, which is adding nullable columns at the end. Though this feature might be a reason for wanting to extend the persist schema evolution support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing a design doc! Since this is adding a significant amount of new SQL surface, it would be good to fully specify that here as well. The new SQL syntax is partially described, but the MVP also mentions SHOW REPLACEMENTS, which isn't. There are likely also changes to the builtin tables required, and possibly RBAC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should we split the design doc from the MVP implementation? Both can be reviewed and evolved independently, and having a separate PR for the design makes it more discoverable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted the design to #34106.
| * Provide better introspection data for replacements, such as the ability to see the differences between the current and replacement definitions. | ||
| * Surface metadata about the amount of staged changes (records, bytes) between the current and replacement definitions. | ||
| * Introspect the actual changes. | ||
| For example, which rows would be added or removed. | ||
| * Automate applying a replacement once the new definition is hydrated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Istm that to make this feature useful for users, we'll have to at least provide part of this. Specifically, users need a way to know (a) when the replacement is hydrated and caught up and (b) how many resources it roughly requires compared to the old version.
This might be as simple as telling users to check mz_frontiers and mz_arrangement_sizes, though somewhat scary because both of these relations are unstable.
|
(The build in CI has failed.) |
|
I have an overarching question about the implementation: A lot of code is duplicated between normal materialized views and "replacement materialized views". Would it be feasible to share more code between the two? |
I share your concern! I think "yes" and "no". Some code is very similar, but works on different types. For example, sequencing has the same stages minus the explain path (which I deleted just because I didn't want to implement it in the MVP), but works on different types. Other places are almost the same, and could have a common portion factored out. I did this with Combining common code would have the benefit that the two variants would not accidentally diverge if we changed one but forget about the other. For this reason I think we should try to extract communalities as reasonable, but I'd suggest to do this in a follow-up change. (I'm not a fan of follow-up changes, but seems like the easier option given the size of this change? Not 100% sure.) |
I'm wondering if we could maybe even share the types. For example, even |
| /// Validates that the given materialized view can be created. | ||
| /// | ||
| /// Shared with replacement materialized views. | ||
| pub(super) fn create_materialized_view_validate_inner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggevay I extracted code duplicated between the two variants into functions. It certainly unifies some of the complexity (but not all).
| } | ||
|
|
||
| // Timestamp selection | ||
| let id_bundle = dataflow_import_id_bundle(global_lir_plan.df_desc(), cluster_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could unify this portion, too, but I think it just happens to look the same, but that might not be true forever.
|
|
||
| self.catalog_transact_with_side_effects(Some(ctx), ops, move |coord, ctx| { | ||
| Box::pin(async move { | ||
| let output_desc = global_lir_plan.desc().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The side effects are very similar, but here again, I'd like to keep it explicit and not unify with creating materialized views.
c39df05 to
03e7af9
Compare
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Implements the skeleton to support replacing materialized views. Specifically, the change introduces the following SQL syntax: ``` CREATE REPLACEMENT <replacement_name> FOR MATERIALIZED VIEW <view_name AS ... ``` This creates a new dataflow targeting the same shard. The dataflow selects an as-of of its inputs. The dataflow is read-only. ``` ALTER MATERIALIZED VIEW <view_name> APPLY REPLACEMENT <replacement_name> ``` Replaces the old materialized view with `<replacement_name>`. Enables writes for the replacement. The change adds convenience SQL syntax (`SHOW REPLACEMENTS`, `SHOW CREATE REPLACEMENT`), and relations to query the state of replacements. The syntax to create replacements is guarded by a feature flag that is off by default: `enable_replacement_materialized_views`. Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
Implement a mechanism to replace materialized views with new definitions. This is a work-in-progress with many rough edges, and not a full implementation.
The PR adds the following syntax:
CREATE REPLACEMENT mv_replacement FOR MATERIALIZED VIEW mv AS SELECT ...DROP REPLACEMENT mv_replacementALTER MATERIALIZED VIEW mv APPLY REPLACEMENT mv_replacementThis change implements the design outlined in #34106. Part of https://github.com/MaterializeInc/database-issues/issues/9903.
Caveats
The implementation will come with some properties worth pointing out:
Aligning time
We need to take some care on cleanly cutting over from one materialized view definition to another. This is mostly a problem if the old materialized view is behind:
WITH (FORWARD TIME = true)), but I'll not implement it as part of this PR.