Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compute/correction-v2: introduce a Stage to speed up small inserts #31401

Merged
merged 2 commits into from
Feb 12, 2025

Conversation

teskje
Copy link
Contributor

@teskje teskje commented Feb 7, 2025

This PR extends the CorrectionV2 data structure with a Stage that accumulates inserted updates before they get inserted into the sorted chains. This significantly reduces the amount of chain merges that need to be done in workloads that trickle in large amounts of updates in small batches, greatly speeding up these workloads.

For some reason, the memory tests introduced in #31267 are such a workload. Brief testing reveals that this PR speeds up the MV hydration in the -800mb test, when pointed against CorrectionV2, by roughly an order of magnitude, from ~300s to ~30s.

Note that this change doesn't have much impact when running locally on macOS. For some reason the batch sizes entering the MV sink are much smaller when running in mzcompose, not sure why that is.

Motivation

  • This PR adds a known-desirable feature.

Part of https://github.com/MaterializeInc/database-issues/issues/8464

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the correction_v2-stage branch from c6fccd5 to c696ece Compare February 7, 2025 17:53
@teskje teskje marked this pull request as ready for review February 8, 2025 10:42
@teskje teskje requested a review from a team as a code owner February 8, 2025 10:42
@teskje teskje requested a review from antiguru February 10, 2025 08:33
Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks very similar to a ConsolidatingContainerBuilder, is there a reason why it couldn't reuse the implementation? It has a slightly different API, but maybe it doesn't need to. In Timely, a good rule-of-thumb is that during an operator invocation, it's great to leave state across input batches, but once the operator ceases to work, it's better to finish pending work. The builder API gives you this in the form of the extract and finish API, but it's harder to track this with the APIs implemented by Stage.

Following the builder pattern, the Stage wouldn't need to hook into logic to account for size and be aware of frontiers.

@teskje
Copy link
Contributor Author

teskje commented Feb 10, 2025

The implementation looks very similar to a ConsolidatingContainerBuilder, is there a reason why it couldn't reuse the implementation?

Looks like the ConsolidatingContainerBuilder produces containers up to a preferred size, which makes it awkward to use in the context of the correction buffer. For example, in the current implementation if the Stage gets enough data to fill two chunks at once, it can consolidate all this data together and then build a chain out of two chunks directly. Whereas with the ConsolidatingContainerBuilder I think I would get two consolidated Vecs and to build a chain from them I'd have to merge them into one Vec and then consolidate them again. So we'd spend more time consolidating than we do now.

it's great to leave state across input batches, but once the operator ceases to work, it's better to finish pending work

Why would it be preferable to finish pending work when the operator becomes idle?

Keep in mind that the correction buffer is used in an async operator which, afaick, doesn't have a good way to know when it will become idle. It just does loop { select! { ... } } on a bunch of input channels. We could probably twiddle that control flow to do something special when we find out that non of the channels is ready, but that would make the code more complex, so we should have a good reason for doing so.

This commit extends the `CorrectionV2` data structure with a `Stage`
that accumulates inserted updates before they get inserted into the
sorted chains. This significantly reduces the amount of chain merges
that need to be done in workloads that trickle in large amounts of
updates in small batches, greatly speeding up these workloads.
@teskje
Copy link
Contributor Author

teskje commented Feb 12, 2025

I tested using the ConsolidatingContainerBuilder instead in #31456. It reduces the amount of code, which is nice. But for some reason it also has way worse performance, compared to the hand-rolled Stage implementation in this PR.

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's take the discussion around the consolidating container builder offline!

@teskje
Copy link
Contributor Author

teskje commented Feb 12, 2025

TFTR!

@teskje teskje merged commit e98be02 into MaterializeInc:main Feb 12, 2025
80 checks passed
@teskje teskje deleted the correction_v2-stage branch February 12, 2025 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants