compute/correction-v2: introduce a Stage to speed up small inserts #31401

teskje · 2025-02-07T17:34:14Z

This PR extends the CorrectionV2 data structure with a Stage that accumulates inserted updates before they are inserted into the sorted chains. This significantly reduces the number of chain merges required in workloads that trickle in large amounts of updates in small batches, greatly speeding up these workloads.
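To illustrate the idea, here is a minimal sketch of such a stage. It is not the actual CorrectionV2 code: the `Stage` shape, the hypothetical `chunk_capacity` threshold, and the `(data, time, diff)` tuple layout are assumptions made for illustration. Small batches are appended to a buffer and consolidated only once the buffer reaches a chunk's worth of updates. If at least a full chunk remains after consolidation, the sorted run is released to become a new chain, so many tiny inserts turn into a few larger chain insertions.

```rust
/// One update: (data, time, diff).
type Update<D, T> = (D, T, i64);

/// Sort updates by (data, time), sum the diffs of equal keys, and drop
/// updates whose diffs cancel out to zero.
fn consolidate<D: Ord, T: Ord>(updates: &mut Vec<Update<D, T>>) {
    updates.sort_by(|a, b| a.0.cmp(&b.0).then_with(|| a.1.cmp(&b.1)));
    let mut out: Vec<Update<D, T>> = Vec::with_capacity(updates.len());
    for (d, t, r) in updates.drain(..) {
        if let Some(last) = out.last_mut() {
            if last.0 == d && last.1 == t {
                last.2 += r;
                continue;
            }
        }
        out.push((d, t, r));
    }
    out.retain(|u| u.2 != 0);
    *updates = out;
}

/// Hypothetical staging buffer that absorbs small inserts and releases
/// them as one sorted, consolidated run once they fill a chunk.
struct Stage<D, T> {
    data: Vec<Update<D, T>>,
    chunk_capacity: usize,
}

impl<D: Ord, T: Ord> Stage<D, T> {
    fn new(chunk_capacity: usize) -> Self {
        Self { data: Vec::new(), chunk_capacity }
    }

    /// Absorb `updates`. Returns a run that should become a new chain once
    /// at least a full chunk remains after consolidation, else `None`.
    fn insert(&mut self, updates: &mut Vec<Update<D, T>>) -> Option<Vec<Update<D, T>>> {
        self.data.append(updates);
        if self.data.len() < self.chunk_capacity {
            return None; // Cheap path: just buffer.
        }
        consolidate(&mut self.data);
        if self.data.len() < self.chunk_capacity {
            return None; // Consolidation freed room; keep buffering.
        }
        Some(std::mem::take(&mut self.data))
    }

    /// Release everything still staged, e.g. before the buffer is read.
    fn flush(&mut self) -> Vec<Update<D, T>> {
        consolidate(&mut self.data);
        std::mem::take(&mut self.data)
    }
}

fn main() {
    let mut stage: Stage<&str, u64> = Stage::new(4);
    let mut chains: Vec<Vec<Update<&str, u64>>> = Vec::new();
    // Trickle in single-update batches; the two ("a", 1) updates cancel out.
    for (d, t, r) in [("a", 1, 1), ("b", 1, 1), ("a", 1, -1), ("c", 2, 1), ("d", 2, 1), ("e", 3, 1)] {
        if let Some(run) = stage.insert(&mut vec![(d, t, r)]) {
            chains.push(run);
        }
    }
    let rest = stage.flush();
    if !rest.is_empty() {
        chains.push(rest);
    }
    // Six single-update inserts produce only one chain.
    println!("{:?}", chains); // [[("b", 1, 1), ("c", 2, 1), ("d", 2, 1), ("e", 3, 1)]]
}
```

Without a stage, each of the six inserts would have created its own tiny chain that later needed merging.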

For some reason, the memory tests introduced in #31267 form such a workload. Brief testing shows that, with CorrectionV2 enabled, this PR speeds up MV hydration in the -800mb test by roughly an order of magnitude, from ~300s to ~30s.

Note that this change doesn't have much impact when running locally on macOS. For some reason, the batch sizes entering the MV sink are much smaller when running in mzcompose; I'm not sure why.

Motivation

This PR adds a known-desirable feature.

Part of https://github.com/MaterializeInc/database-issues/issues/8464

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

antiguru

The implementation looks very similar to a ConsolidatingContainerBuilder. Is there a reason why it couldn't reuse that implementation? It has a slightly different API, but maybe it doesn't need to. In Timely, a good rule of thumb is that during an operator invocation it's fine to keep state across input batches, but once the operator stops working, it's better to finish pending work. The builder API gives you this through its extract and finish methods, but this is harder to track with the APIs implemented by Stage.

Following the builder pattern, the Stage wouldn't need to hook into logic to account for size and be aware of frontiers.

teskje · 2025-02-10T09:54:17Z

The implementation looks very similar to a ConsolidatingContainerBuilder, is there a reason why it couldn't reuse the implementation?

It looks like the ConsolidatingContainerBuilder produces containers up to a preferred size, which makes it awkward to use in the context of the correction buffer. For example, in the current implementation, if the Stage receives enough data to fill two chunks at once, it can consolidate all of that data together and then build a chain of two chunks directly. With the ConsolidatingContainerBuilder, I think I would get two separately consolidated Vecs, and to build a chain from them I'd have to merge them into one Vec and consolidate it again. So we'd spend more time consolidating than we do now.
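A minimal sketch of the point above, assuming a simplified update type and hypothetical `consolidate`, `build_chain`, and `is_sorted_chain` helpers (this is not the real CorrectionV2 or timely code). Consolidating everything staged in a single pass and then cutting the sorted result into chunks yields chunks that are already ordered relative to each other, so together they can form one chain without another merge. If each container were instead consolidated on its own, the same (data, time) key could appear in several of them, which is why they would have to be merged and consolidated again.

```rust
/// One update: (data, time, diff).
type Update<D, T> = (D, T, i64);

/// Sort by (data, time), sum diffs of equal keys, drop zero diffs.
fn consolidate<D: Ord, T: Ord>(updates: &mut Vec<Update<D, T>>) {
    updates.sort_by(|a, b| a.0.cmp(&b.0).then_with(|| a.1.cmp(&b.1)));
    let mut out: Vec<Update<D, T>> = Vec::with_capacity(updates.len());
    for (d, t, r) in updates.drain(..) {
        if let Some(last) = out.last_mut() {
            if last.0 == d && last.1 == t {
                last.2 += r;
                continue;
            }
        }
        out.push((d, t, r));
    }
    out.retain(|u| u.2 != 0);
    *updates = out;
}

/// Consolidate all staged updates once, then split the sorted result into
/// chunks of at most `chunk_capacity` updates that together form one chain.
fn build_chain<D: Ord, T: Ord>(
    mut staged: Vec<Update<D, T>>,
    chunk_capacity: usize,
) -> Vec<Vec<Update<D, T>>> {
    assert!(chunk_capacity > 0, "chunk capacity must be positive");
    consolidate(&mut staged);
    let mut chain = Vec::new();
    let mut rest = staged.into_iter().peekable();
    while rest.peek().is_some() {
        chain.push(rest.by_ref().take(chunk_capacity).collect());
    }
    chain
}

/// A chain is valid if its keys are strictly increasing across chunk
/// boundaries, i.e. no key repeats anywhere in the chain.
fn is_sorted_chain<D: Ord, T: Ord>(chain: &[Vec<Update<D, T>>]) -> bool {
    let flat: Vec<&Update<D, T>> = chain.iter().flatten().collect();
    flat.windows(2).all(|w| (&w[0].0, &w[0].1) < (&w[1].0, &w[1].1))
}

fn main() {
    let staged = vec![("c", 1, 1), ("a", 1, 1), ("b", 1, 1), ("a", 1, 1), ("d", 1, 1)];
    let chain = build_chain(staged, 2);
    assert!(is_sorted_chain(&chain));
    println!("{:?}", chain); // [[("a", 1, 2), ("b", 1, 1)], [("c", 1, 1), ("d", 1, 1)]]
}
```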

it's great to leave state across input batches, but once the operator ceases to work, it's better to finish pending work

Why would it be preferable to finish pending work when the operator becomes idle?

Keep in mind that the correction buffer is used in an async operator which, afaict, doesn't have a good way to know when it will become idle. It just does loop { select! { ... } } on a bunch of input channels. We could probably twiddle that control flow to do something special when we find that none of the channels is ready, but that would make the code more complex, so we should have a good reason for doing so.
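For comparison, a rough sketch of what "doing something special when none of the channels is ready" could look like. This is a synchronous, single-channel simplification that uses std::sync::mpsc instead of the async select! loop, and the stage is reduced to a plain Vec that is sorted (not consolidated) when sealed; `run` and `seal` are hypothetical names, not part of the actual operator.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};

/// One update: (data, time, diff).
type Update = (String, u64, i64);

/// Seal staged updates into a new sorted chain (consolidation omitted).
fn seal(staged: &mut Vec<Update>, chains: &mut Vec<Vec<Update>>) {
    if !staged.is_empty() {
        staged.sort();
        chains.push(std::mem::take(staged));
    }
}

/// Stage batches while input keeps arriving; finish pending work whenever
/// the channel is momentarily empty, before blocking for more input.
fn run(rx: Receiver<Vec<Update>>) -> Vec<Vec<Update>> {
    let mut staged: Vec<Update> = Vec::new();
    let mut chains: Vec<Vec<Update>> = Vec::new();
    loop {
        match rx.try_recv() {
            // Input is ready: keep accumulating.
            Ok(mut batch) => staged.append(&mut batch),
            // No input ready: the operator is about to go idle.
            Err(TryRecvError::Empty) => {
                seal(&mut staged, &mut chains);
                match rx.recv() {
                    Ok(mut batch) => staged.append(&mut batch),
                    Err(_) => break, // All senders are gone.
                }
            }
            Err(TryRecvError::Disconnected) => break,
        }
    }
    seal(&mut staged, &mut chains);
    chains
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let producer = std::thread::spawn(move || {
        for i in 0..5u64 {
            tx.send(vec![(format!("k{i}"), i, 1)]).unwrap();
        }
    });
    let chains = run(rx);
    producer.join().unwrap();
    let total: usize = chains.iter().map(Vec::len).sum();
    println!("{} chains, {} updates", chains.len(), total);
}
```

The number of chains in this example depends on thread timing (all five updates always arrive), which hints at the extra complexity the reply is wary of: how much gets staged now depends on when the input happens to run dry.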

teskje · 2025-02-12T14:11:49Z

I tested using the ConsolidatingContainerBuilder instead in #31456. It reduces the amount of code, which is nice, but for some reason it also performs much worse than the hand-rolled Stage implementation in this PR.

antiguru

LGTM, let's take the discussion around the consolidating container builder offline!

teskje · 2025-02-12T17:05:17Z

TFTR!

teskje force-pushed the correction_v2-stage branch from c6fccd5 to c696ece February 7, 2025 17:53

teskje marked this pull request as ready for review February 8, 2025 10:42

teskje requested a review from a team as a code owner February 8, 2025 10:42

teskje requested a review from antiguru February 10, 2025 08:33

antiguru reviewed Feb 10, 2025

teskje added 2 commits February 11, 2025 11:51

ci: enable the v2 correction implementation (a446c75)

teskje force-pushed the correction_v2-stage branch from c696ece to a446c75 February 11, 2025 10:51

teskje mentioned this pull request Feb 11, 2025

compute/correction-v2: introduce a Stage to speed up small inserts (ConsolidatingContainerBuilder version) #31456 (Draft)

antiguru approved these changes Feb 12, 2025

teskje merged commit e98be02 into MaterializeInc:main Feb 12, 2025
80 checks passed

teskje deleted the correction_v2-stage branch February 12, 2025 17:05
