compute/correction-v2: introduce a Stage to speed up small inserts #31401
c6fccd5 to c696ece
The implementation looks very similar to a `ConsolidatingContainerBuilder`, is there a reason why it couldn't reuse the implementation? It has a slightly different API, but maybe it doesn't need to. In Timely, a good rule-of-thumb is that during an operator invocation, it's great to leave state across input batches, but once the operator ceases to work, it's better to finish pending work. The builder API gives you this in the form of the `extract` and `finish` API, but it's harder to track this with the APIs implemented by `Stage`.

Following the builder pattern, the `Stage` wouldn't need to hook into logic to account for size and be aware of frontiers.
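For readers unfamiliar with the builder shape referred to above, here is a minimal sketch of the idea. It is not Timely's actual `ContainerBuilder` trait and not the code in this PR: the type `ConsolidatingBuilder`, its `(key, diff)` item type, and the `chunk` parameter are all hypothetical and chosen for illustration. The point is only the split between `extract` (hand out completed chunks, keep partial state) and `finish` (flush everything once no more input is expected).

```rust
use std::collections::VecDeque;

/// Hypothetical builder, loosely modeled on the extract/finish shape
/// discussed above. The real Timely trait differs in its details.
struct ConsolidatingBuilder {
    /// Items pushed so far that have not been consolidated yet.
    pending: Vec<(u64, i64)>,
    /// Consolidated chunks waiting to be handed out.
    ready: VecDeque<Vec<(u64, i64)>>,
    /// Number of pending items that triggers consolidation (assumed knob).
    chunk: usize,
}

impl ConsolidatingBuilder {
    fn new(chunk: usize) -> Self {
        Self { pending: Vec::new(), ready: VecDeque::new(), chunk }
    }

    /// Buffer one `(key, diff)` item; consolidate once `chunk` items are pending.
    fn push(&mut self, item: (u64, i64)) {
        self.pending.push(item);
        if self.pending.len() >= self.chunk {
            self.consolidate_pending();
            if !self.pending.is_empty() {
                self.ready.push_back(std::mem::take(&mut self.pending));
            }
        }
    }

    /// Return a completed chunk, if any. Partial state stays buffered.
    fn extract(&mut self) -> Option<Vec<(u64, i64)>> {
        self.ready.pop_front()
    }

    /// Return completed chunks first, then flush whatever is still pending.
    fn finish(&mut self) -> Option<Vec<(u64, i64)>> {
        if let Some(chunk) = self.ready.pop_front() {
            return Some(chunk);
        }
        self.consolidate_pending();
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }

    /// Sort by key, sum diffs of equal keys, and drop entries that cancel out.
    fn consolidate_pending(&mut self) {
        self.pending.sort_by_key(|(key, _)| *key);
        let mut out: Vec<(u64, i64)> = Vec::with_capacity(self.pending.len());
        for (key, diff) in self.pending.drain(..) {
            if let Some(last) = out.last_mut() {
                if last.0 == key {
                    last.1 += diff;
                    continue;
                }
            }
            out.push((key, diff));
        }
        out.retain(|(_, diff)| *diff != 0);
        self.pending = out;
    }
}
```

In this shape, a caller would drain `extract` while input keeps arriving and drain `finish` once it stops, so the builder itself needs no knowledge of sizes or frontiers.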
Looks like the
Why would it be preferable to finish pending work when the operator becomes idle? Keep in mind that the correction buffer is used in an async operator which, afaick, doesn't have a good way to know when it will become idle. It just does |
This commit extends the `CorrectionV2` data structure with a `Stage` that accumulates inserted updates before they get inserted into the sorted chains. This significantly reduces the number of chain merges that need to be done in workloads that trickle in large amounts of updates in small batches, greatly speeding up these workloads.
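The staging idea described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: the `Update` tuple type, the `capacity` threshold, the `(time, data)` sort order, and the method names `insert` and `flush` are all hypothetical.

```rust
/// Simplified update type for illustration: (data, time, diff).
type Update = (String, u64, i64);

/// Hypothetical staging buffer. Small inserts accumulate here and are
/// released as one sorted, consolidated batch, so the downstream sorted
/// chains see one larger insert instead of many tiny ones.
struct Stage {
    buffer: Vec<Update>,
    /// Number of staged updates that triggers a flush (assumed knob).
    capacity: usize,
}

impl Stage {
    fn new(capacity: usize) -> Self {
        Self { buffer: Vec::with_capacity(capacity), capacity }
    }

    /// Stage some updates. Returns a consolidated batch once the buffer
    /// has reached `capacity`, and `None` while it is still filling up.
    fn insert(&mut self, updates: impl IntoIterator<Item = Update>) -> Option<Vec<Update>> {
        self.buffer.extend(updates);
        if self.buffer.len() >= self.capacity {
            self.flush()
        } else {
            None
        }
    }

    /// Release everything staged so far as a consolidated batch.
    /// Returns `None` if nothing is staged or all updates cancel out.
    fn flush(&mut self) -> Option<Vec<Update>> {
        let mut batch = std::mem::take(&mut self.buffer);
        consolidate(&mut batch);
        if batch.is_empty() { None } else { Some(batch) }
    }
}

/// Sort by (time, data), sum the diffs of equal (data, time) pairs, and
/// drop updates whose diffs sum to zero.
fn consolidate(updates: &mut Vec<Update>) {
    updates.sort_by(|a, b| (a.1, &a.0).cmp(&(b.1, &b.0)));
    let mut out: Vec<Update> = Vec::with_capacity(updates.len());
    for (data, time, diff) in updates.drain(..) {
        if let Some(last) = out.last_mut() {
            if last.0 == data && last.1 == time {
                last.2 += diff;
                continue;
            }
        }
        out.push((data, time, diff));
    }
    out.retain(|update| update.2 != 0);
    *updates = out;
}
```

With a structure like this, a caller would hand each batch returned by `insert` or `flush` to the chain insertion path, which is where the reduction in merges would come from when many inserts are small.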
c696ece to a446c75
I tested using the |
LGTM, let's take the discussion around the consolidating container builder offline!
TFTR! |
This PR extends the `CorrectionV2` data structure with a `Stage` that accumulates inserted updates before they get inserted into the sorted chains. This significantly reduces the number of chain merges that need to be done in workloads that trickle in large amounts of updates in small batches, greatly speeding up these workloads.

For some reason, the memory tests introduced in #31267 are such a workload. Brief testing reveals that this PR speeds up the MV hydration in the `-800mb` test, when pointed against `CorrectionV2`, by roughly an order of magnitude, from ~300s to ~30s.

Note that this change doesn't have much impact when running locally on macOS. For some reason the batch sizes entering the MV sink are much smaller when running in `mzcompose`, not sure why that is.

Motivation
Part of https://github.com/MaterializeInc/database-issues/issues/8464
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.