persist-txn: support "eager" physical upper advancement on data shards #23552

danhhz · 2023-11-29T22:19:16Z

persist-txn: support "eager" physical upper advancement on data shards

When a txn commits, the "logical upper" of each data shard is advanced
but the "physical upper" is only advanced for the data shards involved
in the txn. This creates a "lazy upper" situation that is addressed on
the read side.

This commit introduces the ability to instead advance the upper of all
registered data shards, which means we can use all totally normal
unchanged read paths. The eager uppers of course don't get the
efficiency and cost benefits of lazy uppers, but they'll allow us to
de-risk the rollout to prod by breaking it into two pieces.

Touches https://github.com/MaterializeInc/database-issues/issues/6685
Touches https://github.com/MaterializeInc/database-issues/issues/6888

Motivation

This PR adds a known-desirable feature.

Tips for reviewer

Ignore the first commit, it's #23588.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

danhhz · 2023-12-01T00:11:39Z

There's a couple WIPs left to polish out, but this is ready for a look! I'll likely pull out the bits to turn it on in CI before merging

shepherdlybot · 2023-12-01T00:12:07Z

Mitigations

Completing required mitigations increases Resilience Coverage.

(Required) Code Review 🔍 Detected
(Required) Feature Flag
(Required) Integration Test
(Required) Observability 🔍 Detected
(Required) QA Review
Unit Test 🔍 Detected

Bug Hotspots:

What's This?

File	Percentile
../session/vars.rs	94
../src/txn_write.rs	90
../impls/stash.rs	90
../src/lib.rs	96
../src/durable.rs	94
../src/txns.rs	96
../src/lib.rs	91
../impls/persist.rs	99

src/ore/src/task.rs

jkosh44 · 2023-12-01T00:27:44Z

src/persist-txn/src/txns.rs

+        // TODO: Ideally the upper advancements would be done concurrently with
+        // this apply_le.
+        let tidy = self.apply_le(ts).await;
+


Does anything bad happen if we crash here, after the apply_le but before the advancing the physical uppers? One potentially bad scenario:

Upper of some shard s is at 5.

Apply txns at 10.

Crash.

Reboot.

Try and read s at 10.

If at some point during startup we call apply_eager_le(ts') s.t. ts' >= 10, then I think we're fine. Otherwise we might need to wait until the next commit to read the shard?

Good point, seems like we just need call apply_eager_le in init_ts?

That sounds like it would work.

Now that I think about it though, we definitely do at least one table write during bootstrap so it's probably fine as is. Might be better to be explicit about it though.

Yeah, I realized the same thing and ditto for create_collection, which would also advance them. Added it anyway :), but had to do it in register instead of init_txns.

jkosh44

LGTM assuming that the WIPs and TODOs will be resolved in a later PR

danhhz

Still have the WIP overrides in here to get CI to exercise this and still have the other PR commit (it hasn't quite merged yet), but I think I've finished polishing up the rest!

danhhz · 2023-12-01T23:03:17Z

src/persist-txn/src/txns.rs

+        // TODO: Ideally the upper advancements would be done concurrently with
+        // this apply_le.
+        let tidy = self.apply_le(ts).await;
+


Yeah, I realized the same thing and ditto for create_collection, which would also advance them. Added it anyway :), but had to do it in register instead of init_txns.

danhhz · 2023-12-01T23:04:26Z

src/storage-controller/src/lib.rs

+                tx.clone(),
+                txns,
+                lazy_persist_txn_tables,
+            );
            let txns_read = TxnsRead::start::<TxnsCodecRow>(txns_client.clone(), txns_id);


Realized last night that having this txns_read around is wasteful for eager uppers (plus having it running increases our exposure to bugs slightly), so reworked a touch here to avoid that.

When a txn commits, the "logical upper" of each data shard is advanced but the "physical upper" is only advanced for the data shards involved in the txn. This creates a "lazy upper" situation that is addressed on the read side. This commit introduces the ability to instead advance the upper of all registered data shards, which means we can use all totally normal unchanged read paths. The eager uppers of course don't get the efficiency and cost benefits of lazy uppers, but they'll allow us to derisk the rollout to prod by breaking it into two pieces.

danhhz · 2023-12-04T16:38:59Z

TFTR!

danhhz force-pushed the persist_txn_eager branch 3 times, most recently from 0d7c72b to 33edace Compare December 1, 2023 00:07

danhhz marked this pull request as ready for review December 1, 2023 00:11

danhhz requested a review from a team December 1, 2023 00:11

danhhz requested review from a team and benesch as code owners December 1, 2023 00:11

danhhz requested review from a team, maddyblue and jkosh44 December 1, 2023 00:11

jkosh44 reviewed Dec 1, 2023

View reviewed changes

jkosh44 approved these changes Dec 1, 2023

View reviewed changes

danhhz commented Dec 1, 2023

View reviewed changes

danhhz force-pushed the persist_txn_eager branch from 8d1111f to 6a1ef27 Compare December 4, 2023 16:26

danhhz force-pushed the persist_txn_eager branch from 6a1ef27 to afbdcee Compare December 4, 2023 16:31

danhhz enabled auto-merge December 4, 2023 16:39

danhhz merged commit da666aa into MaterializeInc:main Dec 4, 2023

danhhz deleted the persist_txn_eager branch December 4, 2023 17:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

persist-txn: support "eager" physical upper advancement on data shards #23552

persist-txn: support "eager" physical upper advancement on data shards #23552

danhhz commented Nov 29, 2023 •

edited

Loading

danhhz commented Dec 1, 2023

shepherdlybot bot commented Dec 1, 2023 •

edited

Loading

jkosh44 Dec 1, 2023

danhhz Dec 1, 2023

jkosh44 Dec 1, 2023

jkosh44 Dec 1, 2023

danhhz Dec 1, 2023

jkosh44 left a comment

danhhz left a comment

danhhz Dec 1, 2023

danhhz Dec 1, 2023

danhhz commented Dec 4, 2023

persist-txn: support "eager" physical upper advancement on data shards #23552

persist-txn: support "eager" physical upper advancement on data shards #23552

Conversation

danhhz commented Nov 29, 2023 • edited Loading

Motivation

Tips for reviewer

Checklist

danhhz commented Dec 1, 2023

shepherdlybot bot commented Dec 1, 2023 • edited Loading

Mitigations

jkosh44 Dec 1, 2023

Choose a reason for hiding this comment

danhhz Dec 1, 2023

Choose a reason for hiding this comment

jkosh44 Dec 1, 2023

Choose a reason for hiding this comment

jkosh44 Dec 1, 2023

Choose a reason for hiding this comment

danhhz Dec 1, 2023

Choose a reason for hiding this comment

jkosh44 left a comment

Choose a reason for hiding this comment

danhhz left a comment

Choose a reason for hiding this comment

danhhz Dec 1, 2023

Choose a reason for hiding this comment

danhhz Dec 1, 2023

Choose a reason for hiding this comment

danhhz commented Dec 4, 2023

danhhz commented Nov 29, 2023 •

edited

Loading

shepherdlybot bot commented Dec 1, 2023 •

edited

Loading