Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persist-txn: support "eager" physical upper advancement on data shards #23552

Merged
merged 1 commit into from
Dec 4, 2023

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Nov 29, 2023

persist-txn: support "eager" physical upper advancement on data shards

When a txn commits, the "logical upper" of each data shard is advanced
but the "physical upper" is only advanced for the data shards involved
in the txn. This creates a "lazy upper" situation that is addressed on
the read side.

This commit introduces the ability to instead advance the upper of all
registered data shards, which means we can use all totally normal
unchanged read paths. The eager uppers of course don't get the
efficiency and cost benefits of lazy uppers, but they'll allow us to
de-risk the rollout to prod by breaking it into two pieces.

Touches https://github.com/MaterializeInc/database-issues/issues/6685
Touches https://github.com/MaterializeInc/database-issues/issues/6888

Motivation

  • This PR adds a known-desirable feature.

Tips for reviewer

Ignore the first commit, it's #23588.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@danhhz danhhz force-pushed the persist_txn_eager branch 3 times, most recently from 0d7c72b to 33edace Compare December 1, 2023 00:07
@danhhz
Copy link
Contributor Author

danhhz commented Dec 1, 2023

There's a couple WIPs left to polish out, but this is ready for a look! I'll likely pull out the bits to turn it on in CI before merging

@danhhz danhhz marked this pull request as ready for review December 1, 2023 00:11
@danhhz danhhz requested a review from a team December 1, 2023 00:11
@danhhz danhhz requested review from a team and benesch as code owners December 1, 2023 00:11
@danhhz danhhz requested review from a team, maddyblue and jkosh44 December 1, 2023 00:11
Copy link

shepherdlybot bot commented Dec 1, 2023

Risk Score:82 / 100 Bug Hotspots:8 Resilience Coverage:40%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability 🔍 Detected
  • (Required) QA Review
  • Unit Test 🔍 Detected
Bug Hotspots:

What's This?

File Percentile
../session/vars.rs 94
../src/txn_write.rs 90
../impls/stash.rs 90
../src/lib.rs 96
../src/durable.rs 94
../src/txns.rs 96
../src/lib.rs 91
../impls/persist.rs 99

src/ore/src/task.rs Outdated Show resolved Hide resolved
// TODO: Ideally the upper advancements would be done concurrently with
// this apply_le.
let tidy = self.apply_le(ts).await;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anything bad happen if we crash here, after the apply_le but before the advancing the physical uppers? One potentially bad scenario:

  1. Upper of some shard s is at 5.
  2. Apply txns at 10.
  3. Crash.
  4. Reboot.
  5. Try and read s at 10.

If at some point during startup we call apply_eager_le(ts') s.t. ts' >= 10, then I think we're fine. Otherwise we might need to wait until the next commit to read the shard?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, seems like we just need call apply_eager_le in init_ts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like it would work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it though, we definitely do at least one table write during bootstrap so it's probably fine as is. Might be better to be explicit about it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I realized the same thing and ditto for create_collection, which would also advance them. Added it anyway :), but had to do it in register instead of init_txns.

Copy link
Contributor

@jkosh44 jkosh44 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming that the WIPs and TODOs will be resolved in a later PR

Copy link
Contributor Author

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have the WIP overrides in here to get CI to exercise this and still have the other PR commit (it hasn't quite merged yet), but I think I've finished polishing up the rest!

// TODO: Ideally the upper advancements would be done concurrently with
// this apply_le.
let tidy = self.apply_le(ts).await;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I realized the same thing and ditto for create_collection, which would also advance them. Added it anyway :), but had to do it in register instead of init_txns.

tx.clone(),
txns,
lazy_persist_txn_tables,
);
let txns_read = TxnsRead::start::<TxnsCodecRow>(txns_client.clone(), txns_id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized last night that having this txns_read around is wasteful for eager uppers (plus having it running increases our exposure to bugs slightly), so reworked a touch here to avoid that.

@danhhz danhhz force-pushed the persist_txn_eager branch from 8d1111f to 6a1ef27 Compare December 4, 2023 16:26
When a txn commits, the "logical upper" of each data shard is advanced
but the "physical upper" is only advanced for the data shards involved
in the txn. This creates a "lazy upper" situation that is addressed on
the read side.

This commit introduces the ability to instead advance the upper of all
registered data shards, which means we can use all totally normal
unchanged read paths. The eager uppers of course don't get the
efficiency and cost benefits of lazy uppers, but they'll allow us to
derisk the rollout to prod by breaking it into two pieces.
@danhhz danhhz force-pushed the persist_txn_eager branch from 6a1ef27 to afbdcee Compare December 4, 2023 16:31
@danhhz
Copy link
Contributor Author

danhhz commented Dec 4, 2023

TFTR!

@danhhz danhhz enabled auto-merge December 4, 2023 16:39
@danhhz danhhz merged commit da666aa into MaterializeInc:main Dec 4, 2023
@danhhz danhhz deleted the persist_txn_eager branch December 4, 2023 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants