Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ore: pend in wait_and_assert_finished instead of yielding once #23588

Merged
merged 1 commit into from
Dec 6, 2023

Conversation

guswynn
Copy link
Contributor

@guswynn guswynn commented Nov 30, 2023

A singular yield_now is not sufficient to yield a task to the tokio runtime if the JoinHandle is being selected or used in things like FuturesUnordered, so instead, we pend forever. To prevent misuse causing deadlocks, I also altered the return value of mz_ore spawn methods/functions to return a type that uses ownership to prevent users from aborting a task and also calling wait_and_assert_finished

fixes a flake described here: https://materializeinc.slack.com/archives/C01CFKM1QRF/p1701365433621259 (and elsewhere)

Motivation

  • This PR fixes a recognized bug.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

src/ore/src/task.rs Outdated Show resolved Hide resolved
src/ore/src/task.rs Show resolved Hide resolved
@guswynn guswynn force-pushed the cherry-pick-for-dan branch 3 times, most recently from bc63dd1 to 4fb6158 Compare November 30, 2023 22:24
@guswynn guswynn changed the title ore: pend in wait_in_assert_finished instead ore: pend in wait_and_assert_finished instead of yielding once Nov 30, 2023
@guswynn guswynn marked this pull request as ready for review November 30, 2023 22:58
@guswynn guswynn requested a review from a team as a code owner November 30, 2023 22:58
@guswynn guswynn requested a review from a team November 30, 2023 22:58
@guswynn guswynn requested review from a team as code owners November 30, 2023 22:58
@guswynn guswynn requested a review from a team November 30, 2023 22:58
Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triggered nightly, would appreciate waiting for the result: https://buildkite.com/materialize/nightlies/builds/5446

Copy link
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new API is SO good!

src/environmentd/tests/sql.rs Outdated Show resolved Hide resolved
src/persist-client/src/internal/machine.rs Outdated Show resolved Hide resolved
src/ore/src/task.rs Outdated Show resolved Hide resolved
src/ore/src/task.rs Outdated Show resolved Hide resolved
src/ore/src/task.rs Outdated Show resolved Hide resolved
src/ore/src/task.rs Outdated Show resolved Hide resolved
src/ore/src/task.rs Show resolved Hide resolved
Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nightly seems good except this failure: https://buildkite.com/materialize/nightlies/builds/5446#018c2275-b361-4315-b7af-0aedaa249180
After ingesting data we had a hang. I'm not sure if it's related to this PR, retriggered.

@guswynn
Copy link
Contributor Author

guswynn commented Dec 1, 2023

@def- That hang is a bit spooky but it looks like replica u2 was never started? You mentioned its on main now: #23595 so i think its fine?

@guswynn guswynn force-pushed the cherry-pick-for-dan branch from 4fb6158 to d85832b Compare December 1, 2023 17:55
@guswynn
Copy link
Contributor Author

guswynn commented Dec 1, 2023

rebased, fixed nits, re-triggered nightlies

Copy link
Contributor

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

.await
.into_iter()
.map(JoinHandle::abort_on_drop)
.collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to @danhhz, but the reason I had the explicit Drop impl was to avoid the reallocation. Could plausibly not matter at all here, but when I wrote the Drop impl I wasn't sure so I erred on the side of caution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The reallocation" here is a second vec? I'd say that's fine here, this gets called 1:1 with registered a reader, which is an expensive operation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly.

@danhhz
Copy link
Contributor

danhhz commented Dec 4, 2023

@guswynn Any update here ❤️? Looks like I can't get #23659 to pass without this

@guswynn
Copy link
Contributor Author

guswynn commented Dec 5, 2023

@benesch i think this is ready for another look

@benesch
Copy link
Contributor

benesch commented Dec 5, 2023

@benesch i think this is ready for another look

Has this changed since I approved here?

@guswynn guswynn merged commit a9679e2 into MaterializeInc:main Dec 6, 2023
@guswynn guswynn deleted the cherry-pick-for-dan branch December 12, 2023 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants