
Improve support for multiple schedulers #47

Merged: rosa merged 7 commits into main from multiple-schedulers on Nov 30, 2023
Conversation

rosa (Member) commented Nov 29, 2023

Until now, everything around the scheduler was done under the assumption that a single process would be run for this purpose. This might not be true for everyone, and in particular, it won't be true for us very soon 😅 This PR introduces two changes:

  • Lock jobs selected for scheduling using SKIP LOCKED, so that multiple schedulers running at the same time don't have to wait on each other's locks when deleting scheduled executions. It also improves the selection of jobs to schedule and their deletion, to make sure only executions that have in fact been scheduled are deleted. The previous implementation assumed all selected jobs would be correctly promoted to ready executions, but if this was not the case, they'd simply be deleted and lost. Now we delete only the ones that have been promoted, within the same transaction (see the first sketch below).
  • Wait for a random interval before starting the scheduler's tasks. This comes from an idea by @djmb, so that, when we have multiple schedulers, they don't all start at the same time after a deploy, enqueuing All The Things and causing a thundering herd (see the second sketch below).
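
The SKIP LOCKED part could look roughly like the following ActiveRecord-style sketch. It is not the actual Solid Queue code; the model, column and method names (`ScheduledExecution`, `ReadyExecution`, `scheduled_at`, `promote_to_ready`, `prepare_next_batch`) are assumptions for illustration only.

```ruby
# A minimal sketch of the idea, assuming illustrative models and columns
# (ScheduledExecution, ReadyExecution, scheduled_at, queue_name, priority, job_id),
# not the actual Solid Queue schema.
class ScheduledExecution < ApplicationRecord
  def self.prepare_next_batch(batch_size)
    transaction do
      # FOR UPDATE SKIP LOCKED lets concurrent schedulers grab disjoint sets of
      # due executions instead of blocking on each other's row locks.
      due = where("scheduled_at <= ?", Time.current)
              .order(:scheduled_at)
              .limit(batch_size)
              .lock("FOR UPDATE SKIP LOCKED")
              .to_a

      # Promote each due execution and keep only those that actually made it.
      promoted = due.select(&:promote_to_ready)

      # Delete only the promoted ones, inside the same transaction, so an
      # execution that failed to be promoted is not silently lost.
      where(id: promoted.map(&:id)).delete_all
    end
  end

  def promote_to_ready
    ReadyExecution.create!(job_id: job_id, queue_name: queue_name, priority: priority)
    true
  rescue ActiveRecord::RecordInvalid
    false
  end
end
```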

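The startup jitter could be sketched like this; again, the class, method and parameter names are illustrative assumptions, not the library's API.

```ruby
# Illustrative sketch only; names and default values are assumptions.
class JitteredScheduler
  def initialize(polling_interval: 5, max_start_jitter: 5, &work)
    @polling_interval = polling_interval
    @max_start_jitter = max_start_jitter
    @work = work
  end

  def start
    Thread.new do
      # Sleep a random interval before the first run so that several schedulers
      # started together (e.g. right after a deploy) don't all enqueue at once.
      sleep(rand * @max_start_jitter)

      loop do
        @work.call
        sleep(@polling_interval)
      end
    end
  end
end

# For example: JitteredScheduler.new { ScheduledExecution.prepare_next_batch(500) }.start
```
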
cc @mdkent

rosa marked this pull request as ready for review November 29, 2023 16:55
rosa force-pushed the multiple-schedulers branch 2 times, most recently from f8aba1f to 98418ed on November 29, 2023 17:24
rosa added 4 commits November 30, 2023 18:00
…are scheduled

This change helps multiple schedulers running at the same time not have to wait on each
other's locks when deleting scheduled executions. It also improves the selection of jobs to
schedule and their deletion, to make sure only executions that have in fact been scheduled
are deleted. The previous implementation assumed all selected jobs would be correctly
promoted to ready executions, but if this was not the case, they'd simply be deleted
and lost. Now we delete only the ones that have been promoted, within the same transaction.
This comes from an idea by @djmb, so that, when we have multiple schedulers,
they don't all start at the same time after a deploy, enqueuing All The Things
and causing a thundering herd.
I had missed that we were running 5 and not 4 there, oops.
rosa force-pushed the multiple-schedulers branch 2 times, most recently from c375ca3 to b2035b4 on November 30, 2023 17:28
And wait until all semaphores have been signalled: it might happen that jobs finish
and, before they've signalled the semaphore, we try to wait on it, making the test
fail if the timing aligns.
rosa force-pushed the multiple-schedulers branch from b2035b4 to 42e33d4 on November 30, 2023 18:32
rosa merged commit 722c09c into main on Nov 30, 2023
rosa deleted the multiple-schedulers branch on November 30, 2023 20:41