
any form of rolling upgrade is impossible with Flux #6609

Open
garlick opened this issue Feb 5, 2025 · 6 comments

Comments

garlick commented Feb 5, 2025

Problem: it's not possible to have compute nodes reboot between jobs and update to a new Flux version because flux broker versions must match exactly in a given Flux instance.

On a machine the size of El Capitan, rebooting everything at once stresses other parts of the system, so a rolling reboot could potentially reduce the length of system downtime if Flux allowed at least consecutive releases to interoperate, e.g. for now, MAJOR.MINOR.PATCH and MAJOR.MINOR.(PATCH+1).

It's easy enough to relax the checks that occur during broker wireup. It will be a little more challenging to figure out how to check this in CI.
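For illustration only, here is a rough Python sketch of what such a relaxed rule could look like (not the broker's actual C wire-up code; the function name, tuple representation, and version numbers are hypothetical):

```python
# Hypothetical sketch of the proposed rule: same MAJOR.MINOR, and PATCH
# levels differing by at most one, are allowed to interoperate.
def versions_compatible(local, remote):
    """local and remote are (major, minor, patch) broker version tuples."""
    lmaj, lmin, lpatch = local
    rmaj, rmin, rpatch = remote
    return lmaj == rmaj and lmin == rmin and abs(lpatch - rpatch) <= 1

# e.g. consecutive patch releases interoperate, a minor bump does not:
assert versions_compatible((0, 61, 2), (0, 61, 3))
assert not versions_compatible((0, 61, 2), (0, 62, 0))
```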

Related:

wihobbs commented Feb 6, 2025

> It's easy enough to relax the checks that occur during broker wireup. It will be a little more challenging to figure out how to check this in CI.

flux-test-collective has gone stale and hasn't run for a while (which is on me), but perhaps we could leverage it to do this. In the past, binaries from previous days' runs have been saved in a workspace folder, so we could develop a script that boots yesterday's version of flux with today's brokers. We could also check whether the brokers can wire up under whatever version is in /usr/bin on the system in question. Just a thought.
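As a rough Python sketch of that idea (the workspace paths are hypothetical placeholders, and the parsing assumes `flux version` prints a `commands:` line), a CI step could first decide whether the two builds are consecutive releases before attempting a mixed-version wire-up test:

```python
# Rough sketch: check whether yesterday's saved build and today's build are
# consecutive releases before attempting a mixed-version broker wire-up test.
# The workspace paths below are hypothetical placeholders.
import re
import subprocess

def flux_version(bindir):
    """Parse MAJOR.MINOR.PATCH from `flux version` output of a given install."""
    out = subprocess.run([f"{bindir}/flux", "version"],
                         capture_output=True, text=True, check=True).stdout
    major, minor, patch = re.search(r"commands:\s+(\d+)\.(\d+)\.(\d+)", out).groups()
    return int(major), int(minor), int(patch)

old = flux_version("/workspace/yesterday/bin")  # hypothetical path
new = flux_version("/workspace/today/bin")      # hypothetical path

# Same rule as sketched above: identical MAJOR.MINOR, PATCH within one.
if old[:2] == new[:2] and abs(old[2] - new[2]) <= 1:
    print("consecutive releases: attempt the mixed-version wire-up test")
else:
    print("not consecutive releases: skip the mixed-version test")
```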

garlick commented Feb 6, 2025

It's not quite on target IMHO to test "today" vs "yesterday" or "today" vs "host install" if we need to guarantee that consecutive releases interoperate. We might have to do something like side-install the last tagged version and use that in combination with the version built in the development tree.

It hurts my brain, though, trying to think how to test that. Maybe flux start could grow some options for starting different brokers on different ranks, and then we could change test_under_flux to conditionally use that and run the whole test suite through with different configurations. Somehow.

grondo commented Feb 6, 2025

We could reduce the test surface by only allowing non-rank-0 nodes to have a version greater than rank 0, and not the other way around.

Then we'd get a lot of mileage out of being able to use the system broker on rank 0 and the built flux everywhere else. We'd want to ensure the test commands were using the built flux, and perhaps even run them on a non-rank-0 broker (simulating the typical use of a login node, though I'm not sure login nodes are part of the rolling upgrade plan).
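A tiny Python sketch of that asymmetric rule (illustration only; names hypothetical): a joining non-rank-0 broker may match rank 0 or be one patch level ahead, but never behind.

```python
# Hypothetical sketch of the asymmetric rule: a non-rank-0 broker may be at
# the same version as rank 0 or one patch level ahead, but never behind.
def downstream_allowed(rank0, joining):
    """rank0 and joining are (major, minor, patch) broker version tuples."""
    if rank0[:2] != joining[:2]:          # MAJOR.MINOR must match exactly
        return False
    return 0 <= joining[2] - rank0[2] <= 1

assert downstream_allowed((0, 61, 2), (0, 61, 3))      # upgraded compute node: ok
assert not downstream_allowed((0, 61, 3), (0, 61, 2))  # rank 0 newer: rejected
```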

garlick commented Feb 7, 2025

> We could reduce the test surface by only allowing non-rank-0 nodes to have a version greater than rank 0, and not the other way around.

One wrinkle with that: if you're running a batch/alloc job in an instance that has some upgraded compute nodes, the job might end up with the newer release on its rank 0 and the older one on other ranks. Also, if the TBON isn't flat, we can still end up with a wire-up like old - new - old - new.

grondo commented Feb 7, 2025

Hm, good point. It would be best if a rolling upgrade could be coordinated so that nodes are upgraded as they become free and are not available for scheduling until the update is complete. This would ensure that new jobs started after the rolling update begins always use the newest version. This might be difficult to enforce in practice, though.

grondo commented Feb 7, 2025

Or, if we can get dynamic property management working, perhaps the broker version could be a property assigned to each node, and on systems where rolling upgrades are enabled, the scheduler could (somehow) ensure that jobs are only assigned a set of brokers with matching versions. (Note: the current property constraint matching can't accomplish this, unfortunately.)
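For reference, a hedged sketch of why the existing property constraints fall short here: a job can be pinned to nodes carrying one specific, named property, but there is no way to express "any set of nodes whose version properties all agree with each other". (Python, illustration only; the property name is hypothetical.)

```python
# Sketch of a property constraint pinning a job to nodes carrying a
# hypothetical per-node property that records the broker version. The
# submitter must name one specific version up front; the constraint
# language cannot say "all assigned nodes must simply match each other".
import json

constraint = {"properties": ["flux-core-0.61.3"]}   # hypothetical property name
print(json.dumps(constraint))
# On the command line this would roughly correspond to something like
#   flux run --requires=flux-core-0.61.3 ...
# assuming the property name is acceptable to --requires.
```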
