RFC 28: add `shrink` key to resource acquisition response #447
Problem: An example in RFC 28 contains invalid JSON. Fix the invalid JSON.
One thought here: I wonder if, for better backwards compatibility, it should be ok for an execution target to be in both the
Great idea!
Done. I've added:
LGTM, thanks! One comment below - feel free to ignore.
spec_28.rst
If removed (``shrink``) or down resources are assigned to a job, the
scheduler SHALL NOT raise an exception on the job. The execution system
takes the active role in handling failures in this case. Eventually the
scheduler will receive a ``sched.free`` request for the offline resources.
Perhaps we should clarify that we'll send a `sched.free` for resources even when they have been removed? This reads a bit ambiguous to me in that regard, but it should be required for backwards compatibility.
Agree. Would this be sufficient:
If removed (``shrink``) or down resources are assigned to a job, the
scheduler SHALL NOT raise an exception on the job. The execution system
takes the active role in handling failures in this case. Eventually the
scheduler will receive a ``sched.free`` request for the offline or removed resources.
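To illustrate what the suggested wording asks of a scheduler, here is a minimal sketch of a toy in-memory scheduler. It assumes ranks stand in for execution targets, and the class `ToyScheduler` and its `alloc`, `shrink`, and `free` methods are hypothetical names for this illustration only, not part of any Flux scheduler API. It does not model `down`; it only shows that removing targets never raises on the job holding them, and that a later `sched.free` naming removed targets is accepted, with only the still-present targets returned to the pool.

```python
class ToyScheduler:
    """Hypothetical scheduler showing the free path for removed targets."""

    def __init__(self, ranks):
        self.free_ranks = set(ranks)  # targets available for allocation
        self.removed = set()          # targets removed via shrink
        self.allocs = {}              # jobid -> set of allocated ranks

    def alloc(self, jobid, nranks):
        # Pick the lowest-numbered free ranks; return None if not enough.
        if nranks > len(self.free_ranks):
            return None
        ranks = set(sorted(self.free_ranks)[:nranks])
        self.free_ranks -= ranks
        self.allocs[jobid] = ranks
        return ranks

    def shrink(self, ranks):
        # Stop considering these targets for scheduling. Jobs that already
        # hold them get no exception from the scheduler; the execution
        # system handles those failures.
        self.removed |= set(ranks)
        self.free_ranks -= set(ranks)

    def free(self, jobid):
        # The free request may include removed ranks; accept it without
        # error, but only return ranks that are still part of the instance.
        ranks = self.allocs.pop(jobid)
        self.free_ranks |= ranks - self.removed


s = ToyScheduler(range(4))
s.alloc(1, 2)      # job 1 gets ranks {0, 1}
s.shrink({1})      # rank 1 is removed while job 1 still holds it
s.free(1)          # sched.free for job 1 still names rank 1
print(sorted(s.free_ranks))  # [0, 2, 3]
```

The design choice here is to keep the removed set separate from the free pool, so a free request can be processed uniformly regardless of which of its targets have since been removed.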
I made this change and will set MWP. Thanks!
Problem: There is no way for Flux to notify a scheduler that resources that have gone down are not coming back. Add a `shrink` key to the RFC 28 resource acquisition response that tells the scheduler to remove a set of resources, identified by execution target, from consideration for scheduling.
This PR adds the `shrink` key to RFC 28 as suggested in flux-framework/flux-core#6641. This key contains an idset of execution targets that have been removed from the instance and so should no longer be considered for scheduling.
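A minimal sketch of how a scheduler might fold `up`, `down`, and `shrink` keys from successive acquisition responses into its view of execution targets. It assumes each key, when present, holds an idset string such as `"0-3,7"` (optionally bracketed), and it ignores the resource set that the first real response carries. `parse_idset` and `TargetTracker` are hypothetical helpers for illustration, not Flux library functions.

```python
import json


def parse_idset(s: str) -> set[int]:
    """Decode an idset string such as "0-3,7" or "[0-3,7]" into ints."""
    s = s.strip()
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]
    ids: set[int] = set()
    if not s:
        return ids
    for part in s.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


class TargetTracker:
    """Hypothetical scheduler-side view of execution target state."""

    def __init__(self) -> None:
        self.up: set[int] = set()
        self.down: set[int] = set()
        self.removed: set[int] = set()

    def apply(self, response: dict) -> None:
        # Assumption: "up", "down" and "shrink" are optional idset strings.
        for rank in parse_idset(response.get("up", "")):
            if rank not in self.removed:  # removed targets never come back
                self.down.discard(rank)
                self.up.add(rank)
        for rank in parse_idset(response.get("down", "")):
            if rank not in self.removed:
                self.up.discard(rank)
                self.down.add(rank)
        # Process shrink last so it wins if a target also appears in "down".
        for rank in parse_idset(response.get("shrink", "")):
            self.up.discard(rank)
            self.down.discard(rank)
            self.removed.add(rank)

    def schedulable(self) -> set[int]:
        return set(self.up)


t = TargetTracker()
t.apply(json.loads('{"up": "0-3"}'))
t.apply(json.loads('{"down": "2-3"}'))
t.apply(json.loads('{"shrink": "3"}'))
print(sorted(t.schedulable()))  # [0, 1]
```

Handling `shrink` after `down` within one response is one way to tolerate a target appearing in both keys; rank 2 here stays down and may return, while rank 3 is gone for good.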