What would you like to be added:
The semantics of ClusterQueue.preemption are underspecified and potentially misleading for FairSharing. As of v0.10.1, FairSharing uses borrowWithinCohort in a counter-intuitive way: rather than limiting preemptions to workloads below a certain priority threshold, it guarantees that workloads below that threshold are always preempted, ignoring the fair sharing value. See #4165.
Updating FairSharing so that borrowWithinCohort limits preemptions to a threshold would add complexity (see #4165 (comment)) and produce confusing semantics compared to Classical Preemption. In Classical Preemption, enabling borrowWithinCohort (versus only reclaimWithinCohort) results in more targets, whereas with that change it would result in fewer targets in FairSharing.
I propose the following changes:
preemption.borrowWithinCohort is made incompatible with FairSharing. As currently stated in the documentation, only preemption.reclaimWithinCohort and preemption.withinClusterQueue are compatible with FairSharing.
preemption.reclaimWithinCohort and preemption.withinClusterQueue are extended to have a threshold priority, similar to preemption.borrowWithinCohort.
FairSharing, in addition to Classical Preemption, will respect these new thresholds. This allows FairSharing users to limit preemption of important workloads with priorities above some threshold.
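To make the intended threshold semantics concrete, here is a minimal sketch of how a priority threshold could restrict preemption candidates. It is not Kueue's implementation: `Workload`, `FilterByThreshold`, and the "at or below the threshold" comparison are assumptions for illustration, modeled on how the existing threshold of preemption.borrowWithinCohort is described.

```go
package main

import "fmt"

// Workload is a simplified, hypothetical stand-in for an admitted workload.
type Workload struct {
	Name     string
	Priority int32
}

// FilterByThreshold keeps only the candidates whose priority is at or below
// maxPriorityThreshold. A nil threshold means "no limit" (assumed default),
// so every candidate stays eligible for preemption.
func FilterByThreshold(candidates []Workload, maxPriorityThreshold *int32) []Workload {
	if maxPriorityThreshold == nil {
		return candidates
	}
	var eligible []Workload
	for _, w := range candidates {
		if w.Priority <= *maxPriorityThreshold {
			eligible = append(eligible, w)
		}
	}
	return eligible
}

func main() {
	threshold := int32(100)
	candidates := []Workload{
		{Name: "batch", Priority: 50},
		{Name: "serving", Priority: 200},
		{Name: "etl", Priority: 100},
	}
	// Only workloads at or below the threshold remain preemption candidates.
	for _, w := range FilterByThreshold(candidates, &threshold) {
		fmt.Println(w.Name)
	}
}
```

Under this sketch, the fair sharing ordering would then be applied only to the remaining candidates, so workloads above the threshold are never chosen as targets.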
Why is this needed:
Make configuration more user friendly, and more powerful for FairSharing.
Completion requirements:
This enhancement requires the following artifacts:
Design doc
API change
Docs update
The artifacts should be linked in subsequent comments.
+1. The issue with extending reclaimWithinCohort is that it is a string field, so making it a struct would be a breaking change. We could introduce something like reclaimWithinCohortConfig, and re-design the API when upgrading to v1beta2.
Basically, lgtm
In that case, what is the value of preemption.borrowWithinCohort? IIUC, FairSharing completely covers and overlaps the preemption.borrowWithinCohort feature. So, we might be able to deprecate preemption.borrowWithinCohort and then remove it in v1beta2.
I'm happy to consider dropping preemption.borrowWithinCohort (and I'm pretty sure @gabesaba would be happy to do so :)), but we need to consider what to do for users who are already using it without fair sharing.
One potential obstacle is that fair sharing is global while preemption.borrowWithinCohort is per CQ. Maybe we could make fair sharing cohort-scoped, then I think this one goes away.