FairSharing Preemption Configuration #4173

gabesaba · 2025-02-07T16:42:24Z

What would you like to be added:
The semantics of ClusterQueue.preemption are underspecified and potentially misleading for FairSharing. As of v0.10.1, we are using borrowWithinCohort in a counter-intuitive way: rather than limiting preemptions to a certain threshold, it guarantees that workloads below that threshold are always preempted, ignoring the fair sharing value. See #4165.

Updating FairSharing so that borrowWithinCohort limits preemptions to a threshold would add complexity (see #4165 (comment)) and produce confusing semantics compared to Classical Preemption: enabling borrowWithinCohort (versus only reclaimWithinCohort) in Classical Preemption results in more targets, whereas that change would result in fewer targets in FairSharing.

I propose the following changes:

preemption.borrowWithinCohort is made incompatible with FairSharing. As currently stated in the documentation, only preemption.reclaimWithinCohort and preemption.withinClusterQueue are compatible with FairSharing.
preemption.reclaimWithinCohort and preemption.withinClusterQueue are extended to have a threshold priority, similar to preemption.borrowWithinCohort.
FairSharing, in addition to Classical Preemption, will respect these new thresholds. This allows FairSharing users to limit preemption of important workloads with priorities above some threshold.
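
To illustrate the threshold semantics proposed above, here is a minimal sketch in Go. It is not Kueue code: `PreemptionThreshold`, `Candidate`, and `FilterCandidates` are hypothetical names, and treating the threshold as inclusive (priority at or below the threshold may be preempted) is an assumption rather than part of the proposal.

```go
package main

// PreemptionThreshold is a hypothetical stand-in for the proposed
// threshold priority. A nil MaxPriority means "no limit".
type PreemptionThreshold struct {
	MaxPriority *int32
}

// Candidate is a hypothetical, simplified view of an admitted workload.
type Candidate struct {
	Name     string
	Priority int32
}

// Allows reports whether a workload with the given priority may be
// considered for preemption. Assumption: the threshold is inclusive.
func (t PreemptionThreshold) Allows(priority int32) bool {
	return t.MaxPriority == nil || priority <= *t.MaxPriority
}

// FilterCandidates drops workloads whose priority is above the threshold,
// so they are never offered to the preemption algorithm as targets.
func FilterCandidates(t PreemptionThreshold, cs []Candidate) []Candidate {
	out := make([]Candidate, 0, len(cs))
	for _, c := range cs {
		if t.Allows(c.Priority) {
			out = append(out, c)
		}
	}
	return out
}
```

Under this sketch, both FairSharing and Classical Preemption would pick targets only from the filtered list, so the threshold can only reduce the set of targets, never enlarge it.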

Why is this needed:
Make configuration more user friendly, and more powerful for FairSharing.

Completion requirements:

This enhancement requires the following artifacts:

Design doc
API change
Docs update

The artifacts should be linked in subsequent comments.


mimowo · 2025-02-10T10:31:48Z

+1, the issue with extending reclaimWithinCohort is that it is a string field so it would be a breaking change to make it a struct. We could introduce something like reclaimWithinCohortConfig, and re-design the API when upgrading to v1beta2.
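
A rough sketch of why a sibling field avoids the breaking change, using simplified types that are not Kueue's actual API definitions. `ReclaimWithinCohortConfig` and its `MaxPriorityThreshold` field are hypothetical, named after the suggestion above and by analogy with the threshold on borrowWithinCohort; `Decode` is a helper invented for the example.

```go
package main

import "encoding/json"

// PreemptionPolicy mirrors the existing string-valued field.
type PreemptionPolicy string

const (
	PolicyNever         PreemptionPolicy = "Never"
	PolicyLowerPriority PreemptionPolicy = "LowerPriority"
	PolicyAny           PreemptionPolicy = "Any"
)

// ReclaimWithinCohortConfig is hypothetical: an optional struct that
// carries the new threshold next to the unchanged string field.
type ReclaimWithinCohortConfig struct {
	MaxPriorityThreshold *int32 `json:"maxPriorityThreshold,omitempty"`
}

// ClusterQueuePreemption is a simplified stand-in for the preemption spec.
type ClusterQueuePreemption struct {
	// Existing field: stays a string, so current manifests keep working.
	ReclaimWithinCohort PreemptionPolicy `json:"reclaimWithinCohort,omitempty"`
	// New optional field: nil when a manifest does not set it.
	ReclaimWithinCohortConfig *ReclaimWithinCohortConfig `json:"reclaimWithinCohortConfig,omitempty"`
}

// Decode parses a JSON-encoded preemption spec.
func Decode(raw string) (ClusterQueuePreemption, error) {
	var p ClusterQueuePreemption
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}
```

A manifest that only sets `reclaimWithinCohort: Any` decodes as before with a nil config, whereas changing the field itself from a string to a struct would make that same manifest fail to decode.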

tenzen-y · 2025-02-10T10:53:39Z

Basically, lgtm
In that case, what is the value of preemption.borrowWithinCohort? IIUC, FairSharing completely covers and overlaps the preemption.borrowWithinCohort feature. So we might be able to deprecate preemption.borrowWithinCohort and then remove it in v1beta2.

mimowo · 2025-02-10T11:07:29Z

I'm happy to consider dropping preemption.borrowWithinCohort (and I'm pretty sure @gabesaba would be happy to do so :)), but we need to consider what to do about users who are already using it without fair sharing.

One potential obstacle is that fair sharing is global while preemption.borrowWithinCohort is per CQ. Maybe we could make fair sharing cohort-scoped, then I think this one goes away.

gabesaba added the kind/feature label Feb 7, 2025

gabesaba mentioned this issue Feb 7, 2025

Do not consider preemption.borrowWithinCohort in FairSharing preemptions #4165

Open
