Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FairSharing Preemption Configuration #4173

Open
2 of 3 tasks
gabesaba opened this issue Feb 7, 2025 · 3 comments
Open
2 of 3 tasks

FairSharing Preemption Configuration #4173

gabesaba opened this issue Feb 7, 2025 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@gabesaba
Copy link
Contributor

gabesaba commented Feb 7, 2025

What would you like to be added:
The semantics of ClusterQueue.preemption are underspecified and potentially misleading for FairSharing. As of v0.10.1, we are using borrowWithinCohort in a counter-intuitive way: rather than it limiting preemptions to a certain threshold, it guarantees that workloads below that threshold are always preempted while ignoring the fair sharing value. See #4165

Updating how FairSharing uses borrowWithinCohort to limit preemptions to a threshold results in additional complexity (see #4165 (comment)), and confusing semantics compared to Classical Preemption. While enabling borrowWithinCohort (versus only reclaimWithinCohort) in Classical Preemption results in more targets, this proposal would result in fewer targets in FairSharing.

I propose the following changes:

  1. preemption.borrowWithinCohort is made incompatible with FairSharing. As currently stated in the documentation, only preemption.reclaimWithinCohort and preemption.withinClusterQueue are compatible with FairSharing.
  2. preemption.reclaimWithinCohort and preemption.withinClusterQueue are extended to have a threshold priority, similarly to preemption.borrowWithinCohort
  3. FairSharing, in addition to Classical Preemption, will respect these new thresholds. This allows FairSharing users to limit preemption of important workloads with priorities above some threshold.

Why is this needed:
Make configuration more user friendly, and more powerful for FairSharing.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@gabesaba gabesaba added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 7, 2025
@mimowo
Copy link
Contributor

mimowo commented Feb 10, 2025

+1, the issue with extending reclaimWithinCohort is that it is a string field so it would be a breaking change to make it a struct. We could introduce something like reclaimWithinCohortConfig, and re-design the API when upgrading to v1beta2.

@tenzen-y
Copy link
Member

Basically, lgtm
In that case, what is the value of the preemption.borrowWithinCohort? IIUC, fairSharing completely covers and overlaps the preemption.borrowWithinCohort feature. So, we might be able to deprecate preemption.borrowWithinCohort then remove it in the v1beta2.

@mimowo
Copy link
Contributor

mimowo commented Feb 10, 2025

I'm happy to consider dropping preemption.borrowWithinCohort (and I'm pretty sure @gabesaba would be happy to do so :)), but consider what we do with users who are already using it without fair sharing.

One potential obstacle is that fair sharing is global while preemption.borrowWithinCohort is per CQ. Maybe we could make fair sharing cohort-scoped, then I think this one goes away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

3 participants