-
Notifications
You must be signed in to change notification settings - Fork 40.6k
feature(scheduler): Customizable pod selection and ordering in DefaultPreemption plugin #129983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Welcome @nickbp! |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Hi @nickbp. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What if you set them as the same priority, and then they could not be preempted each other? |
The pod priority is also used by the PrioritySort plugin for the scheduling queue, so flattening the priority structure would lose that ordering as well. |
/cc @kubernetes/sig-scheduling-misc Changes like this will need a KEP instead of a single PR, but we can discuss based on this pr currently. I think the starting point of this PR is very good. We have been struggling to keep some pods from being preempted. My thought: |
@AxeZhan: GitHub didn't allow me to request PR reviews from the following users: kubernetes/sig-scheduling-misc. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Looks like the same motivation as the preemptiontoleration plugin at our sub project, doesn't it? Although I, either way, agree to put this change since we can simplify the preemptiontoleration plugin implementation. But, I'm not sure if it requires KEP because it's no API addition, a simple change, and no impact on the default scheduler's behavior. |
In addition to pdb, PriorityClass can also be used to control which Pods can be preempted. by the way: we have another issue to track: kubernetes-sigs/scheduler-plugins#839 |
Make an API (at PDB? PriorityClass?) for this kind of behavior is another story, which we can discuss separately. We need to make sure if there is enough demand for such behavior generally before proceeding for that, especially because we already have subproject and people can still just use that now. If there's only a small demand, then probably it's enough to keep it as a subproject and improve the features there. For now, I believe we can just go ahead with this PR, that is, without KEP, and we can create a separate issue to see the reaction from users. |
/ok-to-test |
t.Fatal(err) | ||
} | ||
p, ok := pi.(*DefaultPreemption) | ||
if !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a test case to verify that customized EligiblePods
and OrderedPods
(different from the default one) work as expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's a good point, I'll write up some tests that exercise customizing these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a set of test cases overriding EligiblePods
and OrderPods
with various conceivable versions of selection logic
|
||
// EligiblePods returns victim pods which are allowed to be preempted by the provided preemptor. | ||
// The default behavior is to allow any pods of lower priority to be preempted by any pods of higher priority. | ||
EligiblePods func(nodeInfo *framework.NodeInfo, preemptor *v1.Pod) []*framework.PodInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, from what I've seen so far, this PR doesn't seem to introduce any changes compared to the original logic. If that's the case, I'm curious how does it provide the 'Customizable' feature? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass the custom functions when creating DefaultPreemption{}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the idea is that someone would be able to assign their own functions when constructing the plugin. This should be easier to see once I've written some unit tests to exercise it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this has limitations, because we can only use this when working directly with DefaultPreemption{}
. It cannot be used after calling New(...) since New returns an interface. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not customizable for me.
My thought is there should be some configuration for users to set, like NodeResourcesFitArgs
for example.
And that's why I said maybe we need a KEP at first🤔.
@sanposhiho @macsko What do you think? Can we create a new configuration without a KEP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It cannot be used after calling New(...) since New returns an interface
You can still do type assertions. But, I'm open to a discussion how to make it extensible in a better way.
Given, like I said, we already have Evaluator.Interface
for a similar purpose, the alternative option would be having other funcs there, like:
type Interface interface {
...
// SelectVictimsOnNode finds...
//...
// Evaluator passes Interface.EligiblePodsFunc to eligiblePodsFunc.
SelectVictimsOnNode(ctx context.Context, state *framework.CycleState,
pod *v1.Pod, nodeInfo *framework.NodeInfo, pdbs []*policy.PodDisruptionBudget,
eligiblePodsFunc(nodeInfo *framework.NodeInfo, preemptor *v1.Pod) []*framework.PodInfo) ([]*v1.Pod, int, *framework.Status)
EligiblePodsFunc(nodeInfo *framework.NodeInfo, preemptor *v1.Pod) []*framework.PodInfo
}
or maybe we can just put EligiblePodsFunc and OrderPods to Evaluator
struct and add params to NewEvaluator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not customization for general users who want to use a default scheduler though, this is also customization just for a different layer of users.
In that case, I tend to we extend Evaluator.Interface here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a sort of workaround here, I just recently added a call to construct a DefaultPreemption
directly, so now there's:
New()
: Returnsframework.Plugin
, aligns withPluginFactoryWithFts
(no change)NewDefaultPreemption()
: Same arguments asNew()
, but returnsDefaultPreemption
without needing a type assertion.
How does this look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe we can just put EligiblePodsFunc and OrderPods to
Evaluator
struct and add params to NewEvaluator.
If a new interface method is added, it may also be necessary to consider the possibility of destroying the original interface. 🤔 If the original logic is not changed, perhaps changing the interface will do more harm than good. This is just my opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may also be necessary to consider the possibility of destroying
Yeah, good point. So, we should avoid adding new funcs there so that we don't break user's implementation unnecessarily.
As a sort of workaround here, I just recently added a call to construct a DefaultPreemption directly,
I missed this change. The current one looks the best for me over the other ones that I mentioned above. This way makes it easier for users to replace the existing DefaultPreemption.Evaluator
as well.
@@ -206,11 +227,11 @@ func (pl *DefaultPreemption) SelectVictimsOnNode( | |||
if status := pl.fh.RunFilterPluginsWithNominatedPods(ctx, state, pod, nodeInfo); !status.IsSuccess() { | |||
return nil, 0, status | |||
} | |||
var victims []*v1.Pod | |||
var victims []*framework.PodInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep the current implementation []*v1.Pod
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this thinking:
- Further down here, the full PodInfo is being used for
removePod()
which then callsfh.RunPrefilterExtensionRemovePod()
, so mapping to Pods to be sorted, and then mapping the sorted result back to the original PodInfos would be tricky. - The provided OrderedPods function could have a use for other information in the PodInfo anyway
|
||
// OrderedPods sorts eligible victims in-place in descending order of highest to lowest priority. | ||
// Pods at the start of the slice are less likely to be preempted. | ||
OrderedPods func(eligible []*framework.PodInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will OrderPods
be better since this returns a function instead of a sorted pod slice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that sounds better, I wasn't super sure about the naming myself. Renamed here and throughout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks good
var _ frameworkruntime.PluginFactoryWithFts = New | ||
|
||
// NewDefaultPreemption initializes a new plugin and returns it. The plugin type is retained to allow modification. | ||
func NewDefaultPreemption(_ context.Context, dpArgs runtime.Object, fh framework.Handle, fts feature.Features) (*DefaultPreemption, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This func doesn't belong to the interface, so context arg can be removed if unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done
return NewDefaultPreemption(ctx, dpArgs, fh, fts) | ||
} | ||
|
||
var _ frameworkruntime.PluginFactoryWithFts = New |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is redundant. Code won't compile if the New
doesn't match frameworkruntime.PluginFactoryWithFts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, I'd added this as a "just in case" but the build would indeed fail elsewhere. Removed.
for _, p := range s.Victims().Pods { | ||
gotPodNames = append(gotPodNames, p.Name) | ||
} | ||
if !reflect.DeepEqual(expectPodNames, gotPodNames) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use cmp.Diff
to compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
var pods []runtime.Object | ||
pods = append(pods, test.pod) | ||
for _, pod := range test.pods { | ||
pods = append(pods, pod) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var pods []runtime.Object | |
pods = append(pods, test.pod) | |
for _, pod := range test.pods { | |
pods = append(pods, pod) | |
} | |
pods = append([]runtime.Object{test.pod}, test.pods...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting a type error on this change:
default_preemption_test.go:1794:46: cannot use test.pods (variable of type []*"k8s.io/api/core/v1".Pod) as []"k8s.io/apimachinery/pkg/runtime".Object value in argument to append
Tried changing test.pods
itself to be a []runtime.Object
but the []*v1.Pod
typing is needed further down when it's passed to internalcache.NewSnapshot()
.
I'll leave it as-is for now but LMK if there's something else to try here.
At this point I'm not aware of anything else to be done here, let me know if I'm missing anything. |
Co-authored-by: Dominik Marciński <[email protected]>
Co-authored-by: Dominik Marciński <[email protected]>
…ound system pod eligibility
…mption_test.go Co-authored-by: Maciej Skoczeń <[email protected]>
Just did a rebase to latest upstream, let's see how it goes 🤞 |
Alrighty, CI is looking good again, thanks for the tip |
pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go
Outdated
Show resolved
Hide resolved
/approve @sanposhiho @macsko please have a look as well, especially to the change in runtime/registry.go to make the factory return the type. I don't see a problem with this change, but may be missing something. |
/approve |
@@ -31,11 +31,11 @@ import ( | |||
type PluginFactory = func(ctx context.Context, configuration runtime.Object, f framework.Handle) (framework.Plugin, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using generics looks like a good idea. But, can we modify this as well?, which I know isn't needed for this PR's change, but it's mainly for the convenience for other plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave this a few attempts just now and remembered that I think I'd tried this originally. Switching this to generics ends up causing typing problems around the Registry
(aka map[string]PluginFactory
) where we'd want everything to be matching types (returning framework.Plugin
) anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I guess there is a way, but no need to do in this PR anyways. I'll follow up in another PR.
…mption.go Co-authored-by: Dominik Marciński <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
LGTM label has been added. Git tree hash: e688119535c58f29430e9c9839a58ceca7c6e8f2
|
/retest |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dom4ha, macsko, nickbp, sanposhiho The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/release-note-none |
What type of PR is this?
/kind feature
What this PR does / why we need it:
The current plugin behavior is to select only based on pod priority, where any pod with higher priority can preempt any pod with lower priority. If this isn't desirable, there are two options: Reimplement this plugin from scratch, or update other users of pod priority (like scheduler queue ordering) to use something else. This introduces a way to customize the selection and ordering without needing to resort to that.
In our case, we have multiple priority classes but where only a subset of those are considered preemptible. Here's roughly how this looks:
This PR allows the preemption selection to be configured against the
DefaultPreemption
plugin, rather than needing to reimplement the plugin itself. The structure used here mimics what's currently being done for theEvaluator
, where aPreemptPod
callback may be overridden to customize what happens when performing a preemption.This PR also updates the plugin's tests to call the updated
New()
function rather than constructingDefaultPreemption
directly, so thatNew()
is now being exercised in tests.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: