cluster-autoscaler: Try disabling anti affinity check #1246

edude03 · 2025-02-04T17:07:21Z

After changing the over-provisioning pod deployment to use a zonal anti-affinity, we noticed that cluster-autoscaler fails to create a node for the unschedulable over-provisioning pod. This is a known issue (see below; the comment right above the change mentions it), but it is of course not the behaviour we want.

While this isn't a proper fix, it should "force" cluster-autoscaler to provision new nodes, giving us the extra capacity we hope to gain from the over-provisioning pods (and, of course, letting the over-provisioning pods themselves be scheduled).

cluster-autoscaler/ca.patch

sharnoff

Left a few thoughts; the general approach is OK, as discussed.

Please also prefix your PR title with cluster-autoscaler:

sharnoff · 2025-02-07T23:08:40Z

cluster-autoscaler/ca.patch

+ 		var err error
+ 		var remainingPods []*apiv1.Pod
+
+		klog.V(4).Infof("trying to schedule %d pods on existing nodes", len(podsEquivalenceGroup.Pods))


Let's remove the print debugging?

Honestly, I'd prefer to keep it in unless you feel strongly that we should take it out. If we run into issues with this again, it'd be a lot easier to debug with the messages already in place than to add them back, with all the back and forth that entails. That said, I believe we default to v=4, so I'd be happy to give it a higher verbosity (5?).

In general, I'd like to minimize the size of our patch, to avoid conflicts when updating.

So in this case, if you think the debugging is useful, let's keep it in but revisit in 1-3 months to see if we can get rid of it?
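To make the verbosity point concrete, here is a small stdlib-only stand-in for verbosity-gated logging in the style of klog.V(n). It is a sketch, not klog itself: `verbosity` and `logAt` are hypothetical names, and the default of 4 reflects the "I believe" above rather than a confirmed setting.

```go
package main

import (
	"fmt"
	"os"
)

// verbosity stands in for the -v flag. 4 is the ASSUMED default from the
// discussion above, not a confirmed value.
var verbosity = 4

// logAt writes the message to stderr only when level <= verbosity, and
// reports whether it was written. Hypothetical helper mimicking klog.V(n).
func logAt(level int, format string, args ...interface{}) bool {
	if level > verbosity {
		return false
	}
	fmt.Fprintf(os.Stderr, format+"\n", args...)
	return true
}

func main() {
	pods := 3
	// Written when running at v=4.
	logAt(4, "trying to schedule %d pods on existing nodes", pods)
	// Skipped at v=4; only appears if verbosity is raised to 5 or more.
	logAt(5, "trying to schedule %d pods on existing nodes", pods)
}
```

Moving the message from level 4 to level 5 keeps it in the patch but out of the logs unless someone deliberately raises the verbosity while debugging.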

cluster-autoscaler/ca.patch

@mikhail-sakhnov

Simplifies our setup at the expense of longer image build times. Without pushing cluster-autoscaler anywhere, it's hard to test changes from a PR. cc @mikhail-sakhnov re: #1138, @edude03 re: #1246 Co-authored-by: Michael Francis <[email protected]>

sharnoff

lgtm, pending final unresolved comment

sharnoff · 2025-02-11T17:23:49Z

cluster-autoscaler/ca.patch

+ 	for _, eg := range podEquivalenceGroups {
+ 		samplePod := eg.Pods[0]
+-		if err := o.autoscalingContext.PredicateChecker.CheckPredicates(o.autoscalingContext.ClusterSnapshot, samplePod, nodeInfo.Node().Name); err == nil {
+		err := o.autoscalingContext.PredicateChecker.CheckPredicates(o.autoscalingContext.ClusterSnapshot, samplePod, nodeInfo.Node().Name)


Let's break this into multiple lines?

edude03 added 3 commits February 4, 2025 11:58

Try disabling anti affinity check

7764508

working patch for zonal nodegroup issue

49b826b

Don't drop the listers patch

acffe55

edude03 commented Feb 6, 2025

sharnoff self-assigned this Feb 7, 2025

sharnoff reviewed Feb 7, 2025

sharnoff assigned edude03 and unassigned sharnoff Feb 7, 2025

sharnoff mentioned this pull request Feb 7, 2025

ci: Always build and push cluster-autoscaler #1249

Merged

Separate out the check, match based on error message

47f5fe5

edude03 changed the title ~~Try disabling anti affinity check~~ cluster-autoscaler: Try disabling anti affinity check Feb 10, 2025

Merge branch 'main' into try-disable-antiaffinity-check

96732b1

sharnoff approved these changes Feb 10, 2025

Fix logic error

275293f

sharnoff approved these changes Feb 11, 2025
