Feat/retries on conflict error #74

Open
wants to merge 16 commits into
base: main
from

Conversation

samuel-esp
Collaborator

@samuel-esp samuel-esp commented Dec 14, 2024

Motivation

Let the user choose how to handle HTTP 409 conflict errors. Such conflicts typically occur when another entity (such as an HPA, a CI/CD pipeline, or a manual intervention) modifies a resource just before KubeDownscaler processes it.

See #68 or caas-team/py-kube-downscaler#111

Changes

  • Introduced the --max-retries-on-conflict argument, like in Py-Kube-Downscaler
  • Introduced a GetWorkload() function for the case where the downscaler needs to retrieve a single Kubernetes resource (before, it was only possible to get a list of resources, i.e. kubectl get deploy -n default). The old GetWorkload() was renamed to GetWorkloads() to reflect the change
  • Introduced a new function GetResourceType() that returns the resource type as a string
  • Refactored the main loop so that it can use --max-retries-on-conflict

Tests done

  • Unit Tests

TODO

  • I've assigned myself to this PR
  • Refactored docs
  • Added more unit tests on this specific use case

@jonathan-mayer jonathan-mayer added the enhancement New feature or request label Jan 7, 2025
@jonathan-mayer jonathan-mayer linked an issue Jan 7, 2025 that may be closed by this pull request
Member

@jonathan-mayer jonathan-mayer left a comment

Also, just so you know, the workflows were broken for forks (that's why they were failing). We've fixed it, but it will now run every workflow twice in this and the other PR. If you want, you can rebase the branches onto main and the errors will go away.

@samuel-esp samuel-esp force-pushed the feat/retries-409-error branch from 27c4a65 to 05b88bf Compare February 19, 2025 22:48
@samuel-esp samuel-esp marked this pull request as ready for review February 19, 2025 22:53
@samuel-esp
Collaborator Author

@jonathan-mayer rebased and included the previous suggestions. Just a couple of things to note:

  1. Pre-Commit failed because "duplicate code" was detected in the specific getResourceFunc for each resource (which is true, because the logic is shared across them). So, eventually, we should try to think about a strategy to tackle that
  2. Pre-Commit also suggested lowering the complexity of startScanning a bit. What would you refactor inside that?

@jonathan-mayer
Member

I'll have a look over it.

  1. Pre-Commit failed because "duplicate code" was detected in the specific getResourceFunc for each resource (which is true, because the logic is shared across them). So, eventually, we should try to think about a strategy to tackle that

I do get that it detects it as duplicate, but I'm not entirely sure how to avoid it. I think in theory it is avoidable by just having list and get in separate functions. Another way would be to refactor to use the dynamic client again, although I would still like to avoid that.

  2. Pre-Commit also suggested lowering the complexity of startScanning a bit. What would you refactor inside that?

I think we could refactor out the function called in the goroutine.

@samuel-esp
Collaborator Author

samuel-esp commented Feb 20, 2025

  1. I completely agree with you, I don't see anything wrong with having them duplicated. I would nolint them
  2. I'll extract the goroutine from the function

@samuel-esp
Collaborator Author

samuel-esp commented Feb 20, 2025

@jonathan-mayer I refactored the go func out of the startScanning function; however, I'm not sure what the best name for it would be. attemptScan was the best I came up with

@jonathan-mayer
Member

jonathan-mayer commented Feb 21, 2025

I think the current function name should be good for now. Although I would not put an anonymous function in the attemptScan function, but instead just run the attemptScan function in a goroutine, so go attemptScan(). Also, just a heads up: it might take some time until I get around to reviewing this.
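
To illustrate the go attemptScan() suggestion, here is a minimal sketch that starts a named function as a goroutine directly instead of wrapping its body in an anonymous function. The signature of `attemptScan`, the results channel and the `sync.WaitGroup` plumbing are assumptions made for the example; the real function in this PR takes different arguments.

```go
package main

import (
	"fmt"
	"sync"
)

// attemptScan is a hypothetical stand-in for the per-workload scan.
// It reports its result on a channel and signals completion via the WaitGroup.
func attemptScan(name string, results chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	results <- "scanned " + name
}

// startScanning launches one goroutine per workload by calling the named
// function directly with the go keyword; no anonymous wrapper is needed.
func startScanning(names []string) []string {
	var wg sync.WaitGroup
	// Buffered so that every goroutine can send without blocking.
	results := make(chan string, len(names))

	for _, name := range names {
		wg.Add(1)
		go attemptScan(name, results, &wg)
	}

	wg.Wait()
	close(results)

	out := make([]string, 0, len(names))
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(len(startScanning([]string{"a", "b", "c"})))
}
```

Keeping the goroutine body in a named function also keeps startScanning short, which is what the complexity linter was asking for.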

@samuel-esp
Collaborator Author

Don't worry Jonathan, absolutely no rush, take your time.

@jonathan-mayer
Member

jonathan-mayer commented Feb 25, 2025

I think we should change up the GetResource funcs again. They should only handle listing, and then every scalableResource should have a re-get function which re-gets itself from Kubernetes. It could look something like this:

func (c *cronJob) Reget(clientsets *Clientsets, ctx context.Context) error {
	// client-go's typed Get takes the context first, then the name and the options.
	latest, err := clientsets.Kubernetes.BatchV1().CronJobs(c.Namespace).Get(ctx, c.Name, metav1.GetOptions{})
	if err != nil {
		return err // TODO handle error
	}
	// Assumes cronJob embeds *batchv1.CronJob; swap in the fresh copy.
	c.CronJob = latest
	return nil
}

@samuel-esp
Collaborator Author

samuel-esp commented Feb 25, 2025

Implemented. The linter still complains about two things:

  • deployment/statefulset class duplication
  • RegetWorkload and its specific methods return an interface type

The first one may well be true. As for the second one, returning an interface type seems legitimate to me in this case.

Member

@jonathan-mayer jonathan-mayer left a comment

Looked at main and client-go.

@samuel-esp
Collaborator Author

Refactored the way you suggested; the linter still reports the "duplicate" suggestion. I think we can't do much about that.

Member

@jonathan-mayer jonathan-mayer left a comment

I think this should be good to go.

@jonathan-mayer
Member

Oh, actually, add the nolints for the dupl linter suggestion, then we can merge.

Development

Successfully merging this pull request may close these issues.

Allow for synchronous operation