
Add parallelism, stack status filter to janitor #554

Merged: 1 commit merged into main on Jan 9, 2025

Conversation

@cartermckinnon (Member) commented on Jan 9, 2025

Description of changes:

This adds two features to the janitor:

  1. Parallelism, to sweep multiple resources at once. We spend a lot of time waiting, and when there's a big mess to clean up, this will come in handy. Controlled with the --workers flag; the default is 1.
  2. A filter to target cleanup at stacks in a specific state. Controlled by --stack-status; the default is all stacks (unchanged).

It also adds a check for nodeRoleName to avoid errors when sweeping instance profiles.
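The worker-pool pattern behind the --workers flag can be sketched as below. This is a minimal illustration, not the PR's actual code: the names sweepAll and sweepResource are assumptions, and the real janitor would call AWS cleanup APIs where the stub is.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sweepResource is a stand-in for the janitor's per-resource cleanup
// (e.g. deleting a CloudFormation stack).
func sweepResource(id string) {}

// sweepAll fans resource IDs out to a pool of workers and returns
// how many were swept. workers would come from the --workers flag.
func sweepAll(workers int, resources []string) int {
	var swept int64
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				sweepResource(r)
				atomic.AddInt64(&swept, 1)
			}
		}()
	}
	for _, r := range resources {
		jobs <- r
	}
	close(jobs) // lets the workers drain the channel and exit
	wg.Wait()
	return int(swept)
}

func main() {
	n := sweepAll(2, []string{"stack-a", "stack-b", "stack-c"})
	fmt.Printf("swept %d resources\n", n) // prints "swept 3 resources"
}
```

With workers set to 1 this degenerates to the previous sequential behavior, which is why a default of 1 is a safe, backward-compatible choice.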

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@cartermckinnon force-pushed the janitor-workers branch 2 times, most recently from c526fd3 to 0b60900, on January 9, 2025 at 03:52
@cartermckinnon changed the title from "Add parallelism to janitor, stack status filter" to "Add parallelism, stack status filter to janitor" on Jan 9, 2025
var workers int
flag.IntVar(&workers, "workers", 1, "number of workers to process resources in parallel")
var stackStatus string
flag.StringVar(&stackStatus, "stack-status", "", "only process stacks with a specific status")
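The --stack-status filter from the diff above could be applied with a check like the following sketch. The function name shouldSweep is illustrative, not the PR's actual code; the key point is that an empty filter preserves the old sweep-everything behavior.

```go
package main

import "fmt"

// shouldSweep applies the --stack-status filter: an empty filter
// keeps the default behavior (all stacks), otherwise only stacks
// whose status matches the filter are processed.
func shouldSweep(stackStatus, status string) bool {
	return stackStatus == "" || stackStatus == status
}

func main() {
	fmt.Println(shouldSweep("", "CREATE_COMPLETE"))              // true: default sweeps everything
	fmt.Println(shouldSweep("DELETE_FAILED", "DELETE_FAILED"))   // true: targeted cleanup
	fmt.Println(shouldSweep("DELETE_FAILED", "CREATE_COMPLETE")) // false: skipped
}
```

Running the janitor with --stack-status DELETE_FAILED would then target exactly the stuck stacks discussed in the comments below.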
Contributor


:( the DELETE_FAILED stack always needs manual cleanup of the VPC, which can only be deleted via the console, not the CLI

Member Author


Yep, that happens when the EKS-created cluster security group is left behind after the cluster is deleted. The console will delete it for you when you delete the VPC, but ec2:DeleteVpc won't.

Our logic to sweep leaked ENIs is what prevents the SG from being left behind. IMO, we should scale down Auto nodes before deleting the cluster to make sure the SG is deleted along with it.

@cartermckinnon cartermckinnon merged commit e53b201 into main Jan 9, 2025
7 checks passed
@cartermckinnon cartermckinnon deleted the janitor-workers branch January 9, 2025 06:11