Skip to content

Adds two feature toggles for compatibility with jenkins setups that manage their own labels and a safety check to not update the fleet during ongoing instance refresh operations#504

Open
corsmith-riot-ext wants to merge 6 commits into
jenkinsci:masterfrom
corsmith-riot-ext:compat-feature-toggles
Open

Conversation

@corsmith-riot-ext

Copy link
Copy Markdown

Hey There! We're testing out utilizing this fleet plugin for our Jenkins instance but need to have the labels be preserved when updating the state of the nodes. We also do fleet refresh operations on a regular cadence and need the fleet to not be further modified while fleet refresh operations are ongoing. This change is designed to facilitate those objectives without modifying the existing behavior of the plugin

Tests were performed via the built-in unit and integration tests with a few added tests to ensure that the new lines are hit. We also built and deployed this version of the plugin on our own instance and find that it works as intended. We would like to upstream these changes so that they can be used by the broader jenkins community.

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira (None)
  • Link to relevant pull requests, esp. upstream and downstream changes (None)
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

… jenkins labels and safety during fleet refresh operations
@corsmith-riot-ext

Copy link
Copy Markdown
Author

Still a draft in-progress -- making sure unit and CI tests are clear before submitting for review.

@corsmith-riot-ext corsmith-riot-ext marked this pull request as ready for review October 1, 2025 22:00
@corsmith-riot-ext corsmith-riot-ext requested a review from a team as a code owner October 1, 2025 22:00
@BradyShober

Copy link
Copy Markdown
Contributor

Can you explain a bit further the use case for needing to preserve the original labels?

I think the option to pause modifications during instance refreshes is fine to add, but I would say I have some concern of the additional describe calls as we have found that the plugin can already easily encounter the default rate limits of the Auto Scaling service, especially if you have a few different Clouds configured.

@BradyShober

Copy link
Copy Markdown
Contributor

Actually thinking about it some more, some more detail on the instance refresh use case may be useful as well. If the reason for doing so is to ensure compliance of receiving security patches, I can offer some information on how we handle this within our environment without an instance refresh.

We have a few different Jenkins clusters, and the approach may vary depending on the needs of the specific cluster, but as a catch all each of our clusters has a job that is scheduled to run nightly that looks for nodes connected to the cluster, and then checks to see how long it has been since each was launched. If it is longer than a threshold we have defined, the node is set offline, waits for running jobs to complete, then terminates the instance which the ASG will then detect and replace with a new one.

In one of our clusters we utilize the maximum number of uses configuration, and you could also utilize the the max idle minutes configuration to more frequently replace instances.

@corsmith-riot-ext

Copy link
Copy Markdown
Author

That sounds like a reasonable option - and generally good practice to just roll out older nodes and let the orchestrator take care of the rest. I would still prefer to have the option to run instance refreshes, though - it's a native EC2 feature and since its hidden behind a toggle, if it causes issues for permissions checks it should be easy enough to disable. I'd be happy to add that extra info in the tooltip for the toggle itself.

As for the label preservation - we have a few different ways that jobs are targeted to specific nodes based on which jobs they've run previously (e.g. labeling a node as 'dirty' to guarantee we run a perforce workspace cleanup after a lengthy build) so any change to those labels runs the risk of causing builds to run on dirty or otherwise inappropriate agents.

I understand concerns around config bloat as well - but the label preservation is a hard blocker for us and we would really prefer to not have to fork the repo 😓

Thanks!

@BradyShober

Copy link
Copy Markdown
Contributor

I was doing some testing of your changes, the preserve labels setting looks good to me. I did encounter an interesting behavior with the instance refresh setting, I'll describe the steps that I took and what happened:

  1. Set up my cloud with a minimum of 1 instance, 1 executor, and with your new instance refresh setting enabled.

  2. Start an instance refresh

  3. While the instance refresh is running, queue up 2 jobs.

The plugin did wait to make any changes as expected until the refresh was completed, however I would have expected there to be 2 instances, 1 that was launched by the instance refresh, and another to handle the second job. What actually happened was that I ended up with 8 instances so once my 2 jobs ran I was left with 6 extra instances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants