(5/5) [nexus] Implement Affinity/Anti-Affinity Groups in external API #7447

Open: wants to merge 82 commits into main

smklein
Collaborator

@smklein smklein commented Jan 30, 2025

Pulled out of #7076

This PR is a partial implementation of RFD 522

It adds:

  • Affinity and Anti-Affinity groups, contained within projects. These groups are configured with a policy and a failure domain, and can currently contain zero or more members. Affinity groups attempt to co-locate members; anti-affinity groups attempt to avoid co-locating members.
    • Policy describes "what to do if we cannot fulfill the co-location request". Currently, these options are "fail" (reject the request) or "allow" (continue with provisioning of the group member regardless).
    • Failure Domain describes the scope of what is considered "co-located". In this PR, the only option is "sled", but in the future, this may be expanded to e.g. "rack".
    • Members describe what can be added to affinity/anti-affinity groups. In this PR, the only option is "instance". RFD 522 describes how "anti-affinity groups may also contain affinity groups" -- which is why this "member" terminology is introduced -- but it is not yet implemented.
  • (anti-)Affinity groups are exposed by the API, through a CRUD interface
  • (anti-)Affinity groups are considered during "sled reservation", where instances are placed on a sled. Most of this logic is implemented (and tested) in nexus/db-queries/src/db/datastore/sled.rs, which was added in #7446, "(4/5) [nexus] Consider Affinity/Anti-Affinity Groups during instance placement".
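
As a rough illustration of the model described in the list above, here is a minimal in-memory sketch of how policy, failure domain, and members could interact when choosing a sled. It is not the code in this PR: the real logic lives in database queries in sled.rs, and every name below (`Group`, `pick_sled`, the `u32` IDs) is a hypothetical stand-in. It also assumes that a "fail" group acts as a hard filter on candidate sleds, while an "allow" group is only a preference.

```rust
use std::collections::{BTreeMap, BTreeSet};

// All types here are hypothetical stand-ins, not the types used in Nexus.
type SledId = u32;
type InstanceId = u32;

/// What to do if the co-location request cannot be fulfilled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Policy {
    Fail,  // reject the request
    Allow, // continue provisioning regardless
}

/// Scope of what counts as "co-located"; "sled" is the only option so far.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FailureDomain {
    Sled,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GroupKind {
    Affinity,
    AntiAffinity,
}

struct Group {
    kind: GroupKind,
    policy: Policy,
    failure_domain: FailureDomain,
    members: BTreeSet<InstanceId>,
}

/// Choose a sled for `instance`, or return `None` if a "fail" group
/// cannot be satisfied by any candidate sled.
fn pick_sled(
    instance: InstanceId,
    candidates: &BTreeSet<SledId>,
    groups: &[Group],
    placements: &BTreeMap<InstanceId, SledId>,
) -> Option<SledId> {
    // `required` satisfies every "fail" group; `preferred` satisfies all groups.
    let mut required = candidates.clone();
    let mut preferred = candidates.clone();

    for group in groups.iter().filter(|g| g.members.contains(&instance)) {
        // Only one failure domain exists, so this pattern is irrefutable.
        let FailureDomain::Sled = group.failure_domain;

        // Sleds already hosting other members of this group.
        let occupied: BTreeSet<SledId> = group
            .members
            .iter()
            .filter(|m| **m != instance)
            .filter_map(|m| placements.get(m).copied())
            .collect();

        let ok = |sled: &SledId| match group.kind {
            GroupKind::Affinity => occupied.is_empty() || occupied.contains(sled),
            GroupKind::AntiAffinity => !occupied.contains(sled),
        };

        preferred.retain(|s| ok(s));
        if group.policy == Policy::Fail {
            required.retain(|s| ok(s));
        }
    }

    // Prefer a sled that satisfies everything; otherwise fall back to one
    // that only violates "allow" groups.
    preferred
        .iter()
        .next()
        .copied()
        .or_else(|| required.iter().next().copied())
}
```

In this sketch, an anti-affinity group with policy "fail" whose only candidate sled already hosts another member yields `None`, whereas the same group with policy "allow" still returns that sled.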

Fixes #1705

@benjaminleonard
Contributor

Perhaps it would be useful to surface whether a member is presently satisfying an affinity request. I think I'd be interested to click into an affinity group and see its current affinity status. Or return a list of members that are currently failing to satisfy an affinity request.

Then, the next question might be: how do I fix it? I presume in most cases the answer is to stop and start the instance ... which the user can do. And occasionally, reduce overall utilization / wait for software update to finish / add more capacity – which the user might not be privy to.
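
To make the "list of members that are currently failing" idea concrete, here is a small sketch of such a check. It is not an existing API in this PR; `unsatisfied_members` and the `u32` IDs are hypothetical. It assumes a member violates an affinity group if any other placed member is on a different sled, violates an anti-affinity group if any other placed member shares its sled, and is ignored if it is not currently placed (for example, a stopped instance).

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real ID types.
type SledId = u32;
type InstanceId = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GroupKind {
    Affinity,
    AntiAffinity,
}

/// Return the group members whose current placement violates the group.
fn unsatisfied_members(
    kind: GroupKind,
    members: &[InstanceId],
    placements: &BTreeMap<InstanceId, SledId>,
) -> Vec<InstanceId> {
    members
        .iter()
        .copied()
        .filter(|m| {
            // Members without a sled (not running) cannot violate anything.
            let my_sled = match placements.get(m) {
                Some(sled) => sled,
                None => return false,
            };
            // Sleds of every other member that is currently placed.
            let mut others = members
                .iter()
                .filter(|o| *o != m)
                .filter_map(|o| placements.get(o));
            match kind {
                GroupKind::Affinity => others.any(|s| s != my_sled),
                GroupKind::AntiAffinity => others.any(|s| s == my_sled),
            }
        })
        .collect()
}
```

Under this definition, a split affinity group reports every placed member as unsatisfied, since the sketch has no notion of which sled is the "right" one.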

Base automatically changed from affinity-instance-integration to main February 24, 2025 23:07
@smklein
Collaborator Author

smklein commented Feb 25, 2025

> Perhaps it would be useful to surface whether a member is presently satisfying an affinity request. I think I'd be interested to click into an affinity group and see its current affinity status. Or return a list of members that are currently failing to satisfy an affinity request.

I filed #7614 to track this. I think it's a totally reasonable request.

> Then, the next question might be: how do I fix it? I presume in most cases the answer is to stop and start the instance ... which the user can do. And occasionally, reduce overall utilization / wait for software update to finish / add more capacity – which the user might not be privy to.

This is more subtle - we could also presumably automatically resolve this in some cases, by live-migrating, but doing so feels a little opinionated. This may justify an additional policy for affinity groups, beyond the "policy = allow" that we currently have -- maybe we want "policy = allow, but if we can't fulfill it, keep it where it is" vs "policy = allow, and if we can't fulfill it now, move it later".
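
One way to picture the two flavors of "allow" floated above is as separate policy variants. This is purely a sketch of the idea, not something in this PR or in RFD 522; the variant names, the `Outcome` type, and `on_unfulfilled` are all hypothetical.

```rust
/// Hypothetical policy set: the existing "fail", plus "allow" split in two.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Policy {
    /// Reject the request if the group cannot be satisfied.
    Fail,
    /// Place the instance anyway and leave it where it lands.
    AllowAndKeep,
    /// Place the instance anyway, and move it later if that becomes possible.
    AllowAndMoveLater,
}

/// What the control plane would do when no placement satisfies the group.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Outcome {
    Reject,
    Place { migrate_later: bool },
}

fn on_unfulfilled(policy: Policy) -> Outcome {
    match policy {
        Policy::Fail => Outcome::Reject,
        Policy::AllowAndKeep => Outcome::Place { migrate_later: false },
        Policy::AllowAndMoveLater => Outcome::Place { migrate_later: true },
    }
}
```

Making the choice explicit in the policy would keep automatic live migration opt-in, which addresses the concern that moving instances by default feels opinionated.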

Successfully merging this pull request may close these issues.

Want ability to have anti-affinity
4 participants