docs: add zonal shift RFC #9010

Open
DerekFrank wants to merge 1 commit into aws:main from DerekFrank:zonal-shift-rfc

@DerekFrank
Contributor

Fixes: #7271

Description

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@DerekFrank DerekFrank requested a review from a team as a code owner March 10, 2026 00:10
@github-actions
Contributor

Preview deployment ready!

Preview URL: https://pr-9010.d18coufmbnnaag.amplifyapp.com

Built from commit ea5e9d69db907cfda4594a37fc6b6d63af1263ca


1. Stop provisioning capacity in the **impaired** AZ.
2. Stop performing voluntary disruption in the **impaired** AZ.
3. Stop performing voluntary disruption in the **unimpaired** AZs if the disruption relies on scheduling pods to the **impaired** AZ.
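
Requirements 1 to 3 above can be sketched as two small checks. This is only an illustration under stated assumptions, not the RFC's design: the `Disruption` type, its fields, and both function names are hypothetical, and the sketch assumes a rescheduling simulation has already reported which zones the displaced pods would land in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disruption:
    """Hypothetical voluntary disruption candidate (not a Karpenter type)."""
    node_zone: str           # zone of the node that would be disrupted
    target_zones: frozenset  # zones where displaced pods would be rescheduled


def provisionable_zones(all_zones, impaired):
    """Requirement 1: never launch capacity into an impaired zone."""
    return [z for z in all_zones if z not in impaired]


def disruption_allowed(candidate, impaired):
    """Requirements 2 and 3: gate voluntary disruption during a zonal shift."""
    # Requirement 2: leave nodes in the impaired zone alone.
    if candidate.node_zone in impaired:
        return False
    # Requirement 3: block the disruption if any displaced pod would have
    # to be rescheduled into an impaired zone.
    return not (candidate.target_zones & frozenset(impaired))
```

Under this model, an empty node in an unimpaired AZ has no displaced pods, so `target_zones` is empty and the disruption is still allowed.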
Why would you stop disrupting instances in unimpaired AZs, e.g. underutilized or empty ones? If an application relies on infrastructure in the impaired AZ, it won't get scheduled unless your scheduling requirements are flexible. Are you worried about losing capacity in the unimpaired AZs during an outage?

Contributor Author

It would be preferable to stop all disruption, but that is not a hard requirement. To make that change we need integration with upstream, which can come later as a supplement. I think there is an issue upstream for stopping disruption: kubernetes-sigs/karpenter#2497

This might make a natural addition to that.

2. Stop performing voluntary disruption in the **impaired** AZ.
3. Stop performing voluntary disruption in the **unimpaired** AZs if the disruption relies on scheduling pods to the **impaired** AZ.
4. Pods with strict scheduling requirements that require capacity in the impaired AZ, such as volume requirements or node affinities, **should not** result in launch attempts.
5. If an option is set, pods with TSCs that require capacity in the impaired AZ should instead have capacity launched into unimpaired AZs while still maintaining skew between the remaining unimpaired AZs.
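
The skew behavior in item 5 can be made concrete with a small sketch. It uses the standard definition of skew for topology spread constraints (matching pods in a zone minus the minimum across all considered zones) and shows what changes when the impaired AZ is dropped from the set of considered zones. The function names and the `exclude_impaired` flag are hypothetical stand-ins for the unnamed option, and the model ignores everything except per-zone pod counts.

```python
def skew_after_placing(counts, zone, excluded=frozenset()):
    """Skew of `zone` after adding one pod, ignoring `excluded` zones."""
    considered = {z: c for z, c in counts.items() if z not in excluded}
    considered[zone] = considered.get(zone, 0) + 1
    # Skew is the zone's count minus the minimum over considered zones.
    return considered[zone] - min(considered.values())


def schedulable_zones(counts, max_skew, impaired, exclude_impaired):
    """Unimpaired zones that can take one more pod without exceeding max_skew."""
    impaired = frozenset(impaired)
    excluded = impaired if exclude_impaired else frozenset()
    return sorted(
        z for z in counts
        if z not in impaired
        and skew_after_placing(counts, z, excluded) <= max_skew
    )


# Three zones, one impaired, maxSkew of 1 (illustrative pod counts).
counts = {"us-west-2a": 3, "us-west-2b": 3, "us-west-2c": 2}
impaired = {"us-west-2c"}

# Impaired zone still counted: its low pod count pins the minimum at 2,
# so adding a pod to either healthy zone would produce a skew of 2.
blocked = schedulable_zones(counts, 1, impaired, exclude_impaired=False)

# Impaired zone excluded: skew is measured between healthy zones only.
allowed = schedulable_zones(counts, 1, impaired, exclude_impaired=True)
```

With these counts, `blocked` is empty and `allowed` contains both healthy zones, which is the difference the option is meant to make while skew between the remaining AZs is still enforced.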
If the cluster topology consists of 3 zones and 1 is impaired, how will pods get scheduled in the unimpaired zones (without changing `whenUnsatisfiable` to `ScheduleAnyway`)?

Contributor Author

Development

Successfully merging this pull request may close these issues.

Add support for Amazon Application Recovery Controller (ARC) Zonal Shift feature

2 participants