Commit 1e307b2

Add more details to node upgrade procedure (#1586)
I got some questions around node upgrades on the community Slack and figured that we should capture the answers in the migration guide. I've added some more context to the node upgrade procedure, going into more detail on the behavior of Managed Node Groups. Once this is merged in, I'll also update the copy of this in the registry.
1 parent 6fd7126 commit 1e307b2


docs/eks-v3-migration.md

Lines changed: 40 additions & 16 deletions
@@ -69,22 +69,34 @@ Have a look at [Gracefully upgrading node groups](#gracefully-upgrading-node-gro

### Gracefully upgrading node groups

-The `ManagedNodeGroup` component gracefully handles updates by default. EKS will:
-- boot the updated replacement nodes
-- cordon the old nodes to ensure no new pods get launched onto them
-- drain the old nodes one-by-one
-- shut down the empty old nodes
+#### Managed Node Groups (`ManagedNodeGroup`)

-The detailed update procedure can be seen in the [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-update-behavior.html).
+The `ManagedNodeGroup` component has different update behaviors depending on the type of change.

-For self-managed node groups (i.e., the `NodeGroup` and `NodeGroupV2` components) you have two options:
+For regular updates (e.g., scaling, labels):
+* EKS will boot the updated replacement nodes
+* Cordon old nodes to prevent new pod scheduling
+* Drain all nodes in the node group simultaneously
+* Shut down the empty old nodes

-1. Update the node group in place. Pulumi does this by first creating the new replacement nodes and then shutting down the old ones which will move pods to the new nodes forcibly. This is the default behavior when node groups are updated.
-2. Create a new node group and move your Pods to that group. Migrating to a new node group is more graceful than simply updating the node group in place. This is because the migration process taints the old node group as `NoSchedule` and drains the nodes gradually.
+However, for certain changes like updating the AMI type (e.g., migrating from AL2 to AL2023) in-place updates are not supported and require a full replacement:
+* A new node group will be created first
+* The old node group will be deleted after the new one is ready
+* EKS will drain all pods from the old node group simultaneously during deletion

-The second option involves the following steps:
+Note: The detailed update procedure can be seen in the [AWS docs](https://docs.aws.amazon.com/eks/latest/userguide/managed-node-update-behavior.html). If simultaneous draining of all nodes is not desirable for your workload, you should follow the graceful migration approach described [below](#graceful-upgrade).
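
To make the distinction concrete, here is a minimal, illustrative sketch of a `ManagedNodeGroup`; the cluster, role, names, and values below are assumptions for illustration, not taken from the changed file. Changes to `scalingConfig` or `labels` roll out in place, while changing `amiType` triggers the full replacement described above.

```ts
import * as aws from "@pulumi/aws";
import * as eks from "@pulumi/eks";

// Minimal node role carrying the managed policies EKS worker nodes need.
const nodeRole = new aws.iam.Role("example-node-role", {
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: "sts:AssumeRole",
            Principal: { Service: "ec2.amazonaws.com" },
        }],
    }),
});
[
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
].forEach((policyArn, i) => new aws.iam.RolePolicyAttachment(`example-node-role-${i}`, {
    role: nodeRole,
    policyArn,
}));

// Placeholder cluster; node groups are managed explicitly below.
const cluster = new eks.Cluster("example", {
    skipDefaultNodeGroup: true,
    instanceRoles: [nodeRole],
});

const nodeGroup = new eks.ManagedNodeGroup("example-mng", {
    cluster,
    nodeRole,
    // Regular updates such as scaling or labels are rolled out in place by EKS.
    scalingConfig: { minSize: 1, desiredSize: 2, maxSize: 4 },
    labels: { workload: "general" },
    // Changing the AMI type (e.g. "AL2_x86_64" -> "AL2023_x86_64_STANDARD")
    // cannot be applied in place and replaces the whole node group.
    amiType: "AL2023_x86_64_STANDARD",
});
```

With a sketch like this, editing `amiType` should show up in `pulumi preview` as a replacement of the node group rather than an in-place update.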

-1. Create the replacement node group side-by-side with the existing node group. When doing this you need to make sure that the two node groups are allowed to communicate with each other. You can achieve this in the following way:
+#### Self-Managed Node Groups (`NodeGroup` and `NodeGroupV2`)
+
+For self-managed node groups (i.e., the `NodeGroup` and `NodeGroupV2` components) Pulumi updates the node group in place. Pulumi does this by first creating the new replacement nodes and then shutting down the old ones which will move pods to the new nodes forcibly. This is the default behavior when node groups are updated.
+
+Note: If you want to migrate to a new node group more gracefully, you can follow the steps below.
+
+#### Graceful Upgrade
+
+You can gracefully update your node groups by creating a new node group side-by-side with the existing node group and then draining the old node group gradually. This involves the following steps:
+
+1. Create the replacement node group side-by-side with the existing node group. For self-managed node groups you need to make sure that the two node groups are allowed to communicate with each other. You can achieve this in the following way:

```ts
const oldNG = new eks.NodeGroupV2("old", {
@@ -117,12 +129,24 @@ const newToOld = new aws.vpc.SecurityGroupIngressRule("newToOld", {
});
```

-2. Find the nodes of the old node group. First take a note of the name of the auto scaling group associated with that node group and then run the following AWS CLI command, replacing `$ASG_GROUP_NAME` with the actual name of the auto scaling group:
+2. Find the nodes of the old node group.

-```bash
-aws ec2 describe-instances --filter "Name=tag:aws:autoscaling:groupName,Values=$ASG_GROUP_NAME" \
-| jq -r '.Reservations[].Instances[].PrivateDnsName'
-```
+**For Managed Node Groups:**
+
+Take a note of the node group name and then run the following kubectl command, replacing `$NODE_GROUP_NAME` with the actual name of the node group:
+
+```bash
+kubectl get nodes -l eks.amazonaws.com/nodegroup=$NODE_GROUP_NAME -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
+```
+
+**For Self-Managed Node Groups:**
+
+Take a note of the name of the auto scaling group associated with that node group and then run the following AWS CLI command, replacing `$ASG_GROUP_NAME` with the actual name of the auto scaling group:
+
+```bash
+aws ec2 describe-instances --filter "Name=tag:aws:autoscaling:groupName,Values=$ASG_GROUP_NAME" \
+| jq -r '.Reservations[].Instances[].PrivateDnsName'
+```

3. Drain each of the nodes of the old node group one by one. This will mark the nodes as unschedulable and gracefully move pods to other nodes. For more information have a look at this article in the [kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/).
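
A minimal sketch of that drain step follows; the node name is a placeholder and should be replaced with each name returned in step 2.

```bash
# Cordons the node and gracefully evicts its pods. Run it for each node from
# step 2, waiting for it to finish before draining the next one.
kubectl drain ip-10-0-12-34.ec2.internal \
  --ignore-daemonsets \
  --delete-emptydir-data
```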
