Description
What happened?
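When the workspace spec cannot be deployed, e.g. because of an invalid Docker image, the StatefulSet rollout fails. Subsequent updates to the spec do not unblock the system because of a known StatefulSet limitation: the controller keeps waiting for the broken pod to become Ready before it applies the new pod template (see "Forced rollback"):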
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback
The only workaround is to delete the workspace pod manually, after which the StatefulSet recreates it from the updated template.
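The manual workaround could in principle be automated by a check that recognizes the stuck state. The following is a minimal sketch, assuming the operator can read the pod's controller-revision-hash label, its Ready condition, and the StatefulSet's status.updateRevision. The Pod and StatefulSet types and the needsForcedRollback function are hypothetical simplifications, not the PKO or client-go API.

```go
package main

import "fmt"

// revisionLabel is the label the StatefulSet controller sets on each pod to
// record which ControllerRevision (pod template version) it was created from.
const revisionLabel = "controller-revision-hash"

// Pod and StatefulSet are hypothetical, simplified stand-ins for the
// k8s.io/api types; only the fields this check needs are modeled.
type Pod struct {
	Name   string
	Labels map[string]string
	Ready  bool
}

type StatefulSet struct {
	UpdateRevision string // mirrors status.updateRevision
}

// needsForcedRollback reports whether the pod is stuck on an outdated
// template: it was created from a revision other than the StatefulSet's
// update revision, and it is not Ready. In that state the StatefulSet
// controller keeps waiting for the pod, so deleting it unblocks the rollout.
func needsForcedRollback(sts StatefulSet, pod Pod) bool {
	rev, ok := pod.Labels[revisionLabel]
	if !ok {
		return false // no revision label: not something this check should touch
	}
	return rev != sts.UpdateRevision && !pod.Ready
}

func main() {
	sts := StatefulSet{UpdateRevision: "ws-c"}
	// Pod created from the broken revision "ws-b", never became Ready.
	broken := Pod{Name: "ws-0", Labels: map[string]string{revisionLabel: "ws-b"}, Ready: false}
	// Pod already running the latest template.
	current := Pod{Name: "ws-0", Labels: map[string]string{revisionLabel: "ws-c"}, Ready: true}
	fmt.Println(needsForcedRollback(sts, broken))
	fmt.Println(needsForcedRollback(sts, current))
}
```

A pod that is Ready on an older revision is deliberately left alone, since the normal rolling update will replace it; only the not-Ready case needs intervention.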
See also:
- StatefulSet - can't rollback from a broken state kubernetes/kubernetes#67250 (comment)
- StatefulSet does not upgrade to a newer version of manifests kubernetes/kubernetes#78007
Example
There are several ways to trigger this:
- an invalid pod specification (e.g. an invalid image)
- a failure to fetch the program source (e.g. from Git), which prevents the pod from starting
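Both triggers typically leave a container in a waiting state with a standard kubelet reason. A minimal sketch of recognizing them, assuming the check inspects the waiting reason of every init and regular container; ContainerStatus, stuckReasons, stuckContainer, and the container names "pulumi" and "fetch" are hypothetical, and placing the source fetch in an init container is an illustrative assumption about the pod layout.

```go
package main

import "fmt"

// ContainerStatus is a hypothetical, simplified stand-in for the k8s.io/api
// ContainerStatus type: only the name and waiting reason are modeled.
type ContainerStatus struct {
	Name          string
	WaitingReason string // state.waiting.reason, empty if not waiting
}

// stuckReasons lists kubelet waiting reasons that usually mean the pod will
// not start without a spec change or manual intervention.
var stuckReasons = map[string]bool{
	"ErrImagePull":               true,
	"ImagePullBackOff":           true,
	"InvalidImageName":           true,
	"CrashLoopBackOff":           true,
	"CreateContainerConfigError": true,
}

// stuckContainer returns the name and reason of the first init container,
// then regular container, that is waiting for one of the reasons above.
func stuckContainer(initStatuses, statuses []ContainerStatus) (string, string, bool) {
	for _, list := range [][]ContainerStatus{initStatuses, statuses} {
		for _, cs := range list {
			if stuckReasons[cs.WaitingReason] {
				return cs.Name, cs.WaitingReason, true
			}
		}
	}
	return "", "", false
}

func main() {
	// Invalid image on the (hypothetical) main container.
	name, reason, ok := stuckContainer(nil, []ContainerStatus{{Name: "pulumi", WaitingReason: "InvalidImageName"}})
	fmt.Println(name, reason, ok)

	// A (hypothetical) source-fetch init container that keeps failing.
	name, reason, ok = stuckContainer([]ContainerStatus{{Name: "fetch", WaitingReason: "CrashLoopBackOff"}}, nil)
	fmt.Println(name, reason, ok)
}
```

CrashLoopBackOff can occasionally clear on its own, so a real implementation might combine this with a timeout before treating the pod as stuck.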
Output of pulumi about
PKO v2.0.0-beta.3
Additional context
A fix should satisfy the following requirements:
- The system should respect the pod termination grace period when updating or deleting the pod, to gracefully cancel any in-flight Pulumi operation.
- The system should preserve the persistent volume(s) of the workspace pod, in case the user relies on PVs.
- The system should wipe the temporary directory within the pod.
One possible solution is to use the "Parallel" pod management policy:
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#parallel-pod-management
Contributing
Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).