Workspace pod may stall during rollout #801

Closed
@EronWright

Description

What happened?

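When the workspace spec is undeployable (for example, because of an invalid Docker image), the rollout fails. Subsequent updates to the spec do not unblock the system because of a known StatefulSet limitation: with the default OrderedReady pod management policy, the controller waits for the broken pod to become Ready before it applies the new revision. See: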

https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback

One must manually delete the workspace pod to unblock the system.
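
The manual workaround can be scripted. The following is a minimal sketch that builds (and optionally runs) a `kubectl delete pod` invocation. The pod name `my-workspace-0` and the namespace `default` are hypothetical placeholders, since this issue does not state how the workspace pod is named. Leaving `--grace-period` unset lets the pod's own termination grace period apply.

```python
import subprocess
from typing import List, Optional


def delete_pod_command(pod: str, namespace: str,
                       grace_period: Optional[int] = None) -> List[str]:
    """Build the kubectl argv that deletes one pod.

    When grace_period is None, no flag is added, so the pod's own
    terminationGracePeriodSeconds is used.
    """
    cmd = ["kubectl", "delete", "pod", pod, "--namespace", namespace]
    if grace_period is not None:
        cmd.append(f"--grace-period={grace_period}")
    return cmd


def delete_pod(pod: str, namespace: str,
               grace_period: Optional[int] = None) -> None:
    """Run the command; raises CalledProcessError if kubectl fails."""
    subprocess.run(delete_pod_command(pod, namespace, grace_period), check=True)


if __name__ == "__main__":
    # Hypothetical pod name and namespace, for illustration only.
    print(" ".join(delete_pod_command("my-workspace-0", "default")))
```

After the pod is deleted, the StatefulSet controller recreates it from the latest revision of the pod template.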

Example

There are several ways to trigger this:

  • an invalid pod specification (e.g. an invalid image)
  • a failure to fetch the program source (e.g. from git), which causes pod startup to fail
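
Both triggers leave the pod with a container stuck in a waiting state while an update is pending. The sketch below is a heuristic for spotting that condition from the JSON form of a StatefulSet and its pod (for example, the output of `kubectl get ... -o json`). The function names and the chosen set of waiting reasons are assumptions made for illustration, not part of the operator.

```python
from typing import Any, Dict, List

# Waiting reasons treated as "stuck" here; this selection is an assumption.
STUCK_REASONS = {
    "ErrImagePull",
    "ImagePullBackOff",
    "InvalidImageName",
    "CrashLoopBackOff",
}


def waiting_reasons(pod: Dict[str, Any]) -> List[str]:
    """Collect the waiting reason of every init and regular container."""
    status = pod.get("status", {})
    reasons = []
    for key in ("initContainerStatuses", "containerStatuses"):
        for container in status.get(key) or []:
            waiting = (container.get("state") or {}).get("waiting")
            if waiting and waiting.get("reason"):
                reasons.append(waiting["reason"])
    return reasons


def rollout_looks_stalled(sts: Dict[str, Any], pod: Dict[str, Any]) -> bool:
    """True when an update is pending and the pod is stuck waiting."""
    st = sts.get("status", {})
    update_pending = st.get("currentRevision") != st.get("updateRevision")
    stuck = any(r in STUCK_REASONS for r in waiting_reasons(pod))
    return update_pending and stuck
```

A check like this only flags the condition; it does not distinguish a pod that is still pulling a valid image from one that will never start.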

Output of pulumi about

PKO v2.0.0-beta.3

Additional context

Any fix should keep the following requirements in mind:

  1. The system should respect the pod termination grace period when updating or deleting the pod, to gracefully cancel any in-flight Pulumi operation.
  2. The system should preserve the persistent volume(s) of the workspace pod, in case the user makes use of PVs.
  3. The system should wipe the temporary directory within the pod.
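
One way to read the three requirements is as fields on the StatefulSet manifest. The sketch below builds such a fragment as a Python dictionary. It is an assumption about how the requirements could map to Kubernetes fields, not a description of the operator's actual manifest: the container name `workspace`, the `/tmp` mount path, and the 600-second default are all hypothetical.

```python
from typing import Any, Dict


def workspace_statefulset_fragment(grace_seconds: int = 600) -> Dict[str, Any]:
    """Return a partial StatefulSet spec covering the three requirements."""
    return {
        "spec": {
            # Requirement 2: keep PVCs created from volumeClaimTemplates.
            # "Retain" is also the Kubernetes default for both fields.
            "persistentVolumeClaimRetentionPolicy": {
                "whenDeleted": "Retain",
                "whenScaled": "Retain",
            },
            "template": {
                "spec": {
                    # Requirement 1: time allowed for a graceful shutdown.
                    "terminationGracePeriodSeconds": grace_seconds,
                    "containers": [
                        {
                            "name": "workspace",  # hypothetical name
                            "volumeMounts": [
                                {"name": "tmp", "mountPath": "/tmp"}
                            ],
                        }
                    ],
                    # Requirement 3: an emptyDir volume is removed along
                    # with the pod, so a replacement pod starts empty.
                    "volumes": [{"name": "tmp", "emptyDir": {}}],
                }
            },
        }
    }
```

Note that an `emptyDir` volume survives container restarts within the same pod; it is only wiped when the pod itself is deleted.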

One possible solution is to use the "Parallel" pod management policy:
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#parallel-pod-management
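
For illustration, the policy is a single field on the StatefulSet spec, `podManagementPolicy`, whose values are `OrderedReady` (the default) and `Parallel`. The helper below is a hypothetical sketch that sets it on a manifest dictionary without mutating the input. Whether this setting resolves the stall described above is not confirmed in this issue.

```python
import copy
from typing import Any, Dict


def with_parallel_policy(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a StatefulSet manifest using Parallel pod management."""
    updated = copy.deepcopy(manifest)
    updated.setdefault("spec", {})["podManagementPolicy"] = "Parallel"
    return updated


if __name__ == "__main__":
    # Hypothetical minimal manifest, for illustration only.
    base = {"kind": "StatefulSet", "spec": {"replicas": 1}}
    print(with_parallel_policy(base)["spec"])
```

To the best of my knowledge, `podManagementPolicy` cannot be changed on an existing StatefulSet, so adopting it would mean recreating the object rather than patching it.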


Labels

kind/bug (Some behavior is incorrect or out of spec), resolution/fixed (This issue was fixed)
