fix: do not delete workspace pod on authz errors #805

Merged
rquitales merged 3 commits into master from rquitales/authz-discard-pod-fix on Feb 10, 2025

Conversation

rquitales
Member

@rquitales rquitales commented Feb 5, 2025

Proposed Changes

This PR updates the workspace controller to make the gRPC WhoAmI call to the workspace pod before proceeding with reconciliation, so that authentication issues surface early in the process.

If authentication fails, the workspace pod is retained rather than deleted, because it is still in a pristine state. This avoids unnecessary pod recreation, saves the time needed to spin up a new StatefulSet pod, and prevents excessive workspace pod churn when authentication failures persist.

Example Log Output on Authentication Failure

2025-02-05T18:37:33.029Z	INFO	Applying StatefulSet	{"controller": "workspace-controller", "controllerGroup": "auto.pulumi.com", "controllerKind": "Workspace", "Workspace": {"name":"nginx-k8s-stack","namespace":"pulumi-kubernetes-operator"}, "namespace": "pulumi-kubernetes-operator", "name": "nginx-k8s-stack", "reconcileID": "bca83f73-73aa-4004-bce0-14aa8c4bf48a", "revision": "112271", "hash": "ec3e2de049a62cdb9a133f4c141df826", "source": {"Generation":1,"ForceRequest":"","Git":{"URL":"https://github.com/pulumi/examples","Ref":"4ce6027439cc8c9cc60f5704abc7d2204a07d98e","Dir":"/kubernetes-ts-nginx","Shallow":false,"SSHPrivateKey":null,"Username":null,"Password":null,"Token":null},"Flux":null,"Local":null}}
2025-02-05T18:37:33.151Z	INFO	Connecting to workspace pod	{"controller": "workspace-controller", "controllerGroup": "auto.pulumi.com", "controllerKind": "Workspace", "Workspace": {"name":"nginx-k8s-stack","namespace":"pulumi-kubernetes-operator"}, "namespace": "pulumi-kubernetes-operator", "name": "nginx-k8s-stack", "reconcileID": "bca83f73-73aa-4004-bce0-14aa8c4bf48a", "revision": "112271", "addr": "nginx-k8s-stack-workspace.pulumi-kubernetes-operator:50051"}
2025-02-05T18:37:33.158Z	INFO	Connected to workspace pod	{"controller": "workspace-controller", "controllerGroup": "auto.pulumi.com", "controllerKind": "Workspace", "Workspace": {"name":"nginx-k8s-stack","namespace":"pulumi-kubernetes-operator"}, "namespace": "pulumi-kubernetes-operator", "name": "nginx-k8s-stack", "reconcileID": "bca83f73-73aa-4004-bce0-14aa8c4bf48a", "revision": "112271", "addr": "nginx-k8s-stack-workspace.pulumi-kubernetes-operator:50051"}
2025-02-05T18:37:33.158Z	INFO	Running whoami to ensure authentication is setup correctly with the workspace pod	{"controller": "workspace-controller", "controllerGroup": "auto.pulumi.com", "controllerKind": "Workspace", "Workspace": {"name":"nginx-k8s-stack","namespace":"pulumi-kubernetes-operator"}, "namespace": "pulumi-kubernetes-operator", "name": "nginx-k8s-stack", "reconcileID": "bca83f73-73aa-4004-bce0-14aa8c4bf48a", "revision": "112271"}
2025-02-05T18:37:33.164Z	ERROR	unable to authenticate; retaining the workspace pod to retry later	{"controller": "workspace-controller", "controllerGroup": "auto.pulumi.com", "controllerKind": "Workspace", "Workspace": {"name":"nginx-k8s-stack","namespace":"pulumi-kubernetes-operator"}, "namespace": "pulumi-kubernetes-operator", "name": "nginx-k8s-stack", "reconcileID": "bca83f73-73aa-4004-bce0-14aa8c4bf48a", "revision": "112271", "error": "rpc error: code = Unauthenticated desc = TokenReview API is unavailable"}
...
2025-02-05T18:37:33.176Z	INFO	Status updated	{"controller": "workspace-controller", "controllerGroup": "auto.pulumi.com", "controllerKind": "Workspace", "Workspace": {"name":"nginx-k8s-stack","namespace":"pulumi-kubernetes-operator"}, "namespace": "pulumi-kubernetes-operator", "name": "nginx-k8s-stack", "reconcileID": "bca83f73-73aa-4004-bce0-14aa8c4bf48a", "revision": "112336", "observedGeneration": 1, "address": "nginx-k8s-stack-workspace.pulumi-kubernetes-operator:50051", "conditions": [{"type":"Ready","status":"False","observedGeneration":1,"lastTransitionTime":"2025-02-05T18:37:12Z","reason":"AuthenticationFailed","message":"Unable to authenticate with the workspace pod."}]}

Stored Workspace Status

When an invalid Pulumi access token is provided:

status:
  address: nginx-stack-workspace.default:50051
  conditions:
  - lastTransitionTime: "2025-02-07T23:34:13Z"
    message: Invalid access token used to authenticate with Pulumi Cloud
    observedGeneration: 1
    reason: InvalidAccessToken
    status: "False"
    type: Ready
  observedGeneration: 1

When there is a Kubernetes authentication issue:

status:
  address: nginx-stack-workspace.default:50051
  conditions:
  - lastTransitionTime: "2025-02-07T23:34:13Z"
    message: TokenReview API is unavailable
    observedGeneration: 2
    reason: Unauthenticated
    status: "False"
    type: Ready
  observedGeneration: 2
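As a rough sketch of how the two status shapes above could be produced, the function below builds a Ready condition from a failure classification. The `condition` struct mirrors only the fields shown in the examples; `failureKind`, its constants, and `readyCondition` are hypothetical names, and how the real controller tells the two cases apart is not shown in this PR.

```go
package main

// condition mirrors a subset of the fields in the status examples.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// failureKind is a hypothetical classification of a WhoAmI failure.
type failureKind int

const (
	invalidAccessToken failureKind = iota // Pulumi Cloud rejected the token
	unauthenticated                       // Kubernetes-side auth problem
)

// readyCondition returns a Ready=False condition for the given failure.
// For Kubernetes auth problems the error detail is passed through as the
// message, matching the "TokenReview API is unavailable" example.
func readyCondition(kind failureKind, detail string) condition {
	c := condition{Type: "Ready", Status: "False"}
	switch kind {
	case invalidAccessToken:
		c.Reason = "InvalidAccessToken"
		c.Message = "Invalid access token used to authenticate with Pulumi Cloud"
	default:
		c.Reason = "Unauthenticated"
		c.Message = detail
	}
	return c
}
```

Keeping the reason distinct for each case lets a user reading the Workspace status tell a bad Pulumi token apart from a cluster authentication problem without digging through operator logs.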

Testing

  • Added an end-to-end (e2e) test to verify that the workspace pod is not deleted when authentication fails.
  • Manually validated on a GKE cluster.

Related Issues

Fixes: #740

@rquitales
Member Author

rquitales commented Feb 5, 2025

@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch 2 times, most recently from e67cd54 to a169516 Compare February 5, 2025 19:43
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch from a169516 to 7bf4d36 Compare February 6, 2025 01:36
@rquitales rquitales force-pushed the rquitales/retain-delete-ws branch from 33050e6 to 1271737 Compare February 6, 2025 18:16
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch from 7bf4d36 to b348133 Compare February 6, 2025 18:16
@rquitales rquitales marked this pull request as ready for review February 6, 2025 18:30
@rquitales rquitales requested a review from EronWright February 6, 2025 18:30
@rquitales rquitales self-assigned this Feb 6, 2025
@rquitales rquitales force-pushed the rquitales/retain-delete-ws branch from bfc8e33 to ca8756c Compare February 6, 2025 20:42
@rquitales rquitales force-pushed the rquitales/retain-delete-ws branch 2 times, most recently from c0e6dc7 to 8463e76 Compare February 6, 2025 22:24
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch from b348133 to 2284f6a Compare February 6, 2025 22:24
@rquitales rquitales force-pushed the rquitales/retain-delete-ws branch from 8463e76 to 10d493d Compare February 6, 2025 23:10
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch from 2284f6a to 54c5d23 Compare February 6, 2025 23:10
@rquitales rquitales force-pushed the rquitales/retain-delete-ws branch from 10d493d to a77a9a3 Compare February 6, 2025 23:20
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch 2 times, most recently from f0e65e8 to 165c95d Compare February 6, 2025 23:23
Base automatically changed from rquitales/retain-delete-ws to master February 6, 2025 23:28
@rquitales rquitales changed the base branch from master to rquitales/report-locked-status February 6, 2025 23:34
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch from 165c95d to ce9586f Compare February 6, 2025 23:34
@rquitales rquitales force-pushed the rquitales/report-locked-status branch from 6b85dd2 to c19278f Compare February 7, 2025 23:14
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch 2 times, most recently from b94b75d to 8a708b3 Compare February 8, 2025 00:06
@rquitales rquitales force-pushed the rquitales/report-locked-status branch from c19278f to a3788c0 Compare February 8, 2025 00:24
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch 2 times, most recently from 1cc80cb to 2c1bbf8 Compare February 8, 2025 00:32
Base automatically changed from rquitales/report-locked-status to master February 8, 2025 00:35
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch from 2c1bbf8 to 3c685f0 Compare February 8, 2025 00:36
Contributor

@EronWright EronWright left a comment

Awesome

codecov bot commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.

Project coverage is 51.22%. Comparing base (43cd638) to head (6c2747a).
Report is 2 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...r/internal/controller/auto/workspace_controller.go    0.00%     20 Missing ⚠️
agent/pkg/server/server.go                               0.00%     3 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #805      +/-   ##
==========================================
- Coverage   51.48%   51.22%   -0.27%     
==========================================
  Files          31       31              
  Lines        4296     4318      +22     
==========================================
  Hits         2212     2212              
- Misses       1895     1917      +22     
  Partials      189      189              

@rquitales rquitales enabled auto-merge (squash) February 8, 2025 00:40
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch 2 times, most recently from 0fd5e9f to b2dd7af Compare February 8, 2025 01:52
@rquitales rquitales force-pushed the rquitales/authz-discard-pod-fix branch from b2dd7af to 6c2747a Compare February 10, 2025 18:33
@rquitales rquitales merged commit fc48798 into master Feb 10, 2025
7 checks passed
@rquitales rquitales deleted the rquitales/authz-discard-pod-fix branch February 10, 2025 18:41
Development

Successfully merging this pull request may close these issues.

Don't discard the workspace pod when authz is misconfigured
2 participants