Infinite Loop when upgrading system-upgrade-controller due to missing ServiceAccount Annotations #361

Open
przemytn opened this issue Apr 11, 2025 · 1 comment

@przemytn

Version

v0.15.2

Platform/Architecture

linux-amd64

Describe the bug

When upgrading system-upgrade-controller in an environment with more than 300 pods, the upgrade enters an infinite loop. The cause is that the existing ServiceAccount lacks the ownership label and annotations Helm requires, so Helm cannot adopt and manage the resource.

To Reproduce

1. Deploy the system-upgrade-controller ServiceAccount without the Helm ownership label and annotations.
2. Upgrade with Helm, using a command similar to:

   helm upgrade --history-max=5 --install=true \
     --labels=catalog.cattle.io/cluster-repo-name=rancher-charts \
     --namespace=cattle-system --reset-values=true --timeout=5m0s \
     --values=/home/shell/helm/values-system-upgrade-controller-106.0.0.yaml \
     --version=106.0.0 --wait=true \
     system-upgrade-controller /home/shell/helm/system-upgrade-controller-106.0.0.tgz

3. Observe the error and the infinite loop, which occurs with more than 300 pods.

Expected behavior

The ServiceAccount should carry the Helm ownership label and annotations so that Helm can recognize and adopt it during upgrades. The upgrade should then complete normally instead of entering an infinite loop.

Actual behavior

The current ServiceAccount is defined as:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: system-upgrade-controller
  namespace: cattle-system

This results in the following error during upgrade:

Error: Unable to continue with install: ServiceAccount "system-upgrade-controller" in namespace "cattle-system" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "system-upgrade-controller"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "cattle-system"

The system then enters an infinite loop trying to reconcile this state. This is particularly problematic when the environment runs more than 300 pods.
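
The error text reflects Helm's check for adopting an object that already exists. Helm will only take over an existing object if it carries all of the following:

* the `app.kubernetes.io/managed-by: Helm` label;
* a `meta.helm.sh/release-name` annotation matching the release;
* a `meta.helm.sh/release-namespace` annotation matching the release namespace.

Below is a minimal Python sketch of that check. It assumes the manifest has been parsed into a plain dict. `ownership_error` is a hypothetical helper that reproduces the message format above. It is not Helm's own code.

```python
# Hypothetical re-implementation of the ownership check behind Helm's
# "invalid ownership metadata" error. A sketch only, not Helm's source.
from typing import Dict, List, Optional

MANAGED_BY_LABEL = "app.kubernetes.io/managed-by"
RELEASE_NAME_ANNOTATION = "meta.helm.sh/release-name"
RELEASE_NAMESPACE_ANNOTATION = "meta.helm.sh/release-namespace"


def _require_value(meta: Dict[str, str], key: str, want: str) -> Optional[str]:
    """Describe why meta[key] is unacceptable, or return None if it equals want."""
    if key not in meta:
        return f'missing key "{key}": must be set to "{want}"'
    if meta[key] != want:
        return f'key "{key}" must equal "{want}": current value is "{meta[key]}"'
    return None


def ownership_error(obj: dict, release_name: str, release_namespace: str) -> Optional[str]:
    """Return None if obj could be adopted by the release, else an error message."""
    metadata = obj.get("metadata") or {}
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}

    errors: List[str] = []
    problem = _require_value(labels, MANAGED_BY_LABEL, "Helm")
    if problem:
        errors.append(f"label validation error: {problem}")
    for key, want in (
        (RELEASE_NAME_ANNOTATION, release_name),
        (RELEASE_NAMESPACE_ANNOTATION, release_namespace),
    ):
        problem = _require_value(annotations, key, want)
        if problem:
            errors.append(f"annotation validation error: {problem}")

    if not errors:
        return None
    return "; ".join(["invalid ownership metadata"] + errors)


# The ServiceAccount as currently deployed: no label, no annotations.
current = {
    "apiVersion": "v1",
    "kind": "ServiceAccount",
    "metadata": {"name": "system-upgrade-controller", "namespace": "cattle-system"},
}

print(ownership_error(current, "system-upgrade-controller", "cattle-system"))
```

Against the ServiceAccount as currently defined, it prints the same text that follows "cannot be imported into the current release:" in the error above. Against a manifest that carries the label and both annotations, it returns `None`.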

The correct ServiceAccount should include:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: system-upgrade-controller
  namespace: cattle-system
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: system-upgrade-controller
    meta.helm.sh/release-namespace: cattle-system
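
One way to add this metadata to the ServiceAccount already in the cluster, without recreating it, is a JSON merge patch (RFC 7386). The Python sketch below builds such a patch and previews its effect locally. `helm_ownership_patch` and `apply_merge_patch` are hypothetical helpers. The output is assumed to be passed to something like `kubectl patch serviceaccount system-upgrade-controller -n cattle-system --type merge -p '<patch>'`. This has not been verified against the Rancher chart or its upgrade flow.

```python
# Hypothetical helpers: build a JSON merge patch (RFC 7386) that adds the
# ownership label and annotations Helm checks, and preview it locally.
import json


def helm_ownership_patch(release_name: str, release_namespace: str) -> str:
    """Return a JSON merge patch setting Helm's ownership metadata."""
    patch = {
        "metadata": {
            "labels": {"app.kubernetes.io/managed-by": "Helm"},
            "annotations": {
                "meta.helm.sh/release-name": release_name,
                "meta.helm.sh/release-namespace": release_namespace,
            },
        }
    }
    # Compact separators keep the patch on a single shell-friendly line.
    return json.dumps(patch, separators=(",", ":"), sort_keys=True)


def apply_merge_patch(target, patch):
    """Apply an RFC 7386 merge patch without mutating target (null deletes a key)."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key, {}), value)
    return result


# The ServiceAccount as currently deployed.
current = {
    "apiVersion": "v1",
    "kind": "ServiceAccount",
    "metadata": {"name": "system-upgrade-controller", "namespace": "cattle-system"},
}

patch = helm_ownership_patch("system-upgrade-controller", "cattle-system")
patched = apply_merge_patch(current, json.loads(patch))
print(patch)
print(json.dumps(patched, indent=2))
```

The locally patched manifest gains the label and annotations listed above, while `name` and `namespace` are left untouched. A merge patch only adds or overwrites the keys it names, so any other labels or annotations on the live object would be preserved.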

Additional context

This issue appears to be particularly severe in environments with many pods (300+). The infinite loop seems to be related to the retry behavior when Helm cannot adopt an existing resource because its ownership metadata is missing. Note that issues are currently disabled on the RKE2 charts repository, so this report may need to be filed through another channel.

@brandond
Member

brandond commented Apr 11, 2025

What chart are you even using? We don't have charts here yet, as the PRs to add charts to this repo have not yet been merged:

The Rancher charts for system-upgrade-controller are being moved into this repo as-is; improvements to the charts can come after they've been moved here.

Opening issues for charts that aren't even in this repo yet is premature. At the moment, the recommended way to deploy the SUC is using the manifest release artifacts, as described in the README and k3s/rke2 docs.
