[SURE-8883] Add options for the user to add custom or automatic tolerations for Helm operation pods #3313

Open
kkaempf opened this issue Feb 7, 2025 · 3 comments

Comments

@kkaempf
Collaborator

kkaempf commented Feb 7, 2025

SURE-8883

Motivation

Whenever a user performs an operation on a Helm chart, a helm-operation-xxx pod is created to execute it. On clusters where every node is tainted, this pod cannot be scheduled unless it carries matching tolerations.

Backend changes (SURE-8593) let a customer either provide tolerations for Helm operation pods or request that the tolerations needed to run on control plane nodes be added automatically. These changes apply to the chart install, upgrade, and uninstall endpoints.

See the JIRA ticket for an example payload.

Acceptance criteria

  • The user can manually specify tolerations to be added to a Helm operation pod when installing, upgrading, or uninstalling a chart.
  • The user can request that Helm operation pods tolerate all taints found on control plane nodes.
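
The second criterion could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Rancher backend's implementation: `Taint` and `Toleration` are simplified local stand-ins for the Kubernetes core/v1 types, and `Node` and `tolerationsForControlPlane` are hypothetical names introduced here.

```go
package main

import "fmt"

// Taint and Toleration are simplified local stand-ins for the Kubernetes
// core/v1 types of the same name (assumption: only these fields matter here).
type Taint struct {
	Key    string
	Value  string
	Effect string // e.g. "NoSchedule" or "NoExecute"
}

type Toleration struct {
	Key      string
	Operator string // "Equal" or "Exists"
	Value    string
	Effect   string
}

// Node is a hypothetical, minimal view of a cluster node.
type Node struct {
	Name         string
	ControlPlane bool
	Taints       []Taint
}

// tolerationsForControlPlane returns one toleration per distinct taint found
// on control plane nodes, in first-seen order. Worker nodes are ignored.
func tolerationsForControlPlane(nodes []Node) []Toleration {
	seen := make(map[Taint]bool)
	var out []Toleration
	for _, n := range nodes {
		if !n.ControlPlane {
			continue
		}
		for _, t := range n.Taints {
			if seen[t] {
				continue // the same taint on another node needs no second toleration
			}
			seen[t] = true
			tol := Toleration{Key: t.Key, Operator: "Equal", Value: t.Value, Effect: t.Effect}
			if t.Value == "" {
				// Exists tolerates the key regardless of value; used here
				// when the taint carries no value.
				tol.Operator = "Exists"
			}
			out = append(out, tol)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "cp-1", ControlPlane: true, Taints: []Taint{
			{Key: "node-role.kubernetes.io/control-plane", Effect: "NoSchedule"},
		}},
		{Name: "worker-1", Taints: []Taint{
			{Key: "gpu", Value: "true", Effect: "NoSchedule"},
		}},
	}
	for _, tol := range tolerationsForControlPlane(nodes) {
		fmt.Printf("%+v\n", tol)
	}
}
```

Deduplicating by the full taint (key, value, effect) keeps the resulting list short when several control plane nodes share the same taints.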

@kkaempf kkaempf added this to the v2.11.0 milestone Feb 7, 2025
@kkaempf kkaempf added this to Fleet Feb 7, 2025
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Feb 7, 2025
@kkaempf kkaempf added the JIRA Must shout label Feb 7, 2025
@0xavi0 0xavi0 self-assigned this Feb 7, 2025
@kkaempf
Collaborator Author

kkaempf commented Feb 7, 2025

Tentatively adding to v2.11 so engineering can triage it and give a rough estimate.
The final schedule will be determined based on the estimate (and other backlog items 😉 )

@kkaempf kkaempf moved this from 🆕 New to To Triage in Fleet Feb 7, 2025
0xavi0 added a commit to 0xavi0/fleet that referenced this issue Feb 10, 2025
When applying a `GitRepo` in a cluster where all the nodes are tainted, the pod created to call `fleet apply` remains stuck in the Pending state.

This PR adds tolerations to the `GitRepo` spec. Those tolerations will be added to the job spec when running `fleet apply` so the pod is scheduled.

Refers to: rancher#3313

Signed-off-by: Xavi Garcia <[email protected]>
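
The idea in the commit message above, carrying tolerations from the `GitRepo` spec into the job's pod spec, might look roughly like the sketch below. It is not the code from the referenced commit: `GitRepoSpec`, `JobPodSpec`, the `Tolerations` field name, and both helper functions are hypothetical, and `Toleration` is a simplified stand-in for the Kubernetes core/v1 type.

```go
package main

import "fmt"

// Toleration is a simplified stand-in for the Kubernetes core/v1 type.
type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

// GitRepoSpec is a hypothetical, trimmed-down spec that keeps only the
// field discussed here; the real field name is an assumption.
type GitRepoSpec struct {
	Repo        string
	Tolerations []Toleration
}

// JobPodSpec is a hypothetical stand-in for the pod template of the job
// that runs `fleet apply`.
type JobPodSpec struct {
	Tolerations []Toleration
}

// mergeTolerations returns a new slice holding base followed by extra,
// with exact duplicates dropped. Neither input slice is modified.
func mergeTolerations(base, extra []Toleration) []Toleration {
	seen := make(map[Toleration]bool, len(base)+len(extra))
	out := make([]Toleration, 0, len(base)+len(extra))
	for _, list := range [][]Toleration{base, extra} {
		for _, t := range list {
			if seen[t] {
				continue
			}
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

// newJobPodSpec builds the job pod spec from default tolerations plus the
// ones the user set on the GitRepo.
func newJobPodSpec(defaults []Toleration, repo GitRepoSpec) JobPodSpec {
	return JobPodSpec{Tolerations: mergeTolerations(defaults, repo.Tolerations)}
}

func main() {
	defaults := []Toleration{
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoExecute"},
	}
	repo := GitRepoSpec{
		Repo: "https://example.com/repo.git", // placeholder URL
		Tolerations: []Toleration{
			{Key: "dedicated", Operator: "Equal", Value: "infra", Effect: "NoSchedule"},
		},
	}
	for _, tol := range newJobPodSpec(defaults, repo).Tolerations {
		fmt.Printf("%+v\n", tol)
	}
}
```

Returning a fresh slice rather than appending in place avoids accidentally mutating the tolerations stored on the `GitRepo` object.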
@kkaempf kkaempf moved this from To Triage to 🏗 In progress in Fleet Feb 10, 2025
@manno
Member

manno commented Feb 12, 2025

For Fleet:

  • make sure tolerations are passed to the fleet helm chart
  • make sure fleetcluster passes tolerations to local cluster
  • fleet chart uses tolerations for cleanup/etc. jobs
  • fleet controller uses tolerations for apply/clone job pods

0xavi0 added a commit to 0xavi0/fleet that referenced this issue Feb 17, 2025
…ning the fleet apply job

Adding the needed tolerations to the Fleet Helm chart is not enough when running the agent and the `fleet apply` job.

This PR adds the tolerations found in the `fleet-controller` deployment to the agent and to the fleet apply job.

Refers to: rancher#3313

Signed-off-by: Xavi Garcia <[email protected]>
0xavi0 added a commit that referenced this issue Feb 20, 2025
…ning the fleet apply job (#3362)

Adding the needed tolerations to the Fleet Helm chart is not enough when running the agent and the `fleet apply` job.

This PR adds the tolerations found in the `fleet-controller` deployment to the agent and to the fleet apply job.

Refers to: #3313

Signed-off-by: Xavi Garcia <[email protected]>
@gaktive
Member

gaktive commented Feb 20, 2025

UI received something similar and has a draft to help out, but we seem to be blocked by either this or rancher/rancher#49101

Dashboard: rancher/dashboard#13125
Internal reference: SURE-9823 (blocked by SURE-8883 mentioned at the top)

@kkaempf as I help out @momesgin with his ticket, does his ticket look connected?
