Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kueue mutation admission webhook should intercept specific resource instance only #4141

Open
3 tasks
ambersun1234 opened this issue Feb 3, 2025 · 4 comments · May be fixed by #4152
Open
3 tasks

Kueue mutation admission webhook should intercept specific resource instance only #4141

ambersun1234 opened this issue Feb 3, 2025 · 4 comments · May be fixed by #4152
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@ambersun1234
Copy link

ambersun1234 commented Feb 3, 2025

What would you like to be added:
I would like to add a label to determine whether a resource instance should be intercept by Kueue's MutatingAdmissionWebhook

reference: webhook.yaml

- admissionReviewVersions:
    - v1
  clientConfig:
    service:
      name: '{{ include "kueue.fullname" . }}-webhook-service'
      namespace: '{{ .Release.Namespace }}'
      path: /validate-batch-v1-job
  failurePolicy: Fail
  name: vjob.kb.io
  objectSelector:  # <- this
    matchLabels:
      foo: bar
  rules:
    - apiGroups:
        - batch
      apiVersions:
        - v1
      operations:
        - CREATE
        - UPDATE
      resources:
        - jobs
  sideEffects: None

Why is this needed:
Currently Kueue's MutatingAdmissionWebhook select all the job from all namespaces
Kueue shouldn't intercept resource instance that is not intended to be managed by Kueue

In our case, when Kueue is installing(or bootstraping), there's a job(Minio provisioning) running at the same time
since Kueue's mutating webhook select all job, the Minio provisioning job will be intercept, but I don't want it to be intercept by Kueue

At the same moment, Kueue's webhook isn't ready, so it'll fail

But the actual root cause is incorrect setup of mutating webhook, leading webhook to intercept the unrelated resource instance

here's the simplified reproduce step(to simulate the above issue we had)

$ helm install kueue oci://us-central1-docker.pkg.dev/k8s-staging-images/charts/kueue --version="v0.10.1" --create-namespace --namespace=kueue-system && kubectl create job my-job --image=busybox
Pulled: us-central1-docker.pkg.dev/k8s-staging-images/charts/kueue:v0.10.1
Digest: sha256:68658378dc673d3142d8dba222739c1ae2d3ef6742876f0249d599d0634b94da
NAME: kueue
LAST DEPLOYED: Tue Feb  4 01:00:12 2025
NAMESPACE: kueue-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
error: failed to create job: Internal error occurred: failed calling webhook "mjob.kb.io": failed to call webhook: Post "https://kueue-webhook-service.kueue-system.svc:443/mutate-batch-v1-job?timeout=10s": no endpoints available for service "kueue-webhook-service"

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@ambersun1234 ambersun1234 added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 3, 2025
@ambersun1234 ambersun1234 changed the title Kueue mutation admission webhook should intercept specific resource instance Kueue mutation admission webhook should intercept specific resource instance only Feb 3, 2025
@kannon92
Copy link
Contributor

kannon92 commented Feb 3, 2025

Interesting! I was actually thinking of starting a discussion about this for Kueue for this but mostly for pod integration.

For pod integration, I am leaning towards a namespace labeling approach so that webhooks could only apply if a namespace is labeled. And I would prefer to flip the Kueue webhooks to be opt-in at the namespace level rather than exclude only a small subset of a cluster.

I did not really think that this would be necessary for Jobs but I can see why this may block other operators that may rely on one-time jobs.

@kannon92
Copy link
Contributor

kannon92 commented Feb 3, 2025

So just to be clear, you have manageJobsWithoutQueueName: false and you are mainly running this due to the webhooks having issues with starting up.

@ambersun1234
Copy link
Author

I just have the default setting of Kueue(i.e. manageJobsWithoutQueueName: false)
also the Minio provisioning job is created by helm subchart, I didn't modify it

@kannon92
Copy link
Contributor

kannon92 commented Feb 5, 2025

So since you don't use manageJobsWithoutQueueName could you just look at label for queues in the job?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants