Add Support for Plain Pods with Known OwnerReferences #4106
I created a draft PR for how this would work here: #4109. I have tested this in my local env!
I would think a KEP is worth going through on this one. There is parallel work going on to support SparkApplications from @everpeace. If there is support for Spark, do you still need this feature?
When an owner is a pod, can it ever be the case that we don't want Kueue to manage the child object? I might be missing something here, but if a pod is creating workloads, is it ever truly an owner in the same sense as the other integrations? Could we remove pods from the list of integrations that we care about when checking whether the owner is Kueue-managed?
Thanks for pinging. Although it can be a bit of an abuse (I'm not sure, sorry), as far as I know, Kueue already has the `kueue.x-k8s.io/pod-suspending-parent` annotation (see kueue/pkg/controller/jobs/pod/pod_webhook.go, lines 137 to 142 in 198d276):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: a-pod-owned-by-another-pod
  annotations:
    kueue.x-k8s.io/pod-suspending-parent: pod
  labels:
    kueue.x-k8s.io/queue-name: queue-1
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: a-owner-pod
    uid: xxxxxx
spec:
  ...
```
For the Spark case, in my local env I submitted something like:

```shell
./bin/spark-submit \
  --master k8s://https://127.0.0.1:58900 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  ...
  --conf spark.kubernetes.driver.annotation.kueue.x-k8s.io/pod-group-total-count=3 \
  --conf spark.kubernetes.driver.annotation.kueue.x-k8s.io/pod-group-sets='[{"count":1,"name":"driver"},{"count":2,"name":"worker","template":..omit..}]' \
  --conf spark.kubernetes.driver.label.kueue.x-k8s.io/queue-name=queue-1 \
  --conf spark.kubernetes.driver.label.kueue.x-k8s.io/pod-group-name=spark-pi \
  --conf spark.kubernetes.driver.label.kueue.x-k8s.io/pod-group-role=driver \
  ...
  # THIS ONE!
  --conf spark.kubernetes.executor.annotation.kueue.x-k8s.io/pod-suspending-parent=pod \
  --conf spark.kubernetes.executor.annotation.kueue.x-k8s.io/pod-group-total-count=3 \
  --conf spark.kubernetes.executor.label.kueue.x-k8s.io/queue-name=queue-1 \
  --conf spark.kubernetes.executor.label.kueue.x-k8s.io/pod-group-name=spark-pi \
  --conf spark.kubernetes.executor.label.kueue.x-k8s.io/pod-group-role=executor \
  local:///opt/spark/examples/jars/spark-examples.jar 5000
```
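If it helps to visualize: with the standard Spark-on-Kubernetes `spark.kubernetes.executor.annotation.*` and `spark.kubernetes.executor.label.*` conf semantics, the Executor pods created by that submission should carry metadata roughly like the following (a sketch assembled from the command above; the pod name, owner name, and uid are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-pi-exec-1                    # illustrative executor pod name
  annotations:
    kueue.x-k8s.io/pod-suspending-parent: pod
    kueue.x-k8s.io/pod-group-total-count: "3"
  labels:
    kueue.x-k8s.io/queue-name: queue-1
    kueue.x-k8s.io/pod-group-name: spark-pi
    kueue.x-k8s.io/pod-group-role: executor
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod                              # the Driver pod owns the Executors
    name: spark-pi-driver                  # illustrative
    uid: xxxxxx
```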
Ah interesting, thanks for pointing out the `pod-suspending-parent` annotation. My concern with using this annotation is that I would fear (as you alluded to) that it's not entirely what it's meant for, and so if its implementation changes down the road while we are depending on this behavior, we could run into trouble. I tracked down the original PR where the annotation was introduced; its description says it "Adjusts default suspension logic in Deployment/StatefulSet webhooks to correctly handle the case where they ...". So maybe the annotation was actually built for this purpose? Allowing pods that are children of another Kueue-managed kind to opt into management with the annotation. I'm not 100% sure though, maybe @dgrove-oss (author of the original PR) could weigh in? Is the usage of the annotation here in line with its intended purpose?
This is the purpose of the annotation. Currently, I would classify the annotation as an implementation mechanism, not as part of the stable API for Kueue. We do need the functionality, so assuming something like it would be available in future releases is probably relatively low risk, but the details could change. Is that fair @mimowo?
Yes, I think we will need the functionality the annotation provides going forward. Since its original introduction for use in the pod_webhook, we also already depend on it for the LWS and StatefulSet integrations. I think it is safe to assume it is going to stay. As a related side note, I was never truly convinced about its name, so we may consider changing it :), some alternative names would be ...
Still, IIUC the `pod-suspending-parent` annotation is meant as an internal mechanism rather than a user-facing API. So, my preference is to support something like `kueue.x-k8s.io/standalone`. There are some questions: ...
IIUC this is a problem only if you need to enable the "Ray" integration in your cluster. Otherwise you could just leave it disabled and add the "queue-name" label at the PodTemplate level. If you do need to enable the "Ray" integration in your cluster, then the way to go would be to add the "queue-name" label and "standalone" annotation at the PodTemplate level. Is this correct? Any opinions @tenzen-y @dgrove-oss? Would you like to drive this @BenHinthorne?
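To make that second option concrete, here is a minimal sketch of what it could look like on a RayJob, assuming the proposed `kueue.x-k8s.io/standalone` annotation from this thread (which does not exist yet) and the standard KubeRay pod template fields; all names are illustrative:

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: my-ray-job                          # illustrative
spec:
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        metadata:
          labels:
            kueue.x-k8s.io/queue-name: queue-1   # route the pods to a LocalQueue
          annotations:
            kueue.x-k8s.io/standalone: "true"    # hypothetical opt-in annotation
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0          # illustrative
    # workerGroupSpecs would carry the same labels/annotations on their templates
```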
In the past we introduced some annotations without a KEP, and this one would be very simple. OTOH, adding a KEP never hurts, so it would be a great place to discuss the remaining open questions.
Agree
Yeah. I'm ok if you are up to the challenge, but I'm also ok to live with it and maybe just improve the documentation.
Forgot that we had released it already. I'm fine with improving the docs and living with it.
What would you like to be added:

I would like to add the ability for pods annotated with `kueue.x-k8s.io/standalone` to be suspended even if their owner is a known Kueue integration (open to suggestions on the exact annotation wording). Currently, in the pod_webhook, if the pod has an ownerReference of a `Kind` known to Kueue, Kueue will not suspend this pod and a workload will never be created for it. I would like to add the ability to bypass this check, and allow the pod to be suspended and have a workload created for it, if the user opts in via an annotation. A sketch of what such a pod could look like is shown after this paragraph.
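A minimal sketch of the proposed opt-in, assuming a boolean-style value (the exact annotation name and value format are the open questions above; the owner details are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: an-executor-pod                 # illustrative
  annotations:
    kueue.x-k8s.io/standalone: "true"   # proposed opt-in annotation
  labels:
    kueue.x-k8s.io/queue-name: queue-1
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod                           # a Kind known to Kueue; today this blocks suspension
    name: a-driver-pod                  # illustrative
    uid: xxxxxx
```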
Why is this needed:

At a high level, this would give users more flexibility to process plain pods with Kueue. For example, let's consider Spark Applications, which don't yet have native support within Kueue. In short, Spark Apps operate with a Driver pod, which then creates Executor pods. The Executors' owner is the Driver pod, and thus of Kind Pod. Currently, the ownerReference check mentioned above prevents the user from having Kueue process the Executors at all. If users could add the `standalone` annotation to the Executor pods, then they would be able to use Kueue to manage Spark Apps via plain pods. The same concept could extend to any CRD not yet natively integrated with Kueue that follows a similar pattern of Pods owning Pods.

Similarly, this option would also give users the flexibility to process known workload types as plain pods, should they choose to do so. This flexibility would help users process dynamically sized jobs (at least until https://github.com/kubernetes-sigs/kueue/tree/main/keps/77-dynamically-sized-jobs#implementation-history is fully supported), and would let them choose whether or not they want gang scheduling behavior. For example, let's consider RayJobs. Currently, RayJobs are processed by Kueue as a single workload, and any pods resulting from job autoscaling aren't yet supported. If users could allow pods owned by a RayJob to be processed as plain pods, then they could choose to process RayJobs as single pods (forgoing gang scheduling), and process any dynamically scaled pods in the same way.
Completion requirements:
This enhancement requires the following artifacts:
Will link an example PR in the comments, and wait for feedback before proceeding with any doc updates.
The artifacts should be linked in subsequent comments.