Description
What would you like to be added:
Enhance the `RunWithPodSetsInfo` method in the PipelineRun controller to apply labelSelector and tolerations from the PodSetInfo to the `pipelinerun.spec.taskRunTemplate.podTemplate` field.
Currently, the `RunWithPodSetsInfo` method in `internal/controller/pipelinerun_controller.go` only sets the spec status to empty and returns:

```go
func (p *PipelineRun) RunWithPodSetsInfo(podSetsInfo []podset.PodSetInfo) error {
	p.Spec.Status = ""
	return nil
}
```

The enhancement should:
- **Extract node scheduling information from PodSetInfo**: Parse the `podSetsInfo` parameter to extract:
  - Node selectors from the resource flavor
  - Tolerations for tainted nodes
  - Any additional pod template specifications
- **Apply to PipelineRun taskRunTemplate**: Update the PipelineRun's `spec.taskRunTemplate.podTemplate` field with:
  - `nodeSelector` from the `podset.PodSetInfo`
  - `tolerations` from the `podset.PodSetInfo`
- **Handle multiple PodSetInfo entries**: When multiple PodSetInfo entries exist, apply the appropriate scheduling constraints so that all TaskRuns in the PipelineRun are scheduled according to the resource flavor requirements.
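The steps above could be sketched roughly as follows. This is a self-contained illustration, not the actual controller code: the `PodSetInfo`, `PodTemplate`, and `PipelineRun` types here are simplified stand-ins for the real `podset.PodSetInfo` and Tekton API types, and the merge logic (take the first entry, merge node selectors, append tolerations) is one plausible policy, not a confirmed design decision.

```go
package main

import "fmt"

// Simplified stand-ins for the real Kueue and Tekton types.
type PodSetInfo struct {
	NodeSelector map[string]string
	Tolerations  []string // simplified; the real field is []corev1.Toleration
}

type PodTemplate struct {
	NodeSelector map[string]string
	Tolerations  []string
}

type PipelineRun struct {
	PodTemplate *PodTemplate // stands in for spec.taskRunTemplate.podTemplate
	Status      string
}

// runWithPodSetsInfo sketches the proposed behavior: clear the spec status
// (as the current implementation does), then copy the nodeSelector and
// tolerations from the admitted PodSetInfo into the pod template.
func (p *PipelineRun) runWithPodSetsInfo(infos []PodSetInfo) error {
	p.Status = ""
	if len(infos) == 0 {
		return fmt.Errorf("no PodSetInfo provided")
	}
	// A PipelineRun maps to a single pod set here; use the first entry.
	info := infos[0]
	if p.PodTemplate == nil {
		p.PodTemplate = &PodTemplate{}
	}
	if p.PodTemplate.NodeSelector == nil {
		p.PodTemplate.NodeSelector = map[string]string{}
	}
	for k, v := range info.NodeSelector {
		p.PodTemplate.NodeSelector[k] = v
	}
	p.PodTemplate.Tolerations = append(p.PodTemplate.Tolerations, info.Tolerations...)
	return nil
}

func main() {
	pr := &PipelineRun{}
	info := PodSetInfo{
		NodeSelector: map[string]string{"node-type": "gpu"},
		Tolerations:  []string{"nvidia.com/gpu=true:NoSchedule"},
	}
	if err := pr.runWithPodSetsInfo([]PodSetInfo{info}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pr.PodTemplate.NodeSelector["node-type"])
	fmt.Println(len(pr.PodTemplate.Tolerations))
}
```

A real implementation would also need to restore the original pod template in `RestorePodSetsInfo` when the workload is evicted, which this sketch omits.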
Why is this needed:
Currently, when Kueue admits a PipelineRun workload and assigns it to a specific resource flavor, the node scheduling information (labelSelector and tolerations) from the resource flavor is not propagated to the actual TaskRun pods. This creates a disconnect between Kueue's resource management and Tekton's pod scheduling.
This enhancement is critical for several reasons:
- **Resource Flavor Enforcement**: When administrators configure ClusterQueues with specific resource flavors (e.g., GPU nodes, high-memory nodes), the PipelineRun's TaskRuns should actually run on those designated nodes. Without this, workloads might be scheduled on inappropriate nodes despite Kueue's resource allocation.
- **Node Affinity and Tolerations**: Resource flavors often include node selectors and tolerations to target specific node pools (e.g., `node-type=gpu`, `workload-type=build`). These constraints must be applied to TaskRun pods to ensure proper scheduling.
- **Multi-tenant Isolation**: In multi-tenant environments, resource flavors provide isolation by directing workloads to specific node pools. This isolation is only effective if TaskRun pods respect these constraints.
- **Compliance with Kueue Design**: The `RunWithPodSetsInfo` method exists specifically to allow job controllers to apply Kueue-determined scheduling constraints. The current no-op implementation defeats this purpose.
Example Impact:
```yaml
# ResourceFlavor targeting GPU nodes, referenced by a ClusterQueue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: gpu-flavor
spec:
  nodeLabels:
    node-type: gpu
  tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
```

Without this enhancement, a PipelineRun admitted with the `gpu-flavor` would not have its TaskRuns scheduled on GPU nodes, leading to resource misallocation and potential workload failures.
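For illustration, with the enhancement in place, a PipelineRun admitted through a queue using `gpu-flavor` would end up with a populated pod template along these lines. This is a sketch: the PipelineRun name, queue label value, and pipeline reference are hypothetical, while the `taskRunTemplate.podTemplate` field names follow Tekton's PipelineRun API.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: build-run                          # hypothetical example name
  labels:
    kueue.x-k8s.io/queue-name: team-queue  # hypothetical local queue
spec:
  pipelineRef:
    name: build-pipeline                   # hypothetical pipeline
  taskRunTemplate:
    podTemplate:
      # Injected from the resource flavor's nodeLabels
      nodeSelector:
        node-type: gpu
      # Injected from the resource flavor's tolerations
      tolerations:
      - key: nvidia.com/gpu
        operator: Equal
        value: "true"
        effect: NoSchedule
```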
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.