The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue, cluster roles, and priority classes.
Create `default-priority`, `high-priority`, and `low-priority` priority classes:
```sh
oc apply -f setup.RHOAI-v2.18/mlbatch-priorities.yaml
```
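For orientation, a priority class in this manifest looks roughly as follows; this is an illustrative sketch only, and the actual values and descriptions are defined in `mlbatch-priorities.yaml`:
```yaml
# Illustrative sketch only; the real definitions live in mlbatch-priorities.yaml.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 100            # assumed value, for illustration
globalDefault: false  # assumed; workloads select a class explicitly
description: "Default priority class for MLBatch workloads."
```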
MLBatch configures Kubernetes scheduling to accomplish two objectives:
- Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
- Packing Pods whose GPU request is less than the number of GPUs on a Node to maximize the number of Nodes available for Pods that request all the GPUs on a Node.
This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
```sh
helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
  scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
  --set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
```
Patch scheduler-plugins pod priorities:
```sh
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.18/scheduler-priority-patch.yaml scheduler-plugins-controller
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.18/scheduler-priority-patch.yaml scheduler-plugins-scheduler
```
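For reference, the Coscheduling plugin gang-schedules Pods that belong to a `PodGroup` and that target the second scheduler. MLBatch's operators arrange this automatically for their workloads; the sketch below (with illustrative names) only shows the underlying mechanism:
```yaml
# Sketch: a PodGroup of minMember Pods is scheduled all-or-nothing.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: demo-gang           # illustrative name
spec:
  minMember: 2              # gang size: schedule both Pods or neither
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-worker-0
  labels:
    scheduling.x-k8s.io/pod-group: demo-gang  # membership label
spec:
  schedulerName: scheduler-plugins-scheduler  # use the second scheduler
  containers:
  - name: worker
    image: registry.access.redhat.com/ubi9/ubi-minimal
```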
Create the Red Hat OpenShift AI subscription:
```sh
oc apply -f setup.RHOAI-v2.18/mlbatch-subscription.yaml
```
Identify the install plan:
```sh
oc get ip -n redhat-ods-operator
```
```
NAMESPACE             NAME            CSV                     APPROVAL   APPROVED
redhat-ods-operator   install-kmh8w   rhods-operator.2.16.0   Manual     false
```
Approve the install plan, replacing the generated plan name below with the actual value:
```sh
oc patch ip -n redhat-ods-operator --type merge --patch '{"spec":{"approved":true}}' install-kmh8w
```
Create the DSC Initialization:
```sh
oc apply -f setup.RHOAI-v2.18/mlbatch-dsci.yaml
```
Create the Data Science Cluster:
```sh
oc apply -f setup.RHOAI-v2.18/mlbatch-dsc.yaml
```
The provided DSCI and DSC are intended to install a minimal set of Red Hat OpenShift AI managed components: `codeflare`, `kueue`, `ray`, and `trainingoperator`. The remaining components, such as `dashboard`, can be optionally enabled.
The configuration of the managed components differs from the default Red Hat OpenShift AI configuration as follows:
- Kubeflow Training Operator:
  - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`,
- Kueue:
  - `manageJobsWithoutQueueName` is enabled,
  - `batch/job` integration is disabled,
  - `waitForPodsReady` is disabled,
  - the `LendingLimit` feature gate is enabled,
  - `fairSharing` is enabled,
  - the `enableClusterQueueResources` metric is enabled,
- Codeflare operator:
  - the AppWrapper controller is enabled and configured as follows:
    - `userRBACAdmissionCheck` is disabled,
    - `schedulerName` is set to `scheduler-plugins-scheduler`,
    - `queueName` is set to `default-queue`,
    - `slackQueueName` is set to `slack-cluster-queue`
- pod priorities, resource requests and limits have been adjusted.
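The Kueue settings listed above correspond to fields of Kueue's `Configuration` object. The fragment below is a sketch for orientation only, not the shipped configuration; field names follow the upstream Kueue `Configuration` API, and the integration list shown is assumed:
```yaml
# Sketch of the Kueue Configuration fields implied above (not the shipped file).
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
manageJobsWithoutQueueName: true
# waitForPodsReady is simply omitted, i.e. disabled
integrations:
  frameworks:                 # batch/job omitted, i.e. disabled
  - kubeflow.org/pytorchjob   # assumed entry, for illustration
fairSharing:
  enable: true
metrics:
  enableClusterQueueResources: true
# The LendingLimit feature gate is enabled separately on the controller
# (feature gates are not part of the Configuration object).
```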
Helm chart values and customization instructions can be found in the official documentation. As is, Autopilot will run on GPU nodes.
- Add the Autopilot Helm repository:
```sh
helm repo add autopilot https://ibm.github.io/autopilot/
helm repo update
```
- Install the chart (the command is idempotent). The config file, used to customize the Helm values, is optional:
```sh
helm upgrade autopilot autopilot/autopilot --install --namespace=autopilot --create-namespace -f your-config.yml
```
After completing the installation, manually label the namespace to enable metrics to be scraped by Prometheus with the following command:
```sh
oc label ns autopilot openshift.io/cluster-monitoring=true
```
Labeling the `ServiceMonitor` is not required.
Create Kueue's default flavor:
```sh
oc apply -f setup.RHOAI-v2.18/default-flavor.yaml
```
Create the `mlbatch-edit` role:
```sh
oc apply -f setup.RHOAI-v2.18/mlbatch-edit-role.yaml
```
Create the designated slack `ClusterQueue`, which will be used to automate minor adjustments to cluster capacity caused by node failures and scheduler maintenance.
```sh
oc apply -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: slack-cluster-queue
spec:
  namespaceSelector: {}
  cohort: default-cohort
  preemption:
    withinClusterQueue: LowerOrNewerEqualPriority
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: Never
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "nvidia.com/roce_gdr", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 8000m
      - name: "memory"
        nominalQuota: 128Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
      - name: "nvidia.com/roce_gdr"
        nominalQuota: 1
      - name: "pods"
        nominalQuota: 100
EOF
```
Edit the above quantities to adjust the quota to the desired values. Pod counts are optional and can be omitted from the list of covered resources. The `lendingLimit` for each resource will be dynamically adjusted by the MLBatch system to reflect reduced cluster capacity. See QUOTA_MAINTENANCE.md for a detailed discussion of the role of the slack `ClusterQueue`.
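For illustration, `lendingLimit` is a per-resource field on the ClusterQueue quota entries. For example, if node failures were to remove two GPUs from the cluster, the MLBatch controllers might adjust the slack queue's GPU entry along these lines (sketch, with an assumed value):
```yaml
# Sketch: nominalQuota stays at 8, but only 6 GPUs are lent to the cohort.
- name: "nvidia.com/gpu"
  nominalQuota: 8
  lendingLimit: 6
```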