Support for Image pull policy #2101

Merged on Apr 29, 2024 (7 commits)

blublinsky
Contributor

Why are these changes needed?

Adds support for the image pull policy for Ray clusters in the API server.

Related issue number

Closes #2047

Checks

  • [x] I've made sure the tests are passing.
  • Testing Strategy
    • [x] Unit tests
    • [x] Manual tests
    • This PR is not tested :(

@blublinsky
Contributor Author

@tedhtchang, @z103cb please take a look

cc: @kevin85421

@blublinsky
Contributor Author

The "Test RayCluster Sample YAMLs (latest release)" check does not seem to be related to the code changes in this PR.

Contributor

@z103cb z103cb left a comment

This is looking good, but I think there's an essential improvement that can be made:

  • validation of the value of the imagePullPolicy field in both the WorkerGroupSpec and the HeadGroupSpec, for example in:
// ValidateClusterSpec validates that the *api.ClusterSpec is not nil and
// has all the required fields
func ValidateClusterSpec(clusterSpec *api.ClusterSpec) error {

The allowed values should be:

const (
	// PullAlways means that kubelet always attempts to pull the latest image. Container will fail if the pull fails.
	PullAlways PullPolicy = "Always"
	// PullNever means that kubelet never pulls an image, but only uses a local image. Container will fail if the image isn't present.
	PullNever PullPolicy = "Never"
	// PullIfNotPresent means that kubelet pulls if the image isn't present on disk. Container will fail if the image isn't present and the pull fails.
	PullIfNotPresent PullPolicy = "IfNotPresent"
)
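A minimal sketch of the kind of check being requested, under stated assumptions: `validateImagePullPolicy` is a hypothetical helper name, not the function that was eventually added in this PR, and treating an empty value as valid (so the default policy applies) is an assumption as well. The `PullPolicy` type and constants are redeclared locally only to keep the example self-contained; real code would presumably use the Kubernetes core/v1 definitions quoted above.

```go
package main

import "fmt"

// PullPolicy mirrors the Kubernetes core/v1 type quoted above. It is
// redeclared here only so that this sketch compiles on its own.
type PullPolicy string

const (
	PullAlways       PullPolicy = "Always"
	PullNever        PullPolicy = "Never"
	PullIfNotPresent PullPolicy = "IfNotPresent"
)

// validateImagePullPolicy is a hypothetical helper. It accepts the three
// allowed values and, as an assumption of this sketch, an empty string
// (meaning "not set, use the default"). Anything else is rejected.
func validateImagePullPolicy(field, value string) error {
	switch PullPolicy(value) {
	case "", PullAlways, PullNever, PullIfNotPresent:
		return nil
	default:
		return fmt.Errorf("%s: unsupported imagePullPolicy %q (allowed: Always, Never, IfNotPresent)", field, value)
	}
}

func main() {
	fmt.Println(validateImagePullPolicy("headGroupSpec", "Always"))
	fmt.Println(validateImagePullPolicy("workerGroupSpec", "Bogus"))
}
```

A helper like this could be called from ValidateClusterSpec once for the head group and once per worker group, so that a bad value is rejected before any cluster is created.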

@z103cb
Contributor

z103cb commented Apr 29, 2024

@blublinsky,
I have tested your PR, using the nightly operator image and your apiserver build.

Using the cluster spec below, I have encountered a rather annoying issue with the cluster. Although I specified an unsupported image pull policy ("Bogus"), the cluster still came up, using the default "IfNotPresent".

I think there should be validation at the API level that rejects such inputs, so that users are not surprised by a silently applied default.

curl -X POST 'localhost:8888/apis/v1/namespaces/default/clusters' \
--header 'Content-Type: application/json' \
--data '{
  "name": "test-cluster",
  "namespace": "default",
  "user": "boris",
  "clusterSpec": {
    "headGroupSpec": {
      "computeTemplate": "default-template",
      "image": "rayproject/ray:2.9.0-py310",
      "serviceType": "NodePort",
      "rayStartParams": {
         "dashboard-host": "0.0.0.0",
         "metrics-export-port": "8080"
       },
       "volumes": [
         {
           "name": "code-sample",
           "mountPath": "/home/ray/samples",
           "volumeType": "CONFIGMAP",
           "source": "ray-job-code-sample",
           "items": {"sample_code.py" : "sample_code.py"}
         }
       ]
    },
    "workerGroupSpec": [
      {
        "groupName": "small-wg",
        "computeTemplate": "default-template",
        "image": "rayproject/ray:2.9.0-py310",
        "replicas": 1,
        "minReplicas": 0,
        "maxReplicas": 5,
        "imagePullPolicy": "Bogus",
        "rayStartParams": {
           "node-ip-address": "$MY_POD_IP"
         },
        "volumes": [
          {
            "name": "code-sample",
            "mountPath": "/home/ray/samples",
            "volumeType": "CONFIGMAP",
            "source": "ray-job-code-sample",
            "items": {"sample_code.py" : "sample_code.py"}
          }
        ]
      }
    ]
  }
}'
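To illustrate what rejecting this request at the API level could look like, here is a rough sketch that decodes only the relevant fields of the JSON payload and checks every group's imagePullPolicy. The struct types and `checkPullPolicies` are hypothetical stand-ins introduced for this example; the real API server works with its generated API types rather than hand-written structs, and allowing an empty value is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed-down request types: only the fields this sketch needs.
type groupSpec struct {
	GroupName       string `json:"groupName"`
	ImagePullPolicy string `json:"imagePullPolicy"`
}

type clusterSpec struct {
	HeadGroupSpec   groupSpec   `json:"headGroupSpec"`
	WorkerGroupSpec []groupSpec `json:"workerGroupSpec"`
}

type createClusterRequest struct {
	Name        string      `json:"name"`
	ClusterSpec clusterSpec `json:"clusterSpec"`
}

// The three values Kubernetes accepts for a container's imagePullPolicy.
var allowedPullPolicies = map[string]bool{
	"Always":       true,
	"Never":        true,
	"IfNotPresent": true,
}

// checkPullPolicies returns an error for the first group whose
// imagePullPolicy is set to something outside the allowed values.
// An unset (empty) policy is accepted; that is an assumption of this sketch.
func checkPullPolicies(payload []byte) error {
	var req createClusterRequest
	if err := json.Unmarshal(payload, &req); err != nil {
		return fmt.Errorf("decoding request: %w", err)
	}
	if p := req.ClusterSpec.HeadGroupSpec.ImagePullPolicy; p != "" && !allowedPullPolicies[p] {
		return fmt.Errorf("head group: unsupported imagePullPolicy %q", p)
	}
	for _, wg := range req.ClusterSpec.WorkerGroupSpec {
		if p := wg.ImagePullPolicy; p != "" && !allowedPullPolicies[p] {
			return fmt.Errorf("worker group %q: unsupported imagePullPolicy %q", wg.GroupName, p)
		}
	}
	return nil
}

func main() {
	// A shortened version of the request above, keeping only the fields checked here.
	payload := []byte(`{"name":"test-cluster","clusterSpec":{"headGroupSpec":{},"workerGroupSpec":[{"groupName":"small-wg","imagePullPolicy":"Bogus"}]}}`)
	fmt.Println(checkPullPolicies(payload))
}
```

With a check along these lines, the request above would fail with an error naming the small-wg worker group instead of silently producing a cluster that uses the default policy.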

@blublinsky
Contributor Author

@z103cb - validations added

Contributor

@z103cb z103cb left a comment

This PR is: LGTM.

@blublinsky
Contributor Author

@kevin85421, can you please take a look and approve?

Member

@kevin85421 kevin85421 left a comment

I haven't reviewed this PR. This PR only changes KubeRay API server.

@kevin85421 kevin85421 merged commit f27e4ac into ray-project:master Apr 29, 2024
24 checks passed
Linked issue: [Feature] Support ImagePullPolicy in Worker and Head NodeSpec