The cluster.wait_ready() command fails when the RayCluster doesn't have status #339

Closed
roytman opened this issue Sep 2, 2023 · 0 comments · Fixed by #340
roytman commented Sep 2, 2023

Describe the Bug

cluster.wait_ready() checks the status of the AppWrapper and then the status of the RayCluster.
These checks are done by the helper functions _map_to_app_wrapper and _map_to_ray_cluster.
Both of them read fields of the resource's status, which may not be defined yet.
PR #254 fixed the AppWrapper check, but it did not fix the RayCluster check.

  File "/usr/local/lib/python3.10/site-packages/codeflare_sdk/cluster/cluster.py", line 271, in wait_ready
    status, ready = self.status(print_to_console=False)
  File "/usr/local/lib/python3.10/site-packages/codeflare_sdk/cluster/cluster.py", line 237, in status
    cluster = _ray_cluster_status(self.config.name, self.config.namespace)
  File "/usr/local/lib/python3.10/site-packages/codeflare_sdk/cluster/cluster.py", line 521, in _ray_cluster_status
    return _map_to_ray_cluster(rc)
  File "/usr/local/lib/python3.10/site-packages/codeflare_sdk/cluster/cluster.py", line 572, in _map_to_ray_cluster
    if "state" in rc["status"]:
KeyError: 'status'
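
A minimal sketch of the kind of guard that would avoid this KeyError, assuming the RayCluster resource arrives as a plain dict, as the traceback suggests. The function name `safe_ray_cluster_state` and the `"unknown"` fallback are hypothetical; they are not taken from the SDK or from the fix in #340.

```python
def safe_ray_cluster_state(rc: dict) -> str:
    """Return the RayCluster state, or "unknown" when no status is set yet."""
    # A freshly created RayCluster may have no "status" key at all,
    # so fall back to an empty dict instead of indexing rc["status"].
    status = rc.get("status") or {}
    return status.get("state", "unknown")


# A resource created moments ago: no "status" section.
fresh = {"metadata": {"name": "demo"}}
# The same resource once a state has been reported ("ready" is an assumed value).
reported = {"metadata": {"name": "demo"}, "status": {"state": "ready"}}

print(safe_ray_cluster_state(fresh))
print(safe_ray_cluster_state(reported))
```

The `or {}` also covers the case where the key exists but holds `None`.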

Codeflare Stack Component Versions

Please specify the component versions in which you have encountered this bug.

Codeflare SDK: v0.7.0
MCAD:
Instascale:
Codeflare Operator:
Other:

Steps to Reproduce the Bug

  1. Create a cluster.
  2. Call cluster.up().
  3. Immediately call cluster.wait_ready().
  4. See the KeyError shown above.

Expected Behavior

The code should safely wait until the RayCluster is ready.
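
One way to picture the expected behavior is a polling loop that treats a missing status as "not ready yet" rather than as an error. This is only a sketch under assumptions: `wait_until_ready`, its `fetch` callable, and the `"ready"` state value are hypothetical stand-ins, not the SDK's actual wait_ready() implementation.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    fetch: Callable[[], dict],
    timeout: Optional[float] = None,
    interval: float = 5.0,
) -> None:
    """Poll until the fetched resource reports state "ready" (assumed value)."""
    start = time.monotonic()
    while True:
        rc = fetch()  # hypothetical callable returning the RayCluster as a dict
        # No "status" yet simply means the cluster is still starting.
        state = (rc.get("status") or {}).get("state")
        if state == "ready":
            return
        if timeout is not None and time.monotonic() - start >= timeout:
            raise TimeoutError("RayCluster did not become ready in time")
        time.sleep(interval)


# Usage with canned responses: no status, empty status, then ready.
responses = iter([{}, {"status": {}}, {"status": {"state": "ready"}}])
wait_until_ready(lambda: next(responses), timeout=10, interval=0)
```

With this shape, calling the wait immediately after creating the cluster keeps polling instead of raising.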

Affected Releases

v0.6.1, v0.7.0, and the main branch.
