cluster.wait_ready() fails with 'MissingModel' object is not callable #250

Open
kpouget opened this issue Jul 26, 2023 · 1 comment

kpouget commented Jul 26, 2023

As part of my automated Codeflare testing, I'm hitting this exception:

Traceback (most recent call last):
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 180, in <module>
    sys.exit(main())
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 175, in main
    fire.Fire(Entrypoint())
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 49, in wrapper
    fct(*args, **kwargs)
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 148, in sdk_user_run_one
    test_sdk_user.run_one()
  File "/opt/ci-artifacts/src/testing/codeflare/test_sdk_user.py", line 165, in run_one
    timeout(entrypoint.main,
  File "/opt/ci-artifacts/src/testing/codeflare/test_sdk_user.py", line 148, in timeout
    return func(*args, **kwargs)
  File "/mnt/logs/002__run_one/sample.py", line 28, in main
    cluster.wait_ready()
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 221, in wait_ready
    status, ready = self.status(print_to_console=False)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 160, in status
    appwrapper = _app_wrapper_status(self.config.name, self.config.namespace)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 345, in _app_wrapper_status
    return _map_to_app_wrapper(cluster)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 469, in _map_to_app_wrapper
    status=AppWrapperStatus(cluster_model.status.state.lower()),
TypeError: 'MissingModel' object is not callable
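The last frame calls `.lower()` on `cluster_model.status.state`. A "not callable" `TypeError` at that point suggests that `state` was not a string but a placeholder object standing in for a missing field. The sketch below assumes a null-object sentinel that returns itself for any attribute lookup. The `MissingModel`, `Status` and `ClusterModel` classes here are hypothetical stand-ins written for illustration, not the library's code. Under that assumption, `.lower` resolves to the sentinel itself, and calling it fails with the same message:

```python
class MissingModel:
    """Hypothetical stand-in for a 'field not present' sentinel."""

    def __getattr__(self, name):
        # Any attribute lookup that fails normally (e.g. .lower) returns the sentinel
        return self

    def __bool__(self):
        return False


Missing = MissingModel()


class Status:
    # Assumption: the AppWrapper has no status.state field populated yet
    state = Missing


class ClusterModel:
    status = Status()


def reproduce() -> str:
    cluster_model = ClusterModel()
    try:
        # .lower resolves to Missing; calling an instance without __call__ raises
        cluster_model.status.state.lower()
    except TypeError as err:
        return str(err)
    return "no error"


print(reproduce())  # 'MissingModel' object is not callable
```

Python looks up `__call__` on the type, not through `__getattr__`, so the sentinel cannot pretend to be a method even though it pretends to be every attribute.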

This is the Python code being executed:

    # Create our cluster and submit appwrapper
    cluster = Cluster(ClusterConfiguration(
        namespace=namespace, name=f"mnisttest-user{user_idx}",
        min_worker=2, max_worker=2,
        min_cpus=2, max_cpus=2,
        min_memory=4, max_memory=4,
        gpu=0,
        instascale=False))
    # Bring up the cluster
    cluster.up()
    cluster.wait_ready() # <-- this line raises the exception
    cluster.status()
    cluster.details()

    job_def = DDPJobDefinition(name="mnisttest", script="mnist.py", workspace=".", scheduler_args={"requirements": "./requirements.txt"})
    job = job_def.submit(cluster)
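The sample relies on `cluster.wait_ready()`, which, per the traceback, obtains `(status, ready)` from `self.status(print_to_console=False)`. A polling loop that keeps waiting while the state is not yet populated, instead of raising, could look like the sketch below. `wait_until_ready`, `get_status` and the `"failed"` early exit are hypothetical and not codeflare-sdk API; `get_status` is assumed to return `(state_or_None, ready)`:

```python
import time
from typing import Callable, Optional, Tuple


def wait_until_ready(
    get_status: Callable[[], Tuple[Optional[str], bool]],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> bool:
    """Poll get_status() until it reports ready; return False on timeout.

    A None state (no status reported yet) is not an error: the loop
    simply sleeps and polls again.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state, ready = get_status()
        if ready:
            return True
        if state is not None and state.lower() == "failed":
            return False  # hypothetical terminal state: stop early
        time.sleep(poll_s)
    return False
```

For example, a status source that first reports nothing, then `"Pending"`, then ready, makes the loop return `True` without ever touching a missing state.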

The RayCluster Pods are stuck in Pending because of project-codeflare/multi-cluster-app-dispatcher#512, but codeflare-sdk should not crash because of that:

NAMESPACE                        NAME                                                      READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
codeflare-sdk-user-test-user-1   mnisttest-user1-head-v7fn8                                0/1     Pending   0          6m43s   <none>   <none>   <none>           <none>
codeflare-sdk-user-test-user-1   nisttest-user1-worker-small-group-mnisttest-user1-dwhb4   0/1     Pending   0          6m43s   <none>   <none>   <none>           <none>
codeflare-sdk-user-test-user-1   nisttest-user1-worker-small-group-mnisttest-user1-xccpd   0/1     Pending   0          6m43s   <none>   <none>   <none>           <none>
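One way the status mapping could tolerate this situation, sketched under assumptions: treat a `status.state` that is not a non-empty string as "not reported yet" instead of calling `.lower()` on it. `safe_state`, `map_status` and the three enum members below are hypothetical illustrations, not the SDK's actual `AppWrapperStatus` definition:

```python
from enum import Enum
from typing import Optional


class AppWrapperStatus(Enum):
    # Hypothetical subset of states, for illustration only
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"


def safe_state(state: object) -> Optional[str]:
    """Lower-case the state if it is a non-empty string, else return None."""
    if isinstance(state, str) and state:
        return state.lower()
    return None


def map_status(state: object) -> Optional[AppWrapperStatus]:
    """Map a raw AppWrapper state to the enum; None means 'not known yet'."""
    normalized = safe_state(state)
    if normalized is None:
        return None
    try:
        return AppWrapperStatus(normalized)
    except ValueError:
        # Unrecognized state string: report it as unknown rather than crash
        return None
```

Any sentinel object (such as a missing-field placeholder) fails the `isinstance(state, str)` check, so the caller receives `None` and can keep waiting.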

Here is the state of the AppWrapper (captured manually after the test):
appwrapper.yaml.log


  • The CodeFlare SDK is installed from pip (latest version)
    • I'll remove the --quiet flag to capture the exact version being installed
  • The CodeFlare stack is installed from ODH (Open Data Hub) + the OpenShift CodeFlare operator
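Besides dropping `--quiet`, the exact version can also be recorded from inside the test process with the standard library's `importlib.metadata` (available since Python 3.8; the traceback shows a Python 3.9 venv). The distribution name `codeflare-sdk` is assumed to match the name passed to `pip install`, and `installed_version` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "codeflare-sdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Log this at the start of the test so the version lands in the CI artifacts
print("codeflare-sdk:", installed_version())
```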

kpouget commented Jul 26, 2023

This seems to be the same issue as #226.
