cluster.wait_ready() fails with 'MissingModel' object is not callable #250

Open
@kpouget

As part of my automated Codeflare testing, I'm hitting this exception:

Traceback (most recent call last):
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 180, in <module>
    sys.exit(main())
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 175, in main
    fire.Fire(Entrypoint())
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 49, in wrapper
    fct(*args, **kwargs)
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 148, in sdk_user_run_one
    test_sdk_user.run_one()
  File "/opt/ci-artifacts/src/testing/codeflare/test_sdk_user.py", line 165, in run_one
    timeout(entrypoint.main,
  File "/opt/ci-artifacts/src/testing/codeflare/test_sdk_user.py", line 148, in timeout
    return func(*args, **kwargs)
  File "/mnt/logs/002__run_one/sample.py", line 28, in main
    cluster.wait_ready()
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 221, in wait_ready
    status, ready = self.status(print_to_console=False)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 160, in status
    appwrapper = _app_wrapper_status(self.config.name, self.config.namespace)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 345, in _app_wrapper_status
    return _map_to_app_wrapper(cluster)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/cluster/cluster.py", line 469, in _map_to_app_wrapper
    status=AppWrapperStatus(cluster_model.status.state.lower()),
TypeError: 'MissingModel' object is not callable
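
The last frame suggests that `cluster_model.status.state` had no value when it was read, so `.lower` resolved to a placeholder object rather than a string method, and calling it raised the `TypeError`. A minimal sketch of that failure mode follows. `FakeMissing` is a hypothetical stand-in written for illustration, not the actual class used by the SDK or its dependencies, and its behavior (every attribute lookup returns the placeholder itself) is an assumption inferred from the error message.

```python
class FakeMissing:
    """Hypothetical stand-in for a 'missing field' placeholder.

    Assumption: any attribute lookup on the placeholder returns the
    placeholder itself, and the placeholder is not callable.
    """

    def __getattr__(self, name):
        # Only reached for attributes that are not found normally.
        return self

    def __bool__(self):
        # A missing value is falsy.
        return False


def state_lower(state):
    # Mirrors the shape of the failing expression: <state>.lower()
    return state.lower()


def try_state_lower(state):
    # Return the lowered state, or a description of the TypeError.
    try:
        return state_lower(state)
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(try_state_lower("Running"))
    print(try_state_lower(FakeMissing()))
```

With a real string the call succeeds; with the placeholder, `.lower` returns the placeholder and the call itself fails, which matches the shape of the error reported above.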

This Python file is being executed:

    # Create our cluster and submit appwrapper
    cluster = Cluster(ClusterConfiguration(
        namespace=namespace, name=f"mnisttest-user{user_idx}",
        min_worker=2, max_worker=2,
        min_cpus=2, max_cpus=2,
        min_memory=4, max_memory=4,
        gpu=0,
        instascale=False))
    # Bring up the cluster
    cluster.up()
    cluster.wait_ready() # <-- this line raises the exception
    cluster.status()
    cluster.details()

    job_def = DDPJobDefinition(
        name="mnisttest", script="mnist.py", workspace=".",
        scheduler_args={"requirements": "./requirements.txt"})
    job = job_def.submit(cluster)

The RayCluster Pods are pending because of project-codeflare/multi-cluster-app-dispatcher#512, but codeflare-sdk shouldn't fail because of it:

codeflare-sdk-user-test-user-1                     mnisttest-user1-head-v7fn8                                            0/1     Pending     0               6m43s   <none>         <none>                                       <none>           <none>
codeflare-sdk-user-test-user-1                     nisttest-user1-worker-small-group-mnisttest-user1-dwhb4               0/1     Pending     0               6m43s   <none>         <none>                                       <none>           <none>
codeflare-sdk-user-test-user-1                     nisttest-user1-worker-small-group-mnisttest-user1-xccpd               0/1     Pending     0               6m43s   <none>         <none>                                       <none>           <none>
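
For illustration, here is one way a status lookup could tolerate an AppWrapper whose state is not populated yet, treating it as "not ready" instead of raising. This is only a sketch: `SketchStatus` and `parse_state` are hypothetical names, the enum values are placeholders rather than the SDK's own, and none of this is the SDK's actual code.

```python
from enum import Enum
from typing import Any, Optional


class SketchStatus(Enum):
    # Hypothetical values for illustration; not the SDK's enum.
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"


def parse_state(raw: Any) -> Optional[SketchStatus]:
    """Return a status, or None when the state is absent or unrecognized."""
    # Anything that is not a real string (None, a placeholder object, ...)
    # is treated as "no state reported yet".
    if not isinstance(raw, str):
        return None
    try:
        return SketchStatus(raw.lower())
    except ValueError:
        # The string does not match any known state.
        return None


if __name__ == "__main__":
    print(parse_state("Running"))
    print(parse_state(None))
```

A polling loop such as `wait_ready()` could then keep waiting while the result is `None`, rather than failing on the first poll.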

Here is the state of the AppWrapper (captured manually after the test):
appwrapper.yaml.log


  • Codeflare SDK is installed from pip (latest version)
    • I'll remove the --quiet flag to capture the exact version being installed
  • Codeflare stack is installed from ODH + OpenShift Codeflare operator
