As part of my automated Codeflare testing, I'm hitting this exception:
ERROR:root:Caught exception HTTPError: 503 Server Error: Service Unavailable for url: http://ray-dashboard-mnisttest-user0-codeflare-sdk-user-test-user-0.apps.kpouget-sutest-20230726-07h01.psap.aws.rhperfscale.org/api/version
Traceback (most recent call last):
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 180, in <module>
    sys.exit(main())
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 175, in main
    fire.Fire(Entrypoint())
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/venv/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 49, in wrapper
    fct(*args, **kwargs)
  File "/opt/ci-artifacts/src/testing/codeflare/test.py", line 148, in sdk_user_run_one
    test_sdk_user.run_one()
  File "/opt/ci-artifacts/src/testing/codeflare/test_sdk_user.py", line 165, in run_one
    timeout(entrypoint.main,
  File "/opt/ci-artifacts/src/testing/codeflare/test_sdk_user.py", line 148, in timeout
    return func(*args, **kwargs)
  File "/mnt/logs/002__run_one/sample.py", line 34, in main
    job = job_def.submit(cluster)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/job/jobs.py", line 166, in submit
    return DDPJob(self, cluster)
  File "/opt/venv/lib/python3.9/site-packages/codeflare_sdk/job/jobs.py", line 174, in __init__
    self._app_handle = torchx_runner.schedule(job_definition._dry_run(cluster))
  File "/opt/venv/lib/python3.9/site-packages/torchx/runner/api.py", line 278, in schedule
    app_id = sched.schedule(dryrun_info)
  File "/opt/venv/lib/python3.9/site-packages/torchx/schedulers/ray_scheduler.py", line 199, in schedule
    client: JobSubmissionClient = JobSubmissionClient(
  File "/opt/venv/lib/python3.9/site-packages/ray/dashboard/modules/job/sdk.py", line 100, in __init__
    self._check_connection_and_version(
  File "/opt/venv/lib/python3.9/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 228, in _check_connection_and_version
    self._check_connection_and_version_with_url(min_version, version_error_message)
  File "/opt/venv/lib/python3.9/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 245, in _check_connection_and_version_with_url
    r.raise_for_status()
  File "/opt/venv/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: http://ray-dashboard-mnisttest-user0-codeflare-sdk-user-test-user-0.apps.kpouget-sutest-20230726-07h01.psap.aws.rhperfscale.org/api/version
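Reading the traceback, the 503 comes from the version check that `JobSubmissionClient` runs against the dashboard's `/api/version` endpoint, so the submit fails before any job is created. Whether the dashboard is merely slow to come up or is actually broken is not clear from this log. If it is a readiness race, one possible workaround is to poll that endpoint before calling `job_def.submit(cluster)`. Below is a minimal sketch under that assumption; `dashboard_is_ready` and `wait_for_dashboard` are hypothetical helper names, not part of codeflare_sdk, torchx, or Ray, and the timeout values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def dashboard_is_ready(base_url, request_timeout=5.0):
    """Return True if GET <base_url>/api/version answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. the 503 above) is a subclass of URLError;
        # connection failures and timeouts also end up here.
        return False


def wait_for_dashboard(base_url, timeout=300.0, interval=5.0,
                       probe=dashboard_is_ready, sleep=time.sleep):
    """Poll the dashboard until it is ready or `timeout` seconds elapse.

    `probe` and `sleep` are injectable so the loop can be exercised
    without a real cluster. Returns True when ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

In the test script this would be called with the dashboard URL from the error message just before the submit, raising or skipping if it returns False. It only hides a startup race; if the route keeps answering 503, the dashboard itself still needs to be investigated.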
Yeah, that image should still work, though it's worth noting that in the upcoming release the default will switch to the new 2.5.0 image linked there.
This Python file is being executed:
and the last line raises the exception.
--quiet flag to capture the exact version being installed