job.status and job.log returns error for certain domain name pattern #150
Comments
@KPostOffice @jbusche @MichaelClifford Our _parse_app_id looks different from the one upstream. Is it a fork that we can patch to make this work for now?
The one upstream was more broken than the one in our custom version. We can definitely make changes, but I'm not sure exactly how to make it more generic. Maybe something including a reverse parse for [...]
@KPostOffice Reverse parsing sounds like a great idea! Where is the custom torchx repo located? I will see if I can create a PR there.
Thanks for pointing this out @tedhtchang. I recall we ran into this issue before. We could not determine a regex that provides stable results, so we added a list of reasonable suffixes. It is likely worth revisiting to see if we can implement a general solution. In the meantime, a quick fix would be just to add [...]
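For what it's worth, the reverse-parse idea mentioned above could be sketched roughly like this. This is not the code in the custom torchx fork; `reverse_parse` is a hypothetical name, and it leans on two assumptions that may not hold everywhere: the app_id never contains "." or ":", and the last piece of the address (TLD or port) never contains "-".

```python
from typing import Tuple


def reverse_parse(body: str) -> Tuple[str, str]:
    """Hypothetical sketch: split '<address>-<app_id>' by scanning from the right.

    Assumes the app_id has no '.' or ':' and that the address ends in a
    TLD or port with no '-' in it.
    """
    # Rightmost '.' or ':' should sit inside the address (before TLD or port).
    anchor = max(body.rfind("."), body.rfind(":"))
    # The first '-' after that point is taken as the separator.
    # If there is no '.' or ':', anchor is -1 and the first '-' is used.
    sep = body.find("-", anchor + 1)
    if sep == -1:
        raise ValueError(f"cannot split {body!r} into address and app_id")
    return body[:sep], body[sep + 1:]


# Made-up inputs in the same shape as the handles discussed in this issue.
print(reverse_parse("ray-dashboard.example.cloud-mnisttest-trwmgjb2s041jc"))
print(reverse_parse("127.0.0.1:8265-mnisttest-abc"))
```

The upside is that no list of suffixes needs to be maintained. The downside is that a hostname without any dot or port and with hyphens in it would still be split in the wrong place.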
job.status() and job.logs() are failing; they're showing [...]
Cause:
In torchx, the app_handle is a string. On my cluster it looks like:
ray://torchx/ray-dashboard-mnisttest-default.tedchang-codeflare-test-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.us-south.containers.appdomain.cloud-mnisttest-trwmgjb2s041jc
This method uses a regex that does not properly parse the app_id (mnisttest-trwmgjb2s041jc) and the Ray dashboard address (ray-dashboard-mnisttest-default.tedchang-codeflare-test-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.us-south.containers.appdomain.cloud):
":\d+-|.com-|.org-|.net-|.gov-|.io-"
/opt/app-root/lib64/python3.8/site-packages/torchx/schedulers/ray_scheduler.py
Something like this worked for my cluster, but it may not be robust:
":\d+-|.com-|.org-|.net-|.gov-|.io-|.cloud-"
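To illustrate why the suffix list matters, here is a small standalone sketch that applies the two patterns quoted above to the handle from my cluster. It is not the code in ray_scheduler.py: `split_at_first_match` is a hypothetical helper, the `tricky` hostname is made up, and the `ESCAPED` pattern is only one possible tightening (the quoted patterns use an unescaped ".", which matches any character).

```python
import re

# Handle from the report, with the "ray://torchx/" prefix removed.
BODY = (
    "ray-dashboard-mnisttest-default.tedchang-codeflare-test-"
    "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    ".us-south.containers.appdomain.cloud-mnisttest-trwmgjb2s041jc"
)

OLD = r":\d+-|.com-|.org-|.net-|.gov-|.io-"        # pattern quoted in the report
NEW = OLD + r"|.cloud-"                             # quick fix from the report
ESCAPED = r":\d+-|\.(?:com|org|net|gov|io|cloud)-"  # assumption: literal dots


def split_at_first_match(pattern, body):
    """Hypothetical helper: return (address, app_id), or None if no match."""
    m = re.search(pattern, body)
    if m is None:
        return None
    # m.end() points just past the separating hyphen.
    return body[: m.end() - 1], body[m.end():]


# The old pattern finds no separator in this handle at all.
print(split_at_first_match(OLD, BODY))
# With ".cloud-" added, the app_id comes out as expected.
print(split_at_first_match(NEW, BODY)[1])

# Made-up hostname: the unescaped "." lets ".io-" match inside "audio-".
tricky = "my-audio-cluster.example.com-app-123"
print(split_at_first_match(OLD, tricky))
print(split_at_first_match(ESCAPED, tricky))
```

Escaping the dots avoids the false match on "audio-", but any suffix list still has to be extended for each new TLD, which is why this may not be robust.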