No exception handle for cluster 'already exists' exception in dask job creation #940


Closed
guozhans opened this issue Apr 16, 2025 · 0 comments · Fixed by #941

guozhans commented Apr 16, 2025

Describe the issue:
Hi,
We use a Flyte task to deploy a DaskJob, and we ran into an issue where the runner was sometimes missing while an 'already exists' error was reported for the cluster.

It looks like the `await cluster.create()` call raised the 'already exists' exception at this line.

Since the Dask cluster already exists, why don't we handle the 'already exists' exception when creating the Dask cluster, and then continue on to create the runner?
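The handling proposed above could be sketched roughly as follows. This is only an illustration of the idea, not the operator's actual code: `ServerError` here is a local stand-in with an assumed `status_code` attribute, and `FakeResource`, `create_if_absent`, and `create_components` are hypothetical names introduced for the example.

```python
import asyncio


class ServerError(Exception):
    """Hypothetical stand-in for the client library's server error."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


class FakeResource:
    """Hypothetical in-memory resource that raises a 409 on duplicate create."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    async def create(self):
        if self.name in self.store:
            raise ServerError(f'"{self.name}" already exists', status_code=409)
        self.store.add(self.name)


async def create_if_absent(resource):
    """Create the resource; treat '409 Conflict' as 'already there'.

    Returns True if it was created now, False if it already existed.
    Any other server error is re-raised unchanged.
    """
    try:
        await resource.create()
        return True
    except ServerError as exc:
        if exc.status_code == 409:
            return False
        raise


async def create_components(cluster, runner):
    """Retry-safe handler body: a leftover cluster does not block the runner."""
    created_cluster = await create_if_absent(cluster)
    created_runner = await create_if_absent(runner)
    return created_cluster, created_runner


# Simulate a retry where the cluster survived an earlier, failed attempt.
store = {"job-cluster"}
result = asyncio.run(
    create_components(
        FakeResource("job-cluster", store),
        FakeResource("job-runner", store),
    )
)
print(result)  # (False, True): cluster was skipped, runner was created
```

With this shape, a retried handler picks up where the previous attempt stopped instead of failing again on the cluster, which is the situation that left the runner missing.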

Error message:

Handler 'daskjob_create_components/status.jobStatus' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/kr8s/_api.py", line 168, in call_api
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 763, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '409 Conflict' for url 'https://10.0.0.1/apis/kubernetes.dask.org/v1...", line 774, in daskjob_create_components
    await cluster.create()
  File "/usr/local/lib/python3.10/site-packages/kr8s/_objects.py", line 320, in create
    async with self.api.call_api(
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/kr8s/_api.py", line 186, in call_api
    raise ServerError(
kr8s._exceptions.ServerError: daskclusters.kubernetes.dask.org "fn2oaqa4432x5o-n0-0-dn7-0" already exists

Environment:

  • Dask version: 2024.10.0
  • Python version: 3.12
  • Operating System:
  • Install method (conda, pip, source):