Worker dies because timeout is not respected #13060
Comments
Thanks for submitting an issue @MattDelac! Do you have an example setup that we can use to reproduce this issue? In particular, sharing how your work pool is configured and the command that you use to start your worker would be helpful.
The worker does not seem to die in that traceback; it just logs the error. Can you include more logs indicating that the worker is dead?
As I posted here #7442 (comment), the worker is "waiting from Cloud" but Cloud says that the worker is unhealthy 🤷
And you're right, the worker might not die per se, but Cloud thinks it became unhealthy for reasons I cannot figure out.
Gotcha, looks like you're using an agent with a CloudRun infrastructure block. Could you share your CloudRun block configuration? Also, can your agent continue picking up flow runs after this error, or does it stop picking up flow runs?
@desertaxle Is there a way to share a JSON config or something nicer and more verbose?
Ok @desertaxle, the problem is not that the agent dies but here is the behavior I have
So yeah, the real fix here is to ensure that the timeout is respected, and maybe to have Prefect Cloud check periodically (once an hour, for example) whether the jobs are still running. That might help Prefect Cloud clean up its internal state of the "running jobs".
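To make the proposed periodic check concrete, here is a minimal sketch in plain Python. It does not use any Prefect or Cloud Run API; `TrackedRun` and `find_overdue_runs` are hypothetical names, and the assumption is only that the orchestrator knows each run's start time and configured timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TrackedRun:
    # Hypothetical record of a run the orchestrator still believes is running.
    run_id: str
    started_at: datetime
    timeout: timedelta


def find_overdue_runs(runs, now):
    """Return the runs whose elapsed time exceeds their configured timeout."""
    return [run for run in runs if now - run.started_at > run.timeout]


# Example data: one run well within its timeout, one far past it.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
runs = [
    TrackedRun("short-job", now - timedelta(minutes=5), timedelta(hours=1)),
    TrackedRun("stuck-job", now - timedelta(hours=3), timedelta(hours=1)),
]
overdue = find_overdue_runs(runs, now)
print([run.run_id for run in overdue])
```

A scheduler could call something like this once an hour and then cancel the overdue runs or mark them as crashed; what to do with them is left open here.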
Given that this issue is related to agents, which have been deprecated and removed, I'm going to close as "not planned"; if this problem persists with Cloud Run Workers, please open a new issue and we will look into it!
Expectation / Proposal
Original conversation
The worker dies because some tasks run longer than the configured timeout.
Traceback / Example
This is a separate issue, please open a question in the prefect-gcp repository if you want to discuss that further. It looks like your flow is running longer than the default timeout. See that piece of code.
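For readers unfamiliar with how a watcher can give up on a job that outlives its timeout, here is a rough sketch of a deadline-bounded polling loop. It is not the prefect-gcp implementation; `wait_until_done` and its parameters are hypothetical and only illustrate the general pattern.

```python
import time


def wait_until_done(is_done, timeout_seconds, poll_interval=0.01):
    """Poll is_done() until it returns True or the deadline passes.

    Returns True if the job finished in time, False if the watcher gave up.
    Note that giving up does not stop the job itself.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_interval)
    return False


# A job that never finishes: the watcher stops waiting after the timeout.
finished = wait_until_done(lambda: False, timeout_seconds=0.05)
print("finished in time:", finished)
```

In a pattern like this, the watcher's view and the job's real state can diverge once the timeout elapses, which is one plausible way the state reported in Cloud ends up out of sync with what is actually running.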