-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eager cleanup #3148
base: master
Are you sure you want to change the base?
Eager cleanup #3148
Conversation
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run Status
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3148 +/- ##
===========================================
- Coverage 76.85% 41.23% -35.63%
===========================================
Files 206 211 +5
Lines 21851 22060 +209
Branches 2837 2868 +31
===========================================
- Hits 16794 9096 -7698
- Misses 4269 12815 +8546
+ Partials 788 149 -639 ☔ View full report in Codecov by Sentry. |
Code Review Agent Run Status
|
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run Status
|
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run Status
|
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run #ce446dActionable Suggestions - 3
Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
if inspect.iscoroutinefunction(entity.__call__): | ||
outputs = run_sync(entity, **kwargs) | ||
else: | ||
outputs = entity(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider handling potential exceptions from run_sync
when executing coroutine functions. The current implementation may silently fail if the async execution encounters issues.
Code suggestion
Check the AI-generated fix before applying
if inspect.iscoroutinefunction(entity.__call__): | |
outputs = run_sync(entity, **kwargs) | |
else: | |
outputs = entity(**kwargs) | |
if inspect.iscoroutinefunction(entity.__call__): | |
try: | |
outputs = run_sync(entity, **kwargs) | |
except Exception as e: | |
raise RuntimeError(f"Async execution failed for {entity.name}: {str(e)}") from e | |
else: | |
outputs = entity(**kwargs) |
Code Review Run #ce446d
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few nits.
# todo: remove this before merging | ||
# this is actually bad, but useful for developing | ||
cleanup._container_image = self._container_image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run #e76f06Actionable Suggestions - 0Review Details
|
Why are the changes needed?
Eager tasks operate outside of the fundamental assumptions made when Flyte was designed. This leads to potentially orphaned executions - executions that were launched by an eager task that aren't terminated when a parent eager task fails.
What changes were proposed in this pull request?
This PR adds an imperative workflow for each eager task, similar to what Admin does for single task executions. However it also attaches a failure node handler in cases where an eager task fails.
EagerFailureHandlerTask
that's similar to theEchoTask
. It is used to mimic the input interface of an eager task since the inputs to the failure node have to match that of the parent workflow/eager task. This failure task leverages the execution tags functionality as well as the tags that eager tasks add to child executions to make sure they're all cleaned up.EagerFailureHandlerTask
instance with no input interface at runtime. Since we don't need to transform the inputs at failure handling runtime, we don't need to store a path to the original eager task to retrieve the Python types.ValueNotIn
filter object was missing but ended up not using it.Todo: Before merging, two changes need to be reverted: requesting of a default secret, and setting of the failure task container image.
How was this patch tested?
Setup process
This was tested on Union internal clusters. (Testing on Flyte clusters is not yet possible due to missing single-task-execution labels.) Wrote an eager task that created some downstream executions that take varying amounts of time. One of the downstream tasks is set to fail, triggering a cleanup of the remaining ones.
Screenshots
Failure node tasks do not run if everything succeeds.

Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
Implementation of eager task cleanup mechanism with EagerFailureHandlerTask, introducing configurable delay settings and failure handler system. The changes include updating failure node IDs from 'nfail' to 'efn', adding task resolver and failure node support in imperative workflows. These modifications ensure proper cleanup of downstream executions and orphaned tasks when parent eager tasks fail.Unit tests added: True
Estimated effort to review (1-5, lower is better): 3