[Bug Fix] Fix TOCTOU issue so that executors with healthy heartbeats are not deleted by mistake #3796
Open
What kind of change does this PR introduce? (check at least one)
The description of the PR:
There is a TOCTOU (time-of-check to time-of-use) race window in the current implementation: an executor may send a heartbeat, refreshing its update_time, in the interval between findDead and removeDead. The IDs returned by findDead are still deleted based on the earlier "dead" determination, so a live node is treated as dead and removed by mistake.
In practice, this shows up as executors being cleared even though their heartbeats are normal.
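The race described above can be illustrated with a minimal in-memory sketch. This is not the PR's actual diff: the class `RegistrySketch` and the methods `heartbeat`, `removeDeadById`, `removeDeadIfStillStale`, and `isRegistered` are hypothetical names chosen for illustration, and a map stands in for the registry table and its update_time column. The idea is simply that the removal step re-checks staleness at delete time instead of trusting the ID list from the earlier check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the executor registry table.
public class RegistrySketch {
    // executor id -> last heartbeat time in millis (plays the role of update_time)
    private final Map<Integer, Long> updateTime = new ConcurrentHashMap<>();

    // An executor heartbeat refreshes (or creates) its registry entry.
    public void heartbeat(int id, long now) {
        updateTime.put(id, now);
    }

    // Time of check: collect the IDs whose last heartbeat is older than the deadline.
    public List<Integer> findDead(long deadline) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : updateTime.entrySet()) {
            if (e.getValue() < deadline) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    // Racy variant: deletes by ID only, so a heartbeat that arrived
    // after findDead is ignored and a live executor is removed.
    public int removeDeadById(List<Integer> ids) {
        int removed = 0;
        for (Integer id : ids) {
            if (updateTime.remove(id) != null) {
                removed++;
            }
        }
        return removed;
    }

    // Safer variant: re-check staleness at time of use, and remove only if
    // the timestamp is still the stale value that was just observed.
    public int removeDeadIfStillStale(List<Integer> ids, long deadline) {
        int removed = 0;
        for (Integer id : ids) {
            Long seen = updateTime.get(id);
            // remove(key, value) is atomic on ConcurrentHashMap: it fails
            // if a heartbeat replaced the value in the meantime.
            if (seen != null && seen < deadline && updateTime.remove(id, seen)) {
                removed++;
            }
        }
        return removed;
    }

    public boolean isRegistered(int id) {
        return updateTime.containsKey(id);
    }
}
```

In a database-backed registry, the analogous approach would presumably be to repeat the staleness condition on update_time in the delete statement itself, so that the check and the removal happen in one atomic operation.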
Other information: