Consider a network where syncing speeds are very low. A node starts with a head that is closer than 8 epochs to the clock but older than its peers' current finalized checkpoint. If the node is assigned to propose before it completes syncing, it will publish a block that builds on top of a block that's older than finality. All nodes in the network will reject the block because it attempts to revert finality, then heavily penalize the node and ban it. The node then becomes isolated and never completes sync, continuing to build on its own fork.
The reason to allow the node to propose in this situation is to protect against an attack:
malicious peers connect to a majority of honest nodes
malicious peers claim to know an advanced finalized checkpoint
honest nodes switch to syncing state and stop proposing
chain loses liveness
We should think about whether we can find a better heuristic that balances both concerns, or add a flag to set the sync tolerance to 0 in smaller networks. Smaller networks combine no attacks, low syncing speed (because of PeerDAS), and frequent proposals, so this bug manifests more frequently there.
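To make the failure mode concrete, here is a minimal sketch of the proposal gate described above. It is not Lighthouse's actual code: the function names, the `sync_tolerance_epochs` parameter, and the inclusive comparison are assumptions made for illustration, and the slot numbers are invented.

```python
SLOTS_PER_EPOCH = 32  # mainnet value; smaller networks may differ


def may_propose(head_slot: int, clock_slot: int, sync_tolerance_epochs: int) -> bool:
    """Hypothetical gate: allow proposing while the head is within the
    sync tolerance of the wall-clock slot (inclusive comparison assumed)."""
    return clock_slot - head_slot <= sync_tolerance_epochs * SLOTS_PER_EPOCH


def would_revert_finality(head_slot: int, peers_finalized_slot: int) -> bool:
    """A block built on a head older than the peers' finalized checkpoint
    conflicts with finality and will be rejected by the network."""
    return head_slot < peers_finalized_slot


# Illustrative numbers: head is 5 epochs behind the clock,
# while peers have finalized a checkpoint 2 epochs behind the clock.
clock_slot = 1000
head_slot = clock_slot - 5 * SLOTS_PER_EPOCH             # 840
peers_finalized_slot = clock_slot - 2 * SLOTS_PER_EPOCH  # 936

# With a tolerance of 8 epochs the node proposes on a pre-finality head.
problem_with_8 = (
    may_propose(head_slot, clock_slot, 8)
    and would_revert_finality(head_slot, peers_finalized_slot)
)

# With a tolerance of 0 the node only proposes once its head matches the clock.
problem_with_0 = (
    may_propose(head_slot, clock_slot, 0)
    and would_revert_finality(head_slot, peers_finalized_slot)
)
```

With these numbers, `problem_with_8` is true and `problem_with_0` is false. That is the trade-off behind the proposed flag: a tolerance of 0 avoids the bad proposal but also removes the protection against the liveness attack listed above.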
As you mentioned, this is an unlikely scenario on a live network, as 1/ the node has to be performing finalized sync AND be within 8 epochs of the head; 2/ the node has to be proposing a block during this small window; 3/ and the peer set has to be small
The proposer node would get banned by its mesh peers immediately after they reject the block and blobs/columns
It would still be possible to sync from the remaining peers on a live network (~100 peers)
The other effect is that this node would continue to perform finalized sync from peers, and it would start from its new head slot (due to the "optimistic start" optimisation), which would fail because the peers will respond with blocks that do not descend from the same chain - this may make it very difficult for the node to sync to the right chain.
I think the restriction in the HTTP API is reasonable. It looks like we can be forced into a range sync with well-crafted status messages. These should be very short lived tho.
One thing I was thinking of, which should help in the majority of cases, is to have something like: if finalized_slot > slot_clock - 1 epoch, then we outright reject the status. Unless our clock is out, no one can finalize within 1 epoch of the slot_clock head.
Then, because we have a 1 epoch slot tolerance in sync, if we are sync'd no one can force us into a range sync.
If no one can force us into a range sync once we're sync'd, then I think it might be fine to relax the HTTP condition down to 1 or 2 epochs (from 8).
Although I'm aware this logic explodes in periods of non-finality.
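The check suggested above could look roughly like the sketch below. The function name and the clamp near genesis are assumptions added for illustration. It does not address the non-finality caveat and is not taken from any client implementation.

```python
SLOTS_PER_EPOCH = 32  # mainnet value


def is_plausible_finalized_slot(finalized_slot: int, clock_slot: int) -> bool:
    """Hypothetical status check: reject a peer whose claimed finalized slot
    is within one epoch of our wall-clock slot (or ahead of it).

    Implements the rule "if finalized_slot > slot_clock - 1 epoch, reject".
    """
    # Assumption: clamp at 0 so that a genesis finalized slot is accepted
    # during the first epoch of the chain.
    latest_plausible = max(clock_slot - SLOTS_PER_EPOCH, 0)
    return finalized_slot <= latest_plausible


# A claim 100 slots behind the clock is accepted.
accepted = is_plausible_finalized_slot(900, 1000)

# A claim only 10 slots behind the clock is rejected outright.
rejected = not is_plausible_finalized_slot(990, 1000)
```

A peer that fails this check cannot use its claimed finalized checkpoint to push a synced node into range sync, which is what would make it reasonable to relax the HTTP condition.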