fix: warn if OperatorAdded events not in sync #625
base: unstable
Pull Request Overview
Adds tracking and validation for OperatorAdded events to detect synchronization issues by monitoring sequence gaps and missing events during historical sync.
- Introduces max_operator_id_seen field to track the highest operator ID encountered
- Validates that OperatorAdded events arrive in sequential order during live sync
- Warns if no OperatorAdded events are found during historical synchronization
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
anchor/eth/src/event_processor.rs | Adds validation logic for OperatorAdded event sequence and missing events detection |
anchor/database/src/table_schema.sql | Adds max_operator_id_seen column to metadata table |
anchor/database/src/state.rs | Implements getter for max_operator_id_seen from database and state |
anchor/database/src/sql_operations.rs | Defines SQL queries for retrieving and updating max_operator_id_seen |
anchor/database/src/lib.rs | Adds max_operator_id_seen field to SingleState and bump operation |
Nice! Here are my thoughts.
If we decide on not persisting the max operator id seen, I think the implementation might be straightforward enough to warrant inclusion in v1.0.0. wdyt?
anchor/database/src/table_schema.sql
Outdated
  domain_type INTEGER NOT NULL,
- block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0)
+ block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0),
+ max_operator_id_seen INTEGER NOT NULL DEFAULT 0
For simplicity, I am not sure if we should persist this. It might be sufficient to keep this as an Option<OperatorId> during sync, initialized on the first seen operatorAdded, and checked subsequently during this anchor instance's lifetime.
I was thinking about persistence also... I think the problem may be with the historical sync because, if I understood the code correctly, after we do it for the first time we persist the block data, and on the next startup of the anchor instance we only sync from the last_processed_block, but our operator_seen count would start from 0, which could cause inconsistencies.
That is correct. This is why I suggest using a last_operator_add_seen: Option<OperatorId>, initialized to None. To avoid the inconsistency you mentioned, we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_seen. Then we can do the check when receiving more operatorAdded events.
The downside is that we do not catch missing operatorAdded events between the last run and the first operatorAdded we get in the current run. I am on the fence whether that is alright.
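A minimal sketch of the non-persisted approach described in this comment, for illustration only. The OperatorId newtype, the SeqCheck enum, and the OperatorAddTracker struct are hypothetical stand-ins, not Anchor's actual types; only the field name last_operator_add_seen comes from the discussion. The sketch also assumes operator IDs increase by exactly 1, which the thread has not confirmed.

```rust
/// Hypothetical stand-in for Anchor's operator ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OperatorId(pub u64);

/// Outcome of checking a single OperatorAdded event (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum SeqCheck {
    /// First event seen in this run; only used to initialize the tracker.
    Initialized,
    /// The ID was exactly the previously seen ID plus one.
    InOrder,
    /// The ID did not match the expected successor.
    Gap { expected: OperatorId, got: OperatorId },
}

/// In-memory tracker; nothing here is persisted across restarts.
#[derive(Debug, Default)]
pub struct OperatorAddTracker {
    last_operator_add_seen: Option<OperatorId>,
}

impl OperatorAddTracker {
    /// Checks `id` against the last seen ID, then records `id` as the last seen.
    pub fn observe(&mut self, id: OperatorId) -> SeqCheck {
        let result = match self.last_operator_add_seen {
            // No check on the first event of this run, as suggested above.
            None => SeqCheck::Initialized,
            Some(prev) if prev.0.checked_add(1) == Some(id.0) => SeqCheck::InOrder,
            Some(prev) => SeqCheck::Gap {
                expected: OperatorId(prev.0.saturating_add(1)),
                got: id,
            },
        };
        // Always resynchronize to the latest ID so one gap yields one warning.
        self.last_operator_add_seen = Some(id);
        result
    }
}
```

A caller would log a warning on SeqCheck::Gap. Because the first event only initializes the tracker, any events missed between the previous run and that first event go undetected, which is the downside noted above.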
Ah okay, I understand.
The downside might be fine, because if it's truly not catching the events, we will find out soon enough?
> we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_seen
That's the crux of the issue, isn't it? A pruned client could have deleted the state containing some initial operators.
The current approach might not be enough. Does the operator ID start at zero or one and then continue increasing by 1? If so, we need to guarantee that we have seen all of them up to the current value in the contract.
I think that was the downside discussed about this approach when we don't persist the I think the purpose of this PR is more to warn users that something with syncing is wrong so they configure their EL client correctly. But I could implement a check that goes through our saved
What I described is a suggestion about how to check if Anchor has synced correctly starting with an empty DB. Additionally, we also need to persist the highest operator ID we have seen to be able to do the check when we start syncing after being offline.
I think we do have to persist the
This needs to be done during sync, starting with the current value in the DB.
And before we start syncing, we could read the last operator ID from the SSV contract and check that we see all IDs up to this value. But not sure it's necessary.
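The completeness check suggested here could be sketched roughly as follows. This is an illustration under assumptions: missing_operator_ids is a hypothetical helper, the set of known IDs and the last ID read from the contract are passed in rather than fetched, and first_id is a parameter because the thread leaves open whether IDs start at zero or one.

```rust
use std::collections::BTreeSet;

/// Returns every ID in `first_id..=last_id_in_contract` that is absent from `known`.
///
/// Hypothetical helper: `known` would be the operator IDs already stored
/// locally, and `last_id_in_contract` the value read from the contract.
pub fn missing_operator_ids(
    known: &BTreeSet<u64>,
    first_id: u64,
    last_id_in_contract: u64,
) -> Vec<u64> {
    (first_id..=last_id_in_contract)
        .filter(|id| !known.contains(id))
        .collect()
}
```

An empty result would mean every ID up to the contract's value has been seen; a non-empty one lists the IDs to warn about.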
Yeah, I understand the idea, can do it.
@diegomrsantos In yesterday's meeting we discussed that we can also not persist the In the meeting, I asked if there is agreement to do it this way for now, as it is a tradeoff for not having to modify the database that close to the release. While it is of course okay to change one's mind, I believe the simpler option is better for a short term change.
@petarjuki7 please rebase onto
f52b3f6 to 8764b86
anchor/eth/src/event_processor.rs
Outdated
but got operator {operator_id}."
);
}
self.db.bump_max_operator_id_seen();
I think it would be a bit cleaner to just have set_max_operator_id_seen
anchor/eth/src/sync.rs
Outdated
.event_processor
.db
.state()
.get_max_operator_id_seen()
.is_none()
{
warn!("No OperatorAdded events found in historical sync, there is likely a sync error");
If we do not persist get_max_operator_id_seen, we need to check the operators map instead here, as very short historic syncs (e.g. after a restart) trigger this otherwise.
Issue Addressed
Addresses issue #519
Proposed Changes
We keep track of all OperatorAdded events in a new variable stored in the database (max_operator_id_seen). Then, when we get a new OperatorAdded event, we can check whether it matches the expected OperatorId we should have seen next.
Also, during historical sync we check whether we saw any OperatorAdded event; if not, there is likely a sync error.