petarjuki7
Member

Issue Addressed

Addresses issue #519

Proposed Changes

We track all OperatorAdded events in a new value stored in the database (max_operator_id_seen).
Then, when we get a new OperatorAdded event, we check whether it matches the OperatorId we expected to see next.

Also, during historical sync we check whether we saw any OperatorAdded event; if not, there is likely a sync error.
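
For illustration, a minimal sketch of the sequence check described above, not the PR's actual code. It assumes operator IDs start at 1 and increase by 1, which this thread does not confirm. `OperatorTracker`, `SequenceCheck`, and `on_operator_added` are hypothetical names, and an in-memory field stands in for the database column.

```rust
/// Hypothetical stand-in for Anchor's operator ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OperatorId(pub u64);

/// Outcome of comparing an incoming `OperatorAdded` ID with the expected one.
#[derive(Debug, PartialEq, Eq)]
pub enum SequenceCheck {
    InOrder,
    Gap { expected: u64, got: u64 },
}

/// Tracks the highest operator ID seen so far.
/// Assumption: IDs start at 1, so a fresh tracker (value 0) expects ID 1 first.
#[derive(Debug, Default)]
pub struct OperatorTracker {
    max_operator_id_seen: u64,
}

impl OperatorTracker {
    /// Checks the incoming ID against `max_operator_id_seen + 1`, then updates
    /// the tracked maximum without ever moving it backwards.
    pub fn on_operator_added(&mut self, id: OperatorId) -> SequenceCheck {
        let expected = self.max_operator_id_seen + 1;
        let result = if id.0 == expected {
            SequenceCheck::InOrder
        } else {
            SequenceCheck::Gap { expected, got: id.0 }
        };
        self.max_operator_id_seen = self.max_operator_id_seen.max(id.0);
        result
    }
}
```

A caller would log a warning on `SequenceCheck::Gap` rather than abort, since the stated goal is to alert the user to a likely sync problem.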

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds tracking and validation for OperatorAdded events to detect synchronization issues by monitoring sequence gaps and missing events during historical sync.

  • Introduces max_operator_id_seen field to track the highest operator ID encountered
  • Validates that OperatorAdded events arrive in sequential order during live sync
  • Warns if no OperatorAdded events are found during historical synchronization

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| anchor/eth/src/event_processor.rs | Adds validation logic for OperatorAdded event sequence and missing events detection |
| anchor/database/src/table_schema.sql | Adds max_operator_id_seen column to metadata table |
| anchor/database/src/state.rs | Implements getter for max_operator_id_seen from database and state |
| anchor/database/src/sql_operations.rs | Defines SQL queries for retrieving and updating max_operator_id_seen |
| anchor/database/src/lib.rs | Adds max_operator_id_seen field to SingleState and bump operation |

Member

@dknopik dknopik left a comment

Nice! Here are my thoughts.

If we decide not to persist the max operator ID seen, I think the implementation might be straightforward enough to warrant inclusion in v1.0.0. wdyt?

  domain_type INTEGER NOT NULL,
- block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0)
+ block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0),
+ max_operator_id_seen INTEGER NOT NULL DEFAULT 0
Member

For simplicity, I am not sure we should persist this. It might be sufficient to keep this as an Option<OperatorId> during sync, initialized on the first operatorAdded seen, and checked subsequently during this Anchor instance's lifetime.

Member Author

I was thinking about persistence too... I think the problem may be with the historical sync: if I understood the code correctly, after we do it for the first time we persist the block data, and on the next startup of the Anchor instance we only sync from the last_processed_block, but our operator_seen count would start from 0, which could cause inconsistencies.

Member

@dknopik dknopik Sep 22, 2025

That is correct. This is why I suggest using a last_operator_add_seen: Option<OperatorId>, initialized to None. To avoid the inconsistency you mentioned, we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_seen. Then we can do the check when receiving further operatorAdded events.

The downside is that we do not catch missing operatorAdded events between the last run and the first operatorAdded we get in the current run. I am on the fence whether that is alright.
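
For illustration, a minimal sketch of this non-persisted variant, under the assumption that consecutive operator IDs differ by exactly 1. `LiveSequence` and `observe` are hypothetical names and `OperatorId` is a stand-in type; only `last_operator_add_seen` comes from the suggestion above.

```rust
/// Hypothetical stand-in for Anchor's operator ID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OperatorId(pub u64);

/// In-memory tracker that lives only as long as the running instance.
#[derive(Debug, Default)]
pub struct LiveSequence {
    last_operator_add_seen: Option<OperatorId>,
}

impl LiveSequence {
    /// Returns `Ok(())` if `id` is acceptable, or `Err(expected)` on a gap.
    /// The first event after startup is never checked; it only initializes
    /// the tracker, so gaps before it go unnoticed (the stated downside).
    pub fn observe(&mut self, id: OperatorId) -> Result<(), OperatorId> {
        let outcome = match self.last_operator_add_seen {
            None => Ok(()),
            Some(prev) if id.0 == prev.0 + 1 => Ok(()),
            Some(prev) => Err(OperatorId(prev.0 + 1)),
        };
        // Always adopt the newest ID so a single gap is reported only once.
        self.last_operator_add_seen = Some(id);
        outcome
    }
}
```

Starting from `None` means a restart at any block height needs no database change, at the cost of trusting the first event seen in each run.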

Member Author

Ah okay, I understand.
The downside might be fine, because if it's truly not catching the events, we will find out soon enough?

Member

> we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_seen

That's the crux of the issue, isn't it? A pruned client could have deleted the state containing some initial operators

@diegomrsantos
Member

The current approach might not be enough. Does the operator id start at zero or one and then continue increasing by 1? If so, we need to guarantee that we have seen all of them up to the current value in the contract.

@petarjuki7
Member Author

> we need to guarantee that we have seen all of them up to the current value in the contract.

I think that was the downside discussed about this approach when we don't persist the max_operator_id_seen variable. We kind of "trust" the first OperatorAdded event we see will initialize the variable with the highest OperatorId.

I think the purpose of this PR is more to warn users that something with syncing is wrong so they configure their EL client correctly.

But I could implement a check that goes through our saved OperatorIds in the database and checks if they are monotonically increasing by 1 up until the max_seen?
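
For illustration, a minimal sketch of such a check. The function name `missing_operator_ids` and its parameters are hypothetical; it assumes the stored IDs are available as a set of integers, that the first ID used by the contract is known, and that operators are never deleted from that set (otherwise removed operators would show up as gaps).

```rust
use std::collections::BTreeSet;

/// Returns every ID in `first_id..=max_seen` that is absent from `stored`.
/// An empty result means the stored IDs increase by 1 without gaps.
pub fn missing_operator_ids(stored: &BTreeSet<u64>, first_id: u64, max_seen: u64) -> Vec<u64> {
    (first_id..=max_seen)
        .filter(|id| !stored.contains(id))
        .collect()
}
```

Reporting the missing IDs, rather than a plain yes/no, would let the warning tell the user which OperatorAdded events were not received.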

@diegomrsantos
Member

What I described is a suggestion about how to check if Anchor has synced correctly starting with an empty DB. Additionally, we also need to persist the highest operator id we have seen to be able to do the check when we start syncing after being offline.

@diegomrsantos
Member

I think we do have to persist the highest_operator_id_seen. It will be created initially with the first value used by the contract.

> But I could implement a check that goes through our saved OperatorIds in the database and checks if they are monotonically increasing by 1

This needs to be done during sync, starting with the current value in the DB.

@diegomrsantos
Member

And before we start syncing, we could read the last operator ID from the SSV contract and check that we see all IDs up to this value. But I'm not sure it's necessary.

@petarjuki7
Member Author

Yeah, I understand the idea, can do it.
Just tagging @dknopik to confirm, if I remember correctly we decided on the non-persistence path yesterday.

@dknopik
Member

dknopik commented Sep 24, 2025

@diegomrsantos In yesterday's meeting we discussed that we can also not persist the highest_operator_id_seen and initialize it with the first operator ID seen per sync. This would be sufficient to catch any missing operatorAdded for the lifetime of that instance.

In the meeting, I asked if there is agreement to do it this way for now, as it is a tradeoff for not having to modify the database so close to the release. While it is of course okay to change one's mind, I believe the simpler option is better for a short-term change.

@dknopik
Member

dknopik commented Sep 24, 2025

@petarjuki7 please rebase onto release-v1.0.0

@petarjuki7 petarjuki7 changed the base branch from unstable to release-v1.0.0 September 24, 2025 11:58
@petarjuki7 petarjuki7 requested a review from dknopik September 24, 2025 12:44
but got operator {operator_id}."
);
}
self.db.bump_max_operator_id_seen();
Member

I think it would be a bit cleaner to just have set_max_operator_id_seen

Comment on lines 515 to 521
.event_processor
.db
.state()
.get_max_operator_id_seen()
.is_none()
{
warn!("No OperatorAdded events found in historical sync, there is likely a sync error");
Member

If we do not persist max_operator_id_seen, we need to check the operators map here instead, as very short historic syncs (e.g. after a restart) would otherwise trigger this warning.
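
For illustration, a minimal sketch of that condition. `should_warn_after_historical_sync`, the `Operator` stand-in, and the `HashMap` keyed by a plain integer are all hypothetical; the actual shape of the operators map in Anchor's state is not shown in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the operator record kept in state.
pub struct Operator;

/// Warn only when this run saw no OperatorAdded event AND no operators are
/// known from earlier runs. A short historic sync after a restart usually
/// sees no new events, but the operators map is already populated then.
pub fn should_warn_after_historical_sync(
    operators: &HashMap<u64, Operator>,
    saw_operator_added_this_run: bool,
) -> bool {
    !saw_operator_added_this_run && operators.is_empty()
}
```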

@dknopik dknopik changed the base branch from release-v1.0.0 to unstable October 20, 2025 13:20