fix: warn if OperatorAdded events not in sync #625

petarjuki7 · 2025-09-22T13:15:19Z

Issue Addressed

Addresses issue #519

Proposed Changes

We keep track of all OperatorAdded events in a new variable stored in the database (max_operator_id_seen).
Then, when we get a new OperatorAdded event, we can check whether it matches the OperatorId we expected to see next.

Also, during historical sync we check whether we saw any OperatorAdded event; if not, there is likely a sync error.
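
A minimal sketch of the sequence check described above, not the code in this PR. It assumes operator IDs increase by exactly one per OperatorAdded event; `OperatorId` as a `u64` alias, `SequenceCheck`, and `check_next` are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for the operator ID type; the real type is not shown here.
type OperatorId = u64;

/// Outcome of comparing an incoming OperatorAdded event with the tracked maximum.
#[derive(Debug, PartialEq, Eq)]
enum SequenceCheck {
    /// The event carries exactly the next expected ID.
    InOrder,
    /// The event skipped or repeated IDs; the caller would log a warning.
    Gap { expected: OperatorId, got: OperatorId },
}

/// Compares `got` with `max_seen + 1` (assumption: IDs are assigned sequentially).
fn check_next(max_seen: OperatorId, got: OperatorId) -> SequenceCheck {
    let expected = max_seen.saturating_add(1);
    if got == expected {
        SequenceCheck::InOrder
    } else {
        SequenceCheck::Gap { expected, got }
    }
}
```

With a stored value of 3, an event for operator 4 is in order, while an event for operator 5 reports a gap with `expected: 4`.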

Copilot

Pull Request Overview

Adds tracking and validation for OperatorAdded events to detect synchronization issues by monitoring sequence gaps and missing events during historical sync.

Introduces max_operator_id_seen field to track the highest operator ID encountered
Validates that OperatorAdded events arrive in sequential order during live sync
Warns if no OperatorAdded events are found during historical synchronization

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File	Description
anchor/eth/src/event_processor.rs	Adds validation logic for OperatorAdded event sequence and missing events detection
anchor/database/src/table_schema.sql	Adds max_operator_id_seen column to metadata table
anchor/database/src/state.rs	Implements getter for max_operator_id_seen from database and state
anchor/database/src/sql_operations.rs	Defines SQL queries for retrieving and updating max_operator_id_seen
anchor/database/src/lib.rs	Adds max_operator_id_seen field to SingleState and bump operation

anchor/eth/src/event_processor.rs

dknopik

Nice! Here are my thoughts.

If we decide not to persist the max operator ID seen, I think the implementation might be straightforward enough to warrant inclusion in v1.0.0. wdyt?

dknopik · 2025-09-22T13:52:15Z

anchor/database/src/table_schema.sql

    domain_type INTEGER NOT NULL,
-    block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0)
+    block_number INTEGER NOT NULL DEFAULT 0 CHECK (block_number >= 0),
+    max_operator_id_seen INTEGER NOT NULL DEFAULT 0


For simplicity, I am not sure if we should persist this. It might be sufficient to keep this as an Option<OperatorId> during sync, initialized on the first operatorAdded seen, and checked subsequently during this Anchor instance's lifetime.

I was thinking about persistence also... I think the problem may be with the historical sync because, if I understood the code correctly, after we do it for the first time we persist the block data, and on the next startup of the anchor instance we only sync from the last_processed_block, but our operator_seen count would start from 0 which could cause inconsistencies.

That is correct. This is why I suggest using a last_operator_add_seen: Option<OperatorId>, initialized to None. To avoid the inconsistency you mentioned, we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_seen. Then we can do the check when receiving more operatorAdded events.

The downside is that we do not catch missing operatorAdded events between the last run and the first operatorAdded we get in the current run. I am on the fence about whether that is alright.
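
The in-memory variant suggested here could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `OperatorAddTracker` and `observe` are hypothetical names, `OperatorId` is assumed to be a `u64`, and IDs are assumed to increase by one.

```rust
/// Hypothetical stand-in for the operator ID type.
type OperatorId = u64;

/// Tracks the last OperatorAdded ID seen during this instance's lifetime only.
#[derive(Debug, Default)]
struct OperatorAddTracker {
    /// `None` until the first OperatorAdded event of this run is observed.
    last_operator_add_seen: Option<OperatorId>,
}

impl OperatorAddTracker {
    /// Records an event. Returns `Some(expected)` if a gap was detected,
    /// or `None` if the event was in order or was the first one of this run.
    fn observe(&mut self, operator_id: OperatorId) -> Option<OperatorId> {
        let gap = match self.last_operator_add_seen {
            // First event since startup: nothing to compare against, just initialize.
            None => None,
            Some(last) => {
                let expected = last.saturating_add(1);
                (operator_id != expected).then_some(expected)
            }
        };
        self.last_operator_add_seen = Some(operator_id);
        gap
    }
}
```

Because the first event only initializes the tracker, any events missed between the previous run and that first event go unnoticed, which is the downside mentioned above.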

Ah okay, I understand.
The downside might be fine, because if it's truly not catching the events, we will find out soon enough?

we would not do the check on the first operatorAdded we see, but use it to initialize the last_operator_add_seen

That's the crux of the issue, isn't it? A pruned client could have deleted the state containing some initial operators

anchor/eth/src/event_processor.rs

diegomrsantos · 2025-09-23T16:20:06Z

The current approach might not be enough. Does the operator id start at zero or one and then continue increasing by 1? If so, we need to guarantee that we have seen all of them up to the current value in the contract.

petarjuki7 · 2025-09-23T20:07:19Z

we need to guarantee that we have seen all of them up to the current value in the contract.

I think that was the downside discussed about this approach when we don't persist the max_operator_id_seen variable. We kind of "trust" the first OperatorAdded event we see will initialize the variable with the highest OperatorId.

I think the purpose of this PR is more to warn users that something with syncing is wrong so they configure their EL client correctly.

But I could implement a check that goes through our saved OperatorIds in the database and checks if they are monotonically increasing by 1 up until the max_seen?
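
A rough sketch of such a contiguity check, under stated assumptions: `first_missing_id` is a hypothetical helper, `OperatorId` is assumed to be a `u64`, the stored IDs are passed in as a slice rather than read from the database, and operator removal (which this thread does not discuss) is ignored.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the operator ID type.
type OperatorId = u64;

/// Returns the lowest ID in `first_id..=max_seen` that is absent from `stored`,
/// or `None` if every ID in that range is present.
fn first_missing_id(
    stored: &[OperatorId],
    first_id: OperatorId,
    max_seen: OperatorId,
) -> Option<OperatorId> {
    let known: HashSet<OperatorId> = stored.iter().copied().collect();
    (first_id..=max_seen).find(|id| !known.contains(id))
}
```

The `first_id` parameter is left explicit because whether the contract starts counting at zero or one is an open question in this thread.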

diegomrsantos · 2025-09-24T08:52:06Z

What I described is a suggestion about how to check if Anchor has synced correctly starting with an empty DB. Additionally, we also need to persist the highest operator id we have seen to be able to do the check when we start syncing after being offline.

diegomrsantos · 2025-09-24T10:18:45Z

I think we do have to persist the highest_operator_id_seen. It will be created initially with the first value used by the contract.

But I could implement a check that goes through our saved OperatorIds in the database and checks if they are monotonically increasing by 1

This needs to be done during sync, starting with the current value in the DB.

diegomrsantos · 2025-09-24T10:23:30Z

And before we start syncing, we could read the last operator ID from the SSV contract and check that we see all IDs up to this value. But I'm not sure it's necessary.

petarjuki7 · 2025-09-24T10:27:31Z

Yeah, I understand the idea, can do it.
Just tagging @dknopik to confirm, if I remember correctly we decided on the non-persistence path yesterday.

dknopik · 2025-09-24T10:30:27Z

@diegomrsantos In yesterday's meeting we discussed that we can also not persist the highest_operator_id_seen and instead initialize it with the first operator ID seen per sync. This would be sufficient to catch any missing operatorAdded for the lifetime of that instance.

In the meeting, I asked if there is agreement to do it this way for now, as it is a tradeoff for not having to modify the database this close to the release. While it is of course okay to change one's mind, I believe the simpler option is better for a short-term change.

dknopik · 2025-09-24T10:30:49Z

@petarjuki7 please rebase onto release-v1.0.0

dknopik · 2025-09-24T13:54:17Z

anchor/eth/src/event_processor.rs

+                but got operator {operator_id}."
+                );
+            }
+            self.db.bump_max_operator_id_seen();


I think it would be a bit cleaner to just have set_max_operator_id_seen

dknopik · 2025-09-24T13:56:49Z

anchor/eth/src/sync.rs

+            .event_processor
+            .db
+            .state()
+            .get_max_operator_id_seen()
+            .is_none()
+        {
+            warn!("No OperatorAdded events found in historical sync, there is likely a sync error");


If we do not persist max_operator_id_seen, we need to check the operators map here instead, as very short historical syncs (e.g. after a restart) would otherwise trigger this warning.
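
One way to read this suggestion, sketched with hypothetical names (`Operator`, `historical_sync_warning`) and a plain `HashMap` standing in for the operators map, whose real type is not shown in this thread:

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins; the real operator types are not shown in this thread.
type OperatorId = u64;
struct Operator;

/// Returns the warning text if no operators are known after historical sync.
/// Checking the map rather than the in-memory counter avoids a false alarm
/// after a restart, when a short sync may legitimately contain no OperatorAdded events.
fn historical_sync_warning(operators: &HashMap<OperatorId, Operator>) -> Option<&'static str> {
    if operators.is_empty() {
        Some("No OperatorAdded events found in historical sync, there is likely a sync error")
    } else {
        None
    }
}
```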

petarjuki7 requested review from Zacholme7, diegomrsantos and dknopik September 22, 2025 13:15

petarjuki7 self-assigned this Sep 22, 2025

diegomrsantos requested a review from Copilot September 22, 2025 13:42

Copilot AI reviewed Sep 22, 2025

dknopik reviewed Sep 22, 2025

petarjuki7 added 5 commits September 24, 2025 13:56

fix: warn if OperatorAdded events not in sync

02b895e

small refactor

dbf4181

change of error trigger and handling

6e1c213

keep max_operator_id in memory only

8c8baff

fix off by 1 bug

8764b86

petarjuki7 force-pushed the warn_missing_logs branch from f52b3f6 to 8764b86 Compare September 24, 2025 11:57

petarjuki7 changed the base branch from unstable to release-v1.0.0 September 24, 2025 11:58

petarjuki7 requested a review from dknopik September 24, 2025 12:44

dknopik reviewed Sep 24, 2025

petarjuki7 added 2 commits September 25, 2025 13:33

check operators map

d849cce

refactor

9f5f856

dknopik changed the base branch from release-v1.0.0 to unstable October 20, 2025 13:20
