Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: 2% blocks missed after sync aggregate addition #1297

Merged
merged 4 commits into from
Sep 19, 2024

Conversation

rodrigo-o
Copy link
Collaborator

@rodrigo-o rodrigo-o commented Sep 17, 2024

Motivation

Fixing 2% block missed when proposing on 32 * n + 1 slot.

Description

This PR fix the issue related to the Invalid signature error that we got on those particular slots. The problem is related to the parent root, we lacked a filter of contributions that doesn't match our parent root. This was happening at epoch boundaries because that's where we get a delay in the proposal and the second third kicks in before the notify head, which publish messages of the previous root, and make the aggregate become 0.

Resolves #1292

@rodrigo-o rodrigo-o changed the title bug: 2% blocks missed after sync aggregate addition fix: 2% blocks missed after sync aggregate addition Sep 17, 2024
@rodrigo-o
Copy link
Collaborator Author

Now slot 33 is working, but shows 0 because of the delayed 32.

image

@rodrigo-o rodrigo-o marked this pull request as ready for review September 17, 2024 20:12
@rodrigo-o rodrigo-o requested a review from a team as a code owner September 17, 2024 20:12
Comment on lines 74 to 76
# TODO: This function needs to be measured and optimized if possible, the main point to look
# at is the beacon fetch and sync committees computation. Also this could be done asynchronusly
# given that except for the first tame it always calculat 1 epoch ahead of time.
Copy link
Collaborator Author

@rodrigo-o rodrigo-o Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some information regarding this. The time consumption of sync_committees (which reached 4 seconds on one of my kurtosis tests) is related to the way aggregators are calculated. It has a couple of things very different than attestations which explain why this is costly:

  • Attestations are done on one slot per epoch per validator, this is tied to the committee, in sync committees instead, every validator publish in every slot of every epoch during the period of the committee.
  • Aggregation is calculated on a per slot basis (we need to compute a signature using the slot) both in attestations and in sync committees, the caveat with sync committees is that they can be aggregators of any slot, which in attestations can only happen once per epoch and here this needs to be checked for the 32 slots.
  • This is also calculates for every subcommittee, which in kurtosis main spec every subcommitte is completely repeated due to the low amount of validators (in the test we normally run with less than 120 validators).
  • Given all this, for 60 validators (the test I was seeing the 4 seconds delay) we have to calculate proofs for every subcommitte (4) for every slot (32). The proof calculation cost approximately 0.5 ms. so:
    • 60 * 32 * 4 * 0.5 = 3840ms (aprox. 4 seconds)

This comment could be removed because:

  • If this was an issue, there is not so much room for improvement in the function, instead it could be computed asynchronously (given that it calculate the next epoch duties ahead of time). We could even move the calculation some slots further into the new epoch to avoid colliding with the epoch transition.
  • In Mainnet this wont be a problem, because having just one validator in a particular period of a sync committee is the norm, more than one in the same period would be statistically improbable. For example, in a test with more participants (10 nodes of 128 validators each and us with 60), the time for computing this was decreased to something near half a second. In a real scenario this will be near a couple of milliseconds.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the deep dive and the notes. It's pretty clear. I've created the issue #1299 to discuss this for a different PR if needed at all. I suggest adding the issue number to the TODO (all TODOs should refer to an issue IMO).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tomy! Already added the issue to the comment, and yes I totally agree with you I was either going to remove the comment in the code or create the issue!

Copy link
Collaborator

@Arkenan Arkenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Rodri! Awesome PR.

Comment on lines 74 to 76
# TODO: This function needs to be measured and optimized if possible, the main point to look
# at is the beacon fetch and sync committees computation. Also this could be done asynchronusly
# given that except for the first tame it always calculat 1 epoch ahead of time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the deep dive and the notes. It's pretty clear. I've created the issue #1299 to discuss this for a different PR if needed at all. I suggest adding the issue number to the TODO (all TODOs should refer to an issue IMO).

@rodrigo-o rodrigo-o merged commit 59eec20 into main Sep 19, 2024
13 checks passed
@rodrigo-o rodrigo-o deleted the fix-sync-aggregate-block-missed branch September 19, 2024 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

SyncAggregate: 2% of invalid Signatures on block proposals where we are most of the validators.
2 participants