Implement semi-supernode flag #8241
base: unstable
Conversation
…_custody to custody_context
/// This is derived from CLI flags and not persisted.
#[serde(skip)]
Most of these settings are derived from CLI flags? I don't think we need the `serde(skip)` here; if anything, I think we might need it so we can write CLI tests in `lighthouse/tests/beacon_node.rs`.
(the persistence of this config using serde is only for tests)
CustodyContext {
    node_custody_type: NodeCustodyType,
) -> Result<Self, String> {
    if ssz_context.persisted_is_supernode && node_custody_type != NodeCustodyType::Supernode {
So we might want a DB schema change to store the `NodeCustodyType`? Then we could enforce that the node can't change "down", i.e. supernode to semi-supernode/full, or semi-supernode to full?

Does anything break if we don't block the transition from semi-supernode to full? Because this check already covers both supernode cases.
Sorry, I initially added this check thinking we'd want to prevent changes to the `NodeCustodyType`. But it is technically possible to do; we just need to set the correct value of `earliest_available_slot`.

One use case: all the nodes on Sepolia that had to run a supernode to reconstruct all the blobs could downgrade to a semi-supernode.

We initially wanted to avoid switching modes, but it's quite easy to handle in the RPC layer by just setting the right value of `earliest_available_slot`, so I don't see any strong reason not to allow this.
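A minimal sketch of the transition check being discussed. The variant names follow this thread, but the ordering logic and signature are assumptions, not Lighthouse's actual code:

```rust
// Hypothetical sketch: reject custody-type "downgrades" while allowing
// upgrades, using the derived Ord on the enum (Fullnode < SemiSupernode
// < Supernode). Names and ordering are assumptions from this thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum NodeCustodyType {
    Fullnode,
    SemiSupernode,
    Supernode,
}

fn check_transition(
    persisted: NodeCustodyType,
    current: NodeCustodyType,
) -> Result<(), String> {
    if current < persisted {
        // e.g. Supernode -> SemiSupernode, or SemiSupernode -> Fullnode.
        Err(format!(
            "cannot downgrade custody type from {:?} to {:?}",
            persisted, current
        ))
    } else {
        Ok(())
    }
}
```

If downgrades are later allowed (as suggested above), this check would be replaced by logic that adjusts `earliest_available_slot` instead of erroring.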
half of the data columns (enough for reconstruction), enabling efficient \
data availability with lower bandwidth and storage requirements compared to \
a supernode, while still supporting full blob reconstruction.")
.display_order(0)
Looks good, just need to update the CLI help in the Lighthouse book.
new_custody_group_count: updated_cgc,
sampling_count: self.num_of_custody_groups_to_sample(effective_epoch, spec),
new_custody_group_count: updated_cgc as u64,
sampling_count: self.sampling_count_at_epoch(None, spec) as u64,
I noticed `num_of_custody_groups_to_sample` and `num_of_data_columns_to_sample` are now both mapped to `sampling_count_at_epoch`. I'm sure this is intentional, but I want to make sure the reason for the previous separation was clear: it was mainly to support subnet decoupling (which isn't my favourite feature), and unifying this into a single function is likely going to break that.
See:
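For context, a rough sketch of why the two functions were separate. Every name and constant below is illustrative (not Lighthouse's actual API); the point is only that a group-based count and a column-based count coincide today but could diverge under subnet decoupling:

```rust
// Illustrative only: constants, names, and the 1:1 ratio are assumptions.
const SAMPLES_PER_SLOT: u64 = 8;
// Columns per custody group: 1:1 today; subnet decoupling would change this,
// at which point the two functions below would return different things.
const COLUMNS_PER_GROUP: u64 = 1;

/// Group-based sampling count: custody groups, floored at SAMPLES_PER_SLOT.
fn num_of_custody_groups_to_sample(custody_group_count: u64) -> u64 {
    custody_group_count.max(SAMPLES_PER_SLOT)
}

/// Column-based sampling count; only distinct from the above once the
/// group-to-column mapping stops being 1:1.
fn num_of_data_columns_to_sample(custody_group_count: u64) -> u64 {
    num_of_custody_groups_to_sample(custody_group_count) * COLUMNS_PER_GROUP
}
```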
validator_custody_at_head: ssz_v24.validator_custody_at_head,
persisted_is_supernode: ssz_v24.persisted_is_supernode,
// TODO(pawan): fix
persisted_node_custody_count: if ssz_v24.persisted_is_supernode {
Note that we'll need a new schema migration file for this change
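A hedged sketch of what the migration mapping in the quoted diff might do. The constants and the fallback value are assumptions, not the actual schema code:

```rust
// Hypothetical migration of the old boolean into a custody count, as the
// TODO above suggests. Both constants are assumed values for illustration.
const NUMBER_OF_CUSTODY_GROUPS: u64 = 128;
const DEFAULT_CUSTODY_REQUIREMENT: u64 = 4;

fn migrate_custody_count(persisted_is_supernode: bool) -> u64 {
    if persisted_is_supernode {
        // A supernode custodies every group.
        NUMBER_OF_CUSTODY_GROUPS
    } else {
        // Fall back to the minimum; the real migration may need to consult
        // the persisted validator-custody value instead (hence the TODO).
        DEFAULT_CUSTODY_REQUIREMENT
    }
}
```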
let update_earliest_available_slot = if cgc_current > cgc_persisted {
    true
} else {
    false
Does this assume we have previously completed column backfill?
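Incidentally, the `if … { true } else { false }` in the quoted diff collapses to the comparison itself; a minimal sketch (the function name here is invented for illustration):

```rust
// The quoted branch reduces to a direct comparison.
fn should_update_earliest_available_slot(cgc_current: u64, cgc_persisted: u64) -> bool {
    cgc_current > cgc_persisted
}
```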
I've done a quick pass and I feel like this is quite a risky change before the release, as there's a bit of refactoring and it touches quite a lot of code related to custody / sampling that we may want to review and test thoroughly. I can continue the review tomorrow. Is it possible to have
I agree. It's a lot more change than I anticipated initially. I don't think we should rush it for the release either.
Addresses #8218

A simplified version of #8241 for the initial release. I've tried to minimise the logic change in this PR; introducing the `NodeCustodyType` enum still results in quite a bit of diff, but the actual logic change in `CustodyContext` is quite small.

The main changes are in the `CustodyContext` struct:

* ~~combining `validator_custody_count` and `current_is_supernode` fields into a single `custody_group_count_at_head` field. We persist the cgc of the initial CLI values into the `custody_group_count_at_head` field and only allow increases (same behaviour as before).~~
* I noticed the above approach caused a backward compatibility issue; I've [made a fix](15569bc) and changed the approach slightly (which was actually what I had originally in mind):
  * when initialising, only override the `validator_custody_count` value if either the `--supernode` or `--semi-supernode` flag is used; otherwise leave it at the existing default of `0`.

Most other logic remains unchanged. All existing validator custody unit tests are still passing, and I've added additional tests to cover semi-supernode and restoring `CustodyContext` from disk.

Note: I've added a `WARN` if the user attempts to switch to `--semi-supernode` or `--supernode`. This currently has no effect, but once @eserilev's column backfill is merged, we should be able to support this quite easily.

Things to test:

- [x] cgc in metadata / enr
- [x] cgc in metrics
- [x] subscribed subnets
- [x] getBlobs endpoint

Co-Authored-By: Jimmy Chen <[email protected]>
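The initialisation rule described above can be sketched as follows. The enum, constant, and function signature are illustrative assumptions, not the PR's actual code:

```rust
// Illustrative sketch of the described init behaviour; values are assumed.
const NUMBER_OF_CUSTODY_GROUPS: u64 = 128;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeCustodyType {
    Fullnode,
    SemiSupernode,
    Supernode,
}

/// Returns the (possibly overridden) validator_custody_count.
/// `persisted` is whatever was on disk (default 0 on a fresh node).
fn init_validator_custody_count(persisted: u64, custody_type: NodeCustodyType) -> u64 {
    match custody_type {
        // --supernode: custody all groups.
        NodeCustodyType::Supernode => NUMBER_OF_CUSTODY_GROUPS,
        // --semi-supernode: half the groups, enough for reconstruction.
        NodeCustodyType::SemiSupernode => NUMBER_OF_CUSTODY_GROUPS / 2,
        // No flag: leave the persisted value untouched.
        NodeCustodyType::Fullnode => persisted,
    }
}
```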
Issue Addressed
#8218
Proposed Changes
// TODO