@dapplion (Collaborator) commented Oct 12, 2025

Issue Addressed

We want to allow Lighthouse nodes to start from a non-finalized checkpoint, so that nodes can get un-stuck during long periods of non-finality. See the issue below for more details:

Closes #7089

Proposed Changes

Consider a network where epoch 1000 is finalized and epoch 1001 is justified, but we start the node from a state at epoch 2000. The node needs to advertise to its peers, and construct attestations with, a finalized checkpoint at epoch 1000 and a justified checkpoint at epoch 1001. However, the states of those checkpoints are not available locally.
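
To make the setup concrete, here is a minimal self-contained sketch of the mismatch (the `Checkpoint` type below is a simplified stand-in for Lighthouse's, and the roots are zeroed placeholders):

```rust
// Simplified stand-in for Lighthouse's `Checkpoint`; roots are placeholders.
#[derive(Clone, Copy)]
struct Checkpoint {
    epoch: u64,
    root: [u8; 32],
}

fn main() {
    // The checkpoints the node must advertise and attest with...
    let finalized = Checkpoint { epoch: 1000, root: [0u8; 32] };
    let justified = Checkpoint { epoch: 1001, root: [0u8; 32] };
    // ...come from the fields of the anchor state at epoch 2000; the states
    // *at* epochs 1000 and 1001 are never available locally.
    let anchor_state_epoch = 2000u64;

    assert!(finalized.epoch < anchor_state_epoch);
    assert!(justified.epoch < anchor_state_epoch);
    println!(
        "advertise finalized epoch {} / justified epoch {} from an anchor at epoch {}",
        finalized.epoch, justified.epoch, anchor_state_epoch
    );
}
```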

I have reviewed all the places where this can be a problem; let's go through them one by one.

Attestation production

  • On add_head_block, the EarlyAttesterCache uses the checkpoints in the block's post-state as the source of truth
  • Otherwise, the justified checkpoint is read from the head state or an advanced state (see the sketch after this list)
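
A hedged sketch of that selection logic, with simplified stand-in types (`EarlyAttesterCacheItem`, `get_attestation_source`, and the field layout are illustrative, not the actual Lighthouse API):

```rust
#[derive(Clone, Copy)]
struct Checkpoint {
    epoch: u64,
}

struct PostState {
    current_justified_checkpoint: Checkpoint,
}

// Illustrative stand-in for an EarlyAttesterCache entry populated on
// add_head_block, carrying the checkpoints of the block's post-state.
struct EarlyAttesterCacheItem {
    post_state: PostState,
}

// The attestation `source` comes from the cached post-state when available,
// otherwise from the head state (or an advanced copy of it). No state at the
// justified epoch itself is ever loaded.
fn get_attestation_source(
    early_cache: Option<&EarlyAttesterCacheItem>,
    head_state: &PostState,
) -> Checkpoint {
    match early_cache {
        Some(item) => item.post_state.current_justified_checkpoint,
        None => head_state.current_justified_checkpoint,
    }
}
```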

State cache

  • The finalized state becomes the anchor state. We never read the finalized state by "name"; instead, we check whether the root of the stored finalized state matches the one being requested (see the sketch after this list)
  • Pruning is not affected
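
A minimal sketch of that root-match rule, assuming stubbed types (all names here are illustrative):

```rust
type Hash256 = [u8; 32];

#[derive(Clone)]
struct StateStub {
    state_root: Hash256,
}

struct AnchorCache {
    anchor_state: StateStub,
}

impl AnchorCache {
    // The anchor (finalized) state is only served when the *requested root*
    // matches; there is no "give me the finalized state" lookup by name.
    fn get_state(&self, requested_root: &Hash256) -> Option<StateStub> {
        if &self.anchor_state.state_root == requested_root {
            Some(self.anchor_state.clone())
        } else {
            None // fall through to the regular hot/cold DB lookup
        }
    }
}
```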

ReqResp Status

  • Produced from the head state, so not affected (sketched below)
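
For illustration, a sketch of why this holds: everything a Status message needs can be read off cached head info, so no finalized state is loaded (shapes are simplified; the real StatusMessage also carries a fork digest):

```rust
type Hash256 = [u8; 32];

struct Checkpoint {
    epoch: u64,
    root: Hash256,
}

// Cached info about the current head; no finalized *state* is needed.
struct HeadInfo {
    slot: u64,
    block_root: Hash256,
    finalized_checkpoint: Checkpoint,
}

// Simplified: the real message also includes a fork digest.
struct StatusMessage {
    finalized_root: Hash256,
    finalized_epoch: u64,
    head_root: Hash256,
    head_slot: u64,
}

fn status_from_head(head: &HeadInfo) -> StatusMessage {
    StatusMessage {
        finalized_root: head.finalized_checkpoint.root,
        finalized_epoch: head.finalized_checkpoint.epoch,
        head_root: head.block_root,
        head_slot: head.slot,
    }
}
```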

Fork-choice internals

  • is_finalized_checkpoint_or_descendant needs updating, as it assumes the finalized node exists (see the sketch after this list)
  • Some code paths need updating, as they assume the finalized and justified ProtoNodes exist
  • Pruning is fine, as it only acts on the new finalized checkpoint
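
A hedged sketch of the first fix, with the proto-array reduced to a plain map (the function name follows the real one, but the structure and fallback are assumptions about the approach):

```rust
use std::collections::HashMap;

type Root = [u8; 32];

struct ProtoNode {
    parent: Option<Root>,
}

struct ProtoArray {
    nodes: HashMap<Root, ProtoNode>,
    anchor_root: Root,
    finalized_root: Root,
}

impl ProtoArray {
    fn is_finalized_checkpoint_or_descendant(&self, root: Root) -> bool {
        // If the finalized block predates the anchor, it has no ProtoNode,
        // so fall back to the anchor root, the oldest block we track.
        let base = if self.nodes.contains_key(&self.finalized_root) {
            self.finalized_root
        } else {
            self.anchor_root
        };
        // Walk parents until we hit the base or run off the tracked range.
        let mut current = Some(root);
        while let Some(r) = current {
            if r == base {
                return true;
            }
            current = self.nodes.get(&r).and_then(|n| n.parent);
        }
        false
    }
}
```

The same anchor fallback would cover the other code paths that assume the finalized and justified ProtoNodes exist.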

EL fork_choice_updated

  • Needs updating, as we don't know the execution hashes of the finalized and justified blocks (sketched below).
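
One plausible shape for this, sketched under the assumption that the fallback mirrors the existing pre-merge convention of sending the zero hash; this is not a claim about what the PR actually implements:

```rust
type ExecutionBlockHash = [u8; 32];

const ZERO_HASH: ExecutionBlockHash = [0; 32];

// Assemble (head, safe, finalized) hashes for forkchoiceUpdated. When the
// justified or finalized block is older than the anchor, its execution hash
// is unknown locally, so we fall back to the zero hash (the same value used
// when those blocks are pre-merge).
fn fcu_params(
    head_hash: ExecutionBlockHash,
    justified_hash: Option<ExecutionBlockHash>, // None if the block is unknown
    finalized_hash: Option<ExecutionBlockHash>,
) -> (ExecutionBlockHash, ExecutionBlockHash, ExecutionBlockHash) {
    (
        head_hash,
        justified_hash.unwrap_or(ZERO_HASH),
        finalized_hash.unwrap_or(ZERO_HASH),
    )
}
```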

DB migration

  • Not affected: it never reads the previous finalized checkpoint, and migration is independent of it.

fork_revert

  • Updated to use either the anchor block or the finalized block (sketched below)
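
A minimal sketch of that selection (the function and its parameters are illustrative):

```rust
type Hash256 = [u8; 32];

// Revert to the finalized block when it is actually known locally; otherwise
// fall back to the anchor block, the oldest block we hold after checkpoint
// sync from a non-finalized state.
fn revert_target(
    finalized_root: Hash256,
    anchor_root: Hash256,
    block_is_known: impl Fn(&Hash256) -> bool,
) -> Hash256 {
    if block_is_known(&finalized_root) {
        finalized_root
    } else {
        anchor_root
    }
}
```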

Testing

In progress

  • Started a kurtosis devnet of 6 Lighthouse nodes
  • Let it run for 5000 slots, then stopped 50% of the validators and let it run for another 10000 slots (non-finality)
  • Ran a new node without validators, checkpoint-syncing from a recent checkpoint. The node synced to the head and started following it
  • Restarted the stopped validators and let the network finalize again

@dapplion dapplion requested a review from jxs as a code owner October 12, 2025 20:49
@dapplion (Collaborator, Author) commented:

TODO: Starting from a skipped slot triggered an error:

Oct 12 22:26:52.298 INFO  Lighthouse started                            version: "Lighthouse/v8.0.0-rc.0-3365937"
Oct 12 22:26:52.298 INFO  Configured network                            network_name: "custom (/network-configs)"
Oct 12 22:26:52.300 INFO  Data directory initialised                    datadir: /data/lighthouse/beacon-data
Oct 12 22:26:52.300 WARN  Discv5 packet filter is disabled
Oct 12 22:26:52.301 INFO  Deposit contract                              deploy_block: 0, address: 0x00000000219ab540356cbb839cbe05303d7705fa
Oct 12 22:26:52.304 DEBUG Opening HotColdDB
Oct 12 22:26:52.304 DEBUG Opening LevelDB                               hot_path: "/data/lighthouse/beacon-data/beacon/chain_db"
Oct 12 22:26:52.631 DEBUG Loaded anchor info                            anchor_info: RwLock { data: AnchorInfo { anchor_slot: Slot(18446744073709551615), oldest_block_slot: Slot(18446744073709551615), oldest_block_parent: 0x0000000000000000000000000000000000000000000000000000000000000000, state_upper_limit: Slot(18446744073709551615), state_lower_limit: Slot(0) } }
Oct 12 22:26:53.001 INFO  Blob DB initialized                           path: "/data/lighthouse/beacon-data/beacon/blobs_db", oldest_blob_slot: Some(Slot(0)), oldest_data_column_slot: None
Oct 12 22:26:53.003 DEBUG Store anchor info                             anchor: AnchorInfo { anchor_slot: Slot(18446744073709551615), oldest_block_slot: Slot(18446744073709551615), oldest_block_parent: 0x0000000000000000000000000000000000000000000000000000000000000000, state_upper_limit: Slot(18446744073709551615), state_lower_limit: Slot(0) }
Oct 12 22:26:53.006 DEBUG Loaded execution endpoint                     endpoint: http://172.16.0.89:8551/, jwt_path: "/jwt/jwtsecret"
Oct 12 22:26:56.787 INFO  Starting checkpoint sync
Oct 12 22:26:56.805 DEBUG Advancing checkpoint state to boundary        state_slot: 25921, block_slot: 25921
Oct 12 22:26:56.822 DEBUG Storing split from weak subjectivity state    slot: 25952, state_root: 0x6dd97e8ed9e7d1fc39421a8e6a85a526eb8dd3512882d132402ccbac5ed2f595, block_root: 0xb3694c1a351c2debfecc81d524c2ae944abd56e1ed04f49c18b0eba1a6c42733
Oct 12 22:26:56.826 DEBUG Storing cold state                            strategy: "snapshot", slot: 0
Oct 12 22:26:56.841 DEBUG Stored frozen block roots at skipped slots    from: 25921, to_excl: 25952, block_root: 0xb3694c1a351c2debfecc81d524c2ae944abd56e1ed04f49c18b0eba1a6c42733
Oct 12 22:26:56.848 DEBUG Storing hot state summary and diffs           state_root: 0x6dd97e8ed9e7d1fc39421a8e6a85a526eb8dd3512882d132402ccbac5ed2f595, slot: 25952, storage_strategy: Snapshot, diff_base_state: snapshot, previous_state_root: 0x32aaeacf2973b5f396763363e36cbf124ef3e4c5c9e0b0c208c8fadfa1f1b2cc
Oct 12 22:26:56.849 INFO  Block production enabled
Oct 12 22:26:56.860 CRIT  Failed to start beacon node                   reason: "Failed to build beacon chain: DB error when reading head block: HotColdDBError(MissingFullBlockExecutionPayloadPruned(0xb3694c1a351c2debfecc81d524c2ae944abd56e1ed04f49c18b0eba1a6c42733, Slot(25921))). Database corruption may be present. If the issue persists, use --purge-db to permanently delete the existing data directory."
Oct 12 22:26:56.860 INFO  Internal shutdown received
Oct 12 22:26:56.860 INFO  Shutting down..                               reason: Failure("Failed to start beacon node")
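
A guess at what is going on here (an assumption, not confirmed by the PR): the head block root at the skipped-slot boundary resolves to the block at slot 25921, whose execution payload has been pruned, so the full-block read fails at startup. A payload-free (blinded) read would tolerate this; below is a sketch with a stubbed store, since the real HotColdDB API differs:

```rust
// Stubbed store illustrating the failing path vs. a payload-tolerant read.
struct BlindedBlock {
    slot: u64,
}

struct Store {
    blinded_block: Option<BlindedBlock>,
    payload_available: bool,
}

impl Store {
    // Mirrors the failing startup path: errors if the payload was pruned.
    fn get_full_block(&self) -> Result<&BlindedBlock, String> {
        match (&self.blinded_block, self.payload_available) {
            (Some(b), true) => Ok(b),
            (Some(b), false) => Err(format!(
                "MissingFullBlockExecutionPayloadPruned at slot {}",
                b.slot
            )),
            (None, _) => Err("missing block".to_string()),
        }
    }

    // Payload-free read that still succeeds after payload pruning.
    fn get_blinded_block(&self) -> Option<&BlindedBlock> {
        self.blinded_block.as_ref()
    }
}
```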

@michaelsproul added the work-in-progress and hardening labels on Oct 13, 2025