Skip to content

Fix/indexing wait instead of crash#826

Open
Timi16 wants to merge 5 commits intozingolabs:devfrom
Timi16:fix/indexing-wait-instead-of-crash
Open

Fix/indexing wait instead of crash#826
Timi16 wants to merge 5 commits intozingolabs:devfrom
Timi16:fix/indexing-wait-instead-of-crash

Conversation

@Timi16
Copy link
Contributor

@Timi16 Timi16 commented Feb 9, 2026

Fixes:
This PR prevents FinalisedState::sync_db_from_reorg() from crashing when the local cache/index height is ahead of the current chain, or when requested blocks are not yet available from the node.

Fixes included

  • Cache height sanity check: If cached DB height is greater than current chain height, clear the cached DB and restart from height 0 (prevents invalid lookups during reorg / fresh chains).
  • Wait/retry instead of crash: Replace “fail hard” behavior when blocks aren’t available yet with a retry loop:
    • If the node does not yet have the block at the requested height, we sleep and retry rather than erroring out.
    • This avoids crashes during early sync / indexing periods.

Why this matters

During initial indexing or certain reorg conditions, the node may not immediately return blocks at the requested height. Previously, this could cause sync_db_from_reorg() to error out (or behave like a crash from the caller perspective). Now it behaves gracefully by waiting for chain data to become available.

Implementation notes

  • Uses get_blockchain_info() to compare cached DB height vs current chain height.
  • Uses a tokio::time::sleep(Duration::from_secs(2)) loop to retry fetching the missing block.
  • Keeps existing behavior for unexpected block response types.

Files changed

  • zaino-state/src/local_cache/finalised_state.rs

How to test

  • Start from a fresh DB / cleared cache and run the normal sync flow.
  • Simulate a reorg / restart while indexing.
  • Confirm the process no longer fails when blocks are temporarily unavailable and eventually continues syncing once blocks exist.

Fixes: indexing wait instead of crash

Copy link
Contributor

@pacu pacu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the correct fix.

@nachog00 @idky137 could you take a look at this?

Also are you folks deprecating/removing local_cache? If so, this might not be an issue after all


let mut check_hash = match self

// FIXED: Check if cached height is beyond current chain BEFORE trying to fetch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment doesn't make much sense because there's no previous history in the comments about an issue with this code so its not self-explanatory of what happened and what's (supposedly) being fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This addresses the issue where the cached database height could be higher than the current chain height (e.g., when switching from mainnet to testnet, or when the chain resets). Without this check, the code would try to fetch a block at a height that doesn't exist yet, causing a crash.

The fix detects this condition early and clears the cache to start fresh with the new chain. I'll update the comment to be more descriptive:

// Check if cached height exceeds current chain height.
// This can occur when:
// - Switching between networks (mainnet/testnet/regtest)
// - Chain has reset or been pruned
// If detected, clear the cache and resync from genesis

}
}

self.status.store(StatusType::Ready);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you removing this line where the status is updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! That line shouldn't have been removed. Looking at the code flow, the status needs to be set to Ready after the sync completes successfully.

I'll restore this line:

self.status.store(StatusType::Ready);
Ok(())

This was an unintended change while refactoring the sync logic.

Copy link
Contributor

@pacu pacu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idky137 could you review this when you have a moment? It's what we discussed on Tuesday about checking chain_index reorg logic.

current_chain_height
);
{
let mut txn = self.database.begin_rw_txn()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This erases the whole cache. While it might be fine for Regtest, it probably isn't the appropriate behavior in production.

@pacu pacu requested a review from idky137 February 11, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants