Node Issue: Testnet node getting stuck at block sync after restart #12976

evgenykuzyakov · 2025-02-22T20:26:02Z

Contact Details

[email protected]

Node type

RPC

Which network are you running?

testnet

What happened?

Four testnet nodes running 2.5.0-rc.1 were affected. Note that they may not all be hitting the same issue.

The nodes fail to complete block sync and get stuck at specific blocks.

The main issue is that the snapshot-producing node failed to sync up after the last snapshot was uploaded.
It fails with the following error message:

Feb 22 13:30:44 node-ft01 nearcore[263250]: 2025-02-22T13:30:44.513565Z  WARN chain: Error in applying chunk for block shard_id=9 hash=J4yP1KAL57RvCnCbsyjn1jtmj636cxfEePzooxkUmVTs err=Storage Error: MissingTrieValue(TrieStorage, XMuesBVj3SqcHXSrWax4Ca6TjKivHpMgUwXdjcwPGJU)
Feb 22 13:30:44 node-ft01 nearcore[263250]: 2025-02-22T13:30:44.513630Z ERROR client: try_process_unfinished_blocks got errors errors={J4yP1KAL57RvCnCbsyjn1jtmj636cxfEePzooxkUmVTs: StorageError(MissingTrieValue(TrieStorage, XMuesBVj3SqcHXSrWax4Ca6TjKivHpMgUwXdjcwPGJU))}
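When the same error repeats every couple of seconds, it can help to collapse the log into one line per distinct failure. The following is a minimal sketch, not part of the original report: the function name `summarize_missing_trie_values` is hypothetical, and it assumes the WARN line keeps exactly the format shown above.

```shell
# Summarize MissingTrieValue errors from neard log lines read on stdin.
# Prints one line per distinct (shard, block hash, missing trie key) plus a count.
# Assumption: the "Error in applying chunk" WARN line has the format shown in this issue.
summarize_missing_trie_values() {
  # Only the WARN lines carry shard_id, so the ERROR client lines are skipped.
  sed -n 's/.*Error in applying chunk for block shard_id=\([0-9][0-9]*\) hash=\([^ ]*\) err=Storage Error: MissingTrieValue([^,]*, \([^)]*\)).*/shard=\1 block=\2 missing=\3/p' \
    | sort | uniq -c | awk '{print $2, $3, $4, "count=" $1}'
}
```

It could be fed from the journal, for example `journalctl -u neard --no-pager | summarize_missing_trie_values`, where the unit name `neard` is an assumption about the local setup.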

Reproduce steps

It's easy to reproduce the state using the snapshot from block 188313698 (details https://docs.fastnear.com/docs/snapshots#rpc-testnet-snapshot):

# Latest rclone
sudo -v ; curl https://rclone.org/install.sh | sudo bash
# Will download the snapshot into the `~/.near/data`
curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/fastnear/static/refs/heads/main/down_rclone.sh | DATA_PATH=~/.near/data CHAIN_ID=testnet BLOCK=188313698 bash

Version

neard (release 2.5.0-rc.3) (build 2.5.0-rc.3) (rustc 1.84.0) (protocol 76) (db 43)
features: [default, json_rpc, rosetta_rpc]

Relevant log output

Feb 22 13:30:44 node-ft01 nearcore[263250]: 2025-02-22T13:30:44.513565Z  WARN chain: Error in applying chunk for block shard_id=9 hash=J4yP1KAL57RvCnCbsyjn1jtmj636cxfEePzooxkUmVTs err=Storage Error: MissingTrieValue(TrieStorage, XMuesBVj3SqcHXSrWax4Ca6TjKivHpMgUwXdjcwPGJU)
Feb 22 13:30:44 node-ft01 nearcore[263250]: 2025-02-22T13:30:44.513630Z ERROR client: try_process_unfinished_blocks got errors errors={J4yP1KAL57RvCnCbsyjn1jtmj636cxfEePzooxkUmVTs: StorageError(MissingTrieValue(TrieStorage, XMuesBVj3SqcHXSrWax4Ca6TjKivHpMgUwXdjcwPGJU))}
Feb 22 13:30:45 node-ft01 nearcore[263250]: 2025-02-22T13:30:45.011613Z  INFO stats: #188313698 Downloading blocks 0.00% (44 left; at 188313698) 42 peers ⬇ 504 kB/s ⬆ 666 kB/s 0.00 bps 0 gas/s CPU: 197%, Mem: 5.62 GB
Feb 22 13:30:45 node-ft01 nearcore[263250]: 2025-02-22T13:30:45.684549Z ERROR network: Failed to store connection attempt. peer_info=PeerInfo { id: ed25519:DeRyxMeaSfDC6MeNFMDmHS4tshnYN8VMyCqoybqbUV4g, addr: Some(34.29.37.230:24567), account_id: None }
Feb 22 13:30:46 node-ft01 nearcore[263250]: 2025-02-22T13:30:46.495872Z  WARN chain: Error in applying chunk for block shard_id=9 hash=J4yP1KAL57RvCnCbsyjn1jtmj636cxfEePzooxkUmVTs err=Storage Error: MissingTrieValue(TrieStorage, XMuesBVj3SqcHXSrWax4Ca6TjKivHpMgUwXdjcwPGJU)
Feb 22 13:30:46 node-ft01 nearcore[263250]: 2025-02-22T13:30:46.495954Z ERROR client: try_process_unfinished_blocks got errors errors={J4yP1KAL57RvCnCbsyjn1jtmj636cxfEePzooxkUmVTs: StorageError(MissingTrieValue(TrieStorage, XMuesBVj3SqcHXSrWax4Ca6TjKivHpMgUwXdjcwPGJU))}
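The `stats:` line above shows the head sitting at 188313698 with blocks still left to download. To tell a stuck node from a slow one, a rough check is to compare the head height in the first and last `stats:` lines of a log window. This is only a sketch: `check_head_progress` is a hypothetical helper, and it assumes the `INFO stats: #<height>` prefix seen in this log.

```shell
# Read neard log lines on stdin and report whether the head height in the
# periodic "stats:" lines advanced between the first and the last sample.
# Assumption: stats lines look like "... INFO stats: #188313698 Downloading blocks ...".
check_head_progress() {
  heights=$(sed -n 's/.*INFO stats: #\([0-9][0-9]*\) .*/\1/p')
  [ -n "$heights" ] || { echo "no stats lines found"; return 2; }
  samples=$(printf '%s\n' "$heights" | wc -l | tr -d ' ')
  [ "$samples" -ge 2 ] || { echo "need at least two stats lines"; return 2; }
  first=$(printf '%s\n' "$heights" | head -n 1)
  last=$(printf '%s\n' "$heights" | tail -n 1)
  if [ "$last" -gt "$first" ]; then
    echo "advancing: $first -> $last over $samples samples"
  else
    echo "stuck at $last over $samples samples"
    return 1
  fi
}
```

A non-zero exit status means the head did not move (1) or there was not enough data to decide (2), so the function can be used directly in an `if` or a monitoring script.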

Node head info

Node upgrade history

Was running 2.5.0-rc.1 before the protocol upgrade.

DB reset history

The node was producing snapshots for FastNear testnet.


VanBarbascu · 2025-02-23T01:32:09Z

Thanks for reporting this! The team is aware of the issue and we will come back tomorrow with the mitigation steps.

VanBarbascu · 2025-02-23T18:12:48Z

The team narrowed the issue down to garbage collection on the newly created shard (9). The problem only occurs on RPC nodes.

We are working on the fix so that it does not happen in future reshardings.

If you are affected by this, you can mitigate the issue by getting a good state, either through Epoch Sync or by restoring the node from one of the latest FastNEAR snapshots.

Thanks @evgenykuzyakov for being proactive in facilitating the recovery!

Epoch Sync:

./neard init --download-config rpc --chain-id testnet  --download-genesis

curl -X POST https://rpc.testnet.near.org -H "Content-Type: application/json" -d '{
            "jsonrpc": "2.0",
            "method": "network_info",
            "params": [],
            "id": "dontcare"
          }' | jq -r '.result.active_peers[] as $active_peer | "\($active_peer.id)@\($active_peer.addr)"' | paste -sd',' -
# Put the output in the config.

./neard run
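Before pasting the peer list printed by the pipeline above into the config, a quick shape check can catch an empty or malformed result (for example, if the RPC call failed). This is a hedged sketch, not part of the original instructions: `validate_peer_list` is a hypothetical helper, and it assumes every entry has the form `ed25519:<base58 key>@<host>:<port>`.

```shell
# Check that a comma-separated peer list has the "<key>@<host>:<port>" shape
# produced by the jq/paste pipeline.
# Assumption: all peer ids are ed25519 keys encoded in base58.
validate_peer_list() {
  list=$1
  [ -n "$list" ] || { echo "empty peer list"; return 1; }
  # Collect entries that do NOT match the expected shape.
  bad=$(printf '%s\n' "$list" | tr ',' '\n' \
    | grep -Ev '^ed25519:[1-9A-HJ-NP-Za-km-z]+@[^@,]+:[0-9]+$' || true)
  if [ -n "$bad" ]; then
    echo "malformed entries:"
    printf '%s\n' "$bad"
    return 1
  fi
  count=$(printf '%s\n' "$list" | tr ',' '\n' | wc -l | tr -d ' ')
  echo "ok: $count peers"
}
```

For example, `validate_peer_list "$(cat peers.txt)"`, where `peers.txt` is a hypothetical file holding the pipeline output. The list most likely belongs in the `boot_nodes` field of the `network` section of `config.json`; the comment above does not name the field, so treat that as an assumption.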

Snapshot instructions can be found here.

evgenykuzyakov · 2025-02-23T18:56:17Z

An archival node was affected as well. Not sure how to recover it yet. I assume I should be able to download older hot snapshots from a regular RPC node and then sync properly once the GC issue is fixed.

EDIT: Solved that using the latest RPC snapshot. Converted it to Hot and advanced it a bit so that its HEAD is later than the previous cold-data HEAD. Then it was able to sync up. Will upload a new archive snapshot now.

evgenykuzyakov added the community, investigation required, and Node labels on Feb 22, 2025

evgenykuzyakov assigned VanBarbascu Feb 22, 2025
