wip: attempt to use upstream snap sync #771

Draft
wants to merge 32 commits into base: libevm
Conversation

@darioush (Collaborator) commented on Jan 29, 2025

  • eth/protocols/snap: This is mostly the eth code, with some minor changes

    • Some imports were modified to have an eth prefix (e.g., "ethstate", "ethrawdb"). This was to silence the linter, but I will fix it to reduce the diff with upstream so you can review the diff more easily.
    • sync.go: this file is the "client" portion of the sync and is mostly untouched, except for omitting this check: `rawdb.HasTrieNode(s.db, res.hashes[i], nil, account.Root, s.scheme)`
    • The check is omitted because the sync process also populates the snapshot: if two accounts share the same storage root (for example, two small tries whose few key/value pairs happen to match), the snapshot for one of the accounts would be missing with the check in place. This only applies to the hash scheme, but I omitted it unconditionally so we can keep the sync running as is for now.
    • handler.go: I modified ServiceGetAccountRangeQuery and ServiceGetStorageRangesQuery to "patch in" the existing mechanism for responding to state sync requests, just to get the key/values. We need this because upstream keeps S snapshot layers in memory (I think S = 128), with the disk layer matching head - S + 1 or something of that sort, whereas in coreth/subnet-evm the disk snapshot matches the last accepted block. We may want to match upstream behavior here, so that is another "chapter" of this project.
    • So, for now, I replaced how the upstream server gets the key/values:

      ```go
      it, err := chain.Snapshots().AccountIterator(req.Root, req.Origin, false)
      ```

      with a call to the mechanism we use today:

      ```go
      handler := handlers.NewResponseBuilder(
          leafsRequest, leafsResponse, tr, chain.Snapshots(), stateKeyLength, maxLeavesLimit, handlerStats,
      )
      ```
      
    • To enable this, I added NewResponseBuilder to sync/handlers/leafs_request.go, as the responseBuilder is unexported without this change.
    • This way we get the benefit of the optimistic snapshot read, falling back to iterating the trie only when the snapshot data has actually been overwritten, and it matches the existing behavior (a minimal sketch of this pattern follows this list).
    • I'm still using the upstream code to generate the proofs, even though we should be able to use the responseBuilder's proofs if we wanted; feel free to experiment.
    • I reduced the soft limit on the response size from 2 MB to 1 MB, but I'm not sure of the actual p2p message limit; best to confirm this with those more familiar with the networking stack.
  • For the proof of concept we can keep the code copied into coreth, but for production the eth/protocols/snap package should ideally be used from libevm.
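
Picking up the forward reference above, here is a minimal, self-contained Go sketch of the optimistic snapshot read with trie fallback. All types and names are illustrative stand-ins for the sketch, not coreth's actual API:

```go
package sync

// kv is a single leaf returned to the requesting peer.
type kv struct{ Key, Val []byte }

// snapshotReader serves leaves from the flat snapshot. ok is false when the
// snapshot no longer matches the requested root (e.g., its data has been
// overwritten by newer state), in which case the caller falls back to the trie.
type snapshotReader interface {
	Leaves(root [32]byte, start, end []byte, max int) (leaves []kv, ok bool)
}

// trieReader serves the same range by iterating the account/storage trie.
type trieReader interface {
	Leaves(root [32]byte, start, end []byte, max int) ([]kv, error)
}

// serveLeaves tries the snapshot first and falls back to trie iteration,
// mirroring the behavior the responseBuilder provides in this PR.
func serveLeaves(sn snapshotReader, tr trieReader, root [32]byte, start, end []byte, max int) ([]kv, error) {
	if leaves, ok := sn.Leaves(root, start, end, max); ok {
		return leaves, nil // fast path: flat snapshot matched the root
	}
	return tr.Leaves(root, start, end, max) // slow path: walk the trie
}
```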

  • Glue code: to connect the avalanchego p2p layer to the eth sync protocol, I used some basic shim layers: for the client in plugin/evm/statesync/connector.go and for the handler in plugin/evm/statesync/handler.go. These currently panic on unexpected code paths, but we should handle all failures correctly, especially those that can be triggered by incoming peer messages. (A sketch of the connector's shape follows this bullet.)
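
The connector shim might look roughly like the following; the interface and method names are assumptions for this sketch, not the PR's exact code:

```go
package statesync

// appSender abstracts the avalanchego app-request channel used to reach peers
// (assumed interface for the sketch).
type appSender interface {
	SendAppRequest(nodeID string, requestID uint32, msg []byte) error
}

// connector forwards already-encoded snap protocol messages over the
// avalanchego p2p layer on behalf of the eth snap sync client.
type connector struct {
	sender    appSender
	requestID uint32
}

// sendSnapRequest sends one snap request to the given peer. In a production
// version, errors here (including peer-triggered ones) must be surfaced to
// the syncer rather than panicking, per the note above.
func (c *connector) sendSnapRequest(nodeID string, msg []byte) error {
	c.requestID++
	return c.sender.SendAppRequest(nodeID, c.requestID, msg)
}
```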

  • Modifications to existing state sync handler:

    • I set the More flag on the response as a hint to the code in handler.go as to whether the response needs a proof. This helps emulate the behavior of upstream's abort variable in the case of a partial storage response (see the sketch after this list).
    • It seems responses must be at least one key long, or they will not pass range-proof verification. This looks like a (minor) bug in the existing code.
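
A minimal sketch of the More-flag hint for a partial storage response, assuming a response shape along the lines of coreth's leafs response (the field and helper names are illustrative):

```go
package handlers

// leafsResponse is a simplified stand-in for the handler's response type.
type leafsResponse struct {
	Keys, Vals [][]byte
	More       bool // set by the handler when the returned range was truncated
}

// proofIfNeeded attaches a range proof only when the response is partial,
// emulating upstream's abort behavior as described above. proveRange is a
// hypothetical helper that proves the boundaries of the returned key range.
// Note the response must contain at least one key to be verifiable.
func proofIfNeeded(resp *leafsResponse, proveRange func(keys [][]byte) [][]byte) [][]byte {
	if resp.More {
		return proveRange(resp.Keys) // partial response: prove the served range
	}
	return nil // complete response: no proof hint needed
}
```
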
  • Putting it all together:

    • Added a config option state-sync-use-upstream to plugin/evm/config/config.go.
    • Registered the new handler type with vm.Network in vm.go.
    • Passed some additional options to the state sync client in vm.go.
    • In plugin/evm/syncervm_client.go (sketched below):
      • If the useUpstream option is set, the step that syncs the eth trie (client.syncStateTrie) is replaced with the new syncer.Sync.
      • If stateSyncNodes is set, the explicitly specified peers are registered with the syncer as the peers to sync from;
      • if it is not set, the syncer is configured as a listener that registers and unregisters peers as they connect to and disconnect from the node.
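
The client-side branch might look roughly like this; everything except client.syncStateTrie, syncer.Sync, useUpstream, and stateSyncNodes (all named above) is an assumption of the sketch:

```go
package evm

// upstreamSyncer is a stand-in for the new snap-sync client's surface.
type upstreamSyncer interface {
	RegisterPeer(nodeID string) // assumed registration hook
	Sync(root [32]byte) error
}

type stateSyncClient struct {
	useUpstream    bool     // from the state-sync-use-upstream config option
	stateSyncNodes []string // explicitly configured peers; may be empty
	syncer         upstreamSyncer
	root           [32]byte // root of the state to sync to
}

// syncStateTrie is the existing coreth trie sync step (stubbed here).
func (c *stateSyncClient) syncStateTrie() error { return nil }

func (c *stateSyncClient) syncState() error {
	if !c.useUpstream {
		return c.syncStateTrie()
	}
	// A static peer list takes precedence; otherwise the syncer is registered
	// elsewhere as a connect/disconnect listener (not shown).
	for _, nodeID := range c.stateSyncNodes {
		c.syncer.RegisterPeer(nodeID)
	}
	return c.syncer.Sync(c.root)
}
```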

Note: when stateSyncNodes is set explicitly (in config.json), the same nodes should also be passed to avalanchego via the --state-sync-ids and --state-sync-ips options, so those nodes are also used to determine the "sync summary", i.e., where to start syncing from. An example invocation is sketched below.
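
For example, the avalanchego side might be started along these lines (node IDs and IPs are placeholders; the flags themselves are the ones named above):

```sh
avalanchego \
  --state-sync-ids=<nodeID1>,<nodeID2> \
  --state-sync-ips=<ip1:port>,<ip2:port>
```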
