-
Notifications
You must be signed in to change notification settings - Fork 111
perf(levm): store valid jump targets with code #4961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lines of code reportTotal lines added: Detailed view |
Benchmark Block Execution Results Comparison Against Main
|
crates/common/trie/trie.rs
Outdated
| // Hash value for an empty trie, equal to keccak(RLP_NULL) | ||
| pub static ref EMPTY_TRIE_HASH: H256 = H256::from_slice( | ||
| Keccak256::new() | ||
| &Keccak256::new() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and other changes like this are to silence a deprecated notice from sha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
|
The size diff appears large due to a cargo.lock update in the tooling/bench code adding +3,819 lines, the pr is not big |
Benchmark for bdbc6fcClick to view benchmark
|
Benchmark for 16909a5Click to view benchmark
|
| # Run hyperfine benchmark, removing the database before each run. | ||
| # Pipe `yes` into the remove command to automatically confirm. | ||
| hyperfine -w 5 -N -r 10 --show-output --export-markdown "bench_pr_comparison.md" \ | ||
| -L bin "$BINS" -n "{bin}" \ | ||
| --prepare "yes | ./bin/ethrex-{bin} removedb" \ | ||
| "./bin/ethrex-{bin} --network fixtures/genesis/perf-ci.json --force import ./fixtures/blockchain/l2-1k-erc20.rlp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command was using removedb as a setup for each batch of runs, but the command was cancelled due to having no input. I changed it to use --prepare instead, which runs before each single run, and piped yes into it to automatically confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was reverted, but we should fix this if we intend to keep this benchmark.
Benchmark for 96dee34Click to view benchmark
|
Benchmark for bda8527Click to view benchmark
|
Benchmark for 947e30aClick to view benchmark
|
| let codes_hashed = guest_program_state | ||
| .codes_hashed | ||
| .iter() | ||
| .map(|(h, c)| (*h, c.bytecode.to_vec())) | ||
| .collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of these conversions seem unneeded. We should check that they don't affect the performance of the L2.
| let code_value = AccountCodeRLP::from(code).bytes().clone(); | ||
| batch.put_cf(&cf_codes, code_key, code_value); | ||
| let mut buf = | ||
| Vec::with_capacity(6 + code.bytecode.len() + 2 * code.jump_targets.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can't we implement RLPEncode for Code instead? Or put this in a helper.
Benchmark for 9d3432dClick to view benchmark
|
| .collect::<Vec<u8>>() | ||
| .as_slice() | ||
| .encode(&mut buf); | ||
| self.write_async(CF_ACCOUNT_CODES, hash_key, buf).await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the documentation of CF_ACCOUNT_CODES
Benchmark for f562e67Click to view benchmark
|
| let Some(bytes) = self.read_sync(CF_ACCOUNT_CODES, hash_key)? else { | ||
| return Ok(None); | ||
| }; | ||
| let bytes = Bytes::from_owner(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this variable.
| .map(|bytes| AccountCodeRLP::from_bytes(bytes).to()) | ||
| .transpose() | ||
| .map_err(StoreError::from) | ||
| let Some(bytes) = self.read_sync(CF_ACCOUNT_CODES, hash_key)? else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use the get_pinned_cf rocksdb method here to avoid copies before deserializing.
| let (bytecode, targets) = decode_bytes(&bytes)?; | ||
| let (targets, rest) = decode_bytes(targets)?; | ||
| if !rest.is_empty() || !targets.len().is_multiple_of(2) { | ||
| return Err(StoreError::DecodeError); | ||
| } | ||
| let code = Code { | ||
| hash: code_hash, | ||
| bytecode: Bytes::copy_from_slice(bytecode), | ||
| jump_targets: targets | ||
| .chunks_exact(2) | ||
| .map(|c| u16::from_le_bytes([c[0], c[1]])) | ||
| .collect(), | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be an RLPDecode implementation.
| // Store updated code in DB | ||
| if let Some(code) = &update.code { | ||
| self.add_account_code(info.code_hash, code.clone()).await?; | ||
| self.add_account_code(code.clone()).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not introduced by this PR, but this clone isn't needed in the underlying Store implementation, since it only needs a reference for serialization.
| .bytecode | ||
| .bytecode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks kind of weird
| let address = word_to_address(self.current_call_frame.stack.pop1()?); | ||
| let address_was_cold = !self.substate.add_accessed_address(address); | ||
| let account_code_length = self.db.get_account_code(address)?.len().into(); | ||
| // FIXME: a bit wasteful to fetch the whole code just to get the length. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only use of this opcode I've seen is to check if an account is an EOA (post-7702). If it's an EOA, then SENDER and CALLER are the same, and EXTCODESIZE is 23, but later we also need to fetch the code anyways, so any cache would negate the cost of this use.
| call_frame | ||
| .bytecode | ||
| .jump_targets | ||
| .binary_search(&jump_address) | ||
| .is_ok() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should go in Code
| pub hash: H256, | ||
| pub bytecode: Bytes, | ||
| // TODO: Consider using Arc<[u16]> (needs to enable serde rc feature) | ||
| pub jump_targets: Vec<u16>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment explaining the reason for using u16 instead of bigger numbers. Linking to https://eips.ethereum.org/EIPS/eip-170 and https://eips.ethereum.org/EIPS/eip-7907. Also, EIP-7907 might break this since it bumps initcode max size to 96Kb.
| } | ||
|
|
||
| fn compute_jump_targets(code: &[u8]) -> Vec<u16> { | ||
| debug_assert!(code.len() <= u16::MAX as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's refactor and add comments in a following PR
Motivation
Jumps are noticeably slow compared to most of the VM. This is due to lazy computation of invalid jump destinations.
Description
Instead of lazy computation of blocklist, do greedy computation of allowlist and store the result, fetch it with the DB.
Closes #4951
Comparison of gigagas/s in hoodi between one node synced on this branch (orange) and three others on main:
Missing:
unsafecode, it can be converted to a simple copy, there's no evidence that overhead is problematic;from_bytecode;