
accounts-db: relax intrabatch account locks #4253

Open · wants to merge 3 commits into base: master

Conversation

@2501babe 2501babe commented Jan 3, 2025

Problem

simd83 proposes removing the constraint that transactions in the same entry cannot have read/write or write/write contention on any account. a previous pr modified svm to be able to carry over state changes between such transactions while processing a transaction batch. this pr modifies consensus to remove the account locking rules that prevent such batches from being created or replayed

Summary of Changes

add a new function to AccountLocks, try_lock_transaction_batch, which only checks for locking conflicts with other batches, allowing any account overlap within the batch itself. modify Accounts and Bank to use it instead when the feature gate relax_intrabatch_account_locks is activated. also modify prepare_sanitized_batch_with_results to deduplicate transactions within a batch by signature to prevent replay attacks: two instances of the same transaction cause the first to lock out the second, mirroring the non-simd83 behavior for this special case
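
A rough standalone sketch of the shape this describes, with simplified stand-in types (Pubkey, the error type, and the key lists here are illustrative, not the real agave types):

```rust
use std::collections::HashMap;

// simplified stand-ins for the real agave types
type Pubkey = [u8; 32];
#[derive(Debug)]
struct AccountInUse;

#[derive(Default)]
struct AccountLocks {
    write_locks: HashMap<Pubkey, u64>,
    readonly_locks: HashMap<Pubkey, u64>,
}

impl AccountLocks {
    fn conflicts(&self, key: &Pubkey, writable: bool) -> bool {
        self.write_locks.contains_key(key)
            || (writable && self.readonly_locks.contains_key(key))
    }

    fn lock_account(&mut self, key: Pubkey, writable: bool) {
        let map = if writable {
            &mut self.write_locks
        } else {
            &mut self.readonly_locks
        };
        *map.entry(key).or_default() += 1;
    }

    // validate every transaction only against locks already held by *other*
    // batches, then take the locks; overlap inside this batch is allowed
    fn try_lock_transaction_batch(
        &mut self,
        batch_keys: Vec<Result<Vec<(Pubkey, bool)>, AccountInUse>>,
    ) -> Vec<Result<(), AccountInUse>> {
        let validated: Vec<_> = batch_keys
            .into_iter()
            .map(|tx_keys| {
                tx_keys.and_then(|keys| {
                    if keys.iter().any(|(k, w)| self.conflicts(k, *w)) {
                        Err(AccountInUse)
                    } else {
                        Ok(keys)
                    }
                })
            })
            .collect();
        validated
            .into_iter()
            .map(|tx_keys| {
                tx_keys.map(|keys| {
                    keys.into_iter().for_each(|(k, w)| self.lock_account(k, w))
                })
            })
            .collect()
    }
}
```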

since transaction results are used more extensively than previously, some functions with *_with_results variants have been collapsed into wrappers around that variant. we also refactor several things to favor iterators over vectors, to avoid places where iters are collected and transformed back into iters

important code changes are confined to accounts-db/src/accounts.rs, accounts-db/src/account_locks.rs, and runtime/src/bank.rs. changes in core, ledger, and runtime transaction batch only affect tests. overall the large majority of changes are fixes or improvements to tests

Feature Gate Issue: https://github.com/anza-xyz/feature-gate-tracker/issues/76

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from 3e016b3 to f748a5f on January 3, 2025 10:46
@2501babe 2501babe self-assigned this Jan 3, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 4 times, most recently from 2fb8527 to 11d1dcb on January 7, 2025 11:23
@2501babe 2501babe changed the title from "accounts-db: only lock accounts across threads" to "accounts-db: disable intrabatch account locks" on Jan 7, 2025
// HANA TODO the vec allocation here is unfortunate but hard to avoid
// we cannot do this in one closure because of borrow rules
// play around with alternate strategies, according to benches this may be up to
// 50% slower for small batches and few locks, but for large batches and many locks
Reviewer:
bench using jemalloc? i'd think it would do a reasonable job of just keeping the mem in thread-local cache for re-use

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from 95b17b3 to d1ec289 on January 13, 2025 12:13
@2501babe 2501babe changed the title from "accounts-db: disable intrabatch account locks" to "accounts-db: relax intrabatch account locks" on Jan 13, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from a11ef4d to c234cc6 on January 17, 2025 14:23
Comment on lines 13 to 16
 #[derive(Debug, Default)]
 pub struct AccountLocks {
-    write_locks: AHashSet<Pubkey>,
+    write_locks: AHashMap<Pubkey, u64>,
     readonly_locks: AHashMap<Pubkey, u64>,
 }
@2501babe 2501babe commented Jan 23, 2025
because the read- and write-lock hashmaps have the same type now, all the functions that change them are basically the same. we could use an enum or hashmap reference to discriminate and delete half of the functions, but i left it like this for your review before butchering it
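
For illustration, one possible shape of that collapse, sketched with std HashMap and hypothetical helper names (not the real agave functions):

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];

#[derive(Default)]
struct AccountLocks {
    write_locks: HashMap<Pubkey, u64>,
    readonly_locks: HashMap<Pubkey, u64>,
}

impl AccountLocks {
    // hypothetical shared helpers: both maps are now reference-counted
    // HashMap<Pubkey, u64>, so one pair of functions can serve read and write locks
    fn lock_key(map: &mut HashMap<Pubkey, u64>, key: &Pubkey) {
        *map.entry(*key).or_default() += 1;
    }

    fn unlock_key(map: &mut HashMap<Pubkey, u64>, key: &Pubkey) {
        if let Some(count) = map.get_mut(key) {
            *count -= 1;
            if *count == 0 {
                map.remove(key);
            }
        }
    }

    fn lock(&mut self, key: &Pubkey, writable: bool) {
        let map = if writable {
            &mut self.write_locks
        } else {
            &mut self.readonly_locks
        };
        Self::lock_key(map, key);
    }

    fn unlock(&mut self, key: &Pubkey, writable: bool) {
        let map = if writable {
            &mut self.write_locks
        } else {
            &mut self.readonly_locks
        };
        Self::unlock_key(map, key);
    }
}
```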

Comment on lines +565 to -579
relax_intrabatch_account_locks: bool,
) -> Vec<Result<()>> {
// Validate the account locks, then get iterator if successful validation.
let tx_account_locks_results: Vec<Result<_>> = txs
@2501babe 2501babe commented Jan 23, 2025
Accounts::lock_accounts() could be reimpled as a wrapper on lock_accounts_with_results() or possibly deleted, it isnt really required anymore since all batch-building needs to have results. but we could leave it as-is, or could do some kind of refactor with TransactionAccountLocksIterator

Comment on lines -596 to +612
-fn lock_accounts_inner(
+fn lock_accounts_inner<'a>(
     &self,
-    tx_account_locks_results: Vec<Result<TransactionAccountLocksIterator<impl SVMMessage>>>,
+    tx_account_locks_results: impl Iterator<
+        Item = Result<TransactionAccountLocksIterator<'a, impl SVMMessage + 'a>>,
+    >,
+    relax_intrabatch_account_locks: bool,
@2501babe 2501babe commented:

in general where possible i changed vecs to iters, we eliminate several uses of collect() and just pass around the closure chains. this makes the type signatures look kind of stupid tho and there are possibly things we could refactor to be better (like maybe combining transactions with the transaction results instead of taking for granted they always have the same length). im undecided about style tho

Comment on lines +603 to +630
if relax_intrabatch_account_locks {
let validated_batch_keys = tx_account_locks_results.map(|tx_account_locks_result| {
tx_account_locks_result
.map(|tx_account_locks| tx_account_locks.accounts_with_is_writable())
});

account_locks.try_lock_transaction_batch(validated_batch_keys)
} else {
tx_account_locks_results
.map(|tx_account_locks_result| match tx_account_locks_result {
Ok(tx_account_locks) => account_locks
.try_lock_accounts(tx_account_locks.accounts_with_is_writable()),
Err(err) => Err(err),
})
.collect()
}
@2501babe 2501babe commented:

this is the main branch that the feature controls. we pass it in as a param so Accounts doesnt need FeatureSet, only Bank. the only other place we use the feature is to enable signature-based transaction deduplication

Comment on lines 3183 to 3197
// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect
let mut batch_signatures = AHashSet::with_capacity(transactions.len());
let transaction_results =
transaction_results
.enumerate()
.map(|(i, tx_result)| match tx_result {
Ok(())
if relax_intrabatch_account_locks
&& !batch_signatures.insert(transactions[i].signature()) =>
{
Err(TransactionError::AccountInUse)
}
Ok(()) => Ok(()),
Err(e) => Err(e),
});
@2501babe 2501babe commented:

this is the dedupe step mentioned in a comment above. we could use a double for loop instead of a hashset but this seemed much more straightforward since the inner loop would have to abort based on the outer loop index

Reviewer:

I'd guess it's almost certainly faster to just do a brute-force since our batches are small (replay will be size 1 for unified-scheduler); but might get complicated with the current iterator interface. Fine to leave it as is.
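
For reference, the brute-force variant could look roughly like this; a standalone sketch over plain slices with a stand-in error type, not the iterator-based code in the diff above:

```rust
#[derive(Debug, Clone, PartialEq)]
enum TxError {
    AccountInUse,
}

// fail the second and later occurrences of a signature with an O(n^2) scan;
// cheap when batches are small, as discussed above
fn dedup_by_signature<S: Eq>(signatures: &[S], results: &mut [Result<(), TxError>]) {
    for i in 0..signatures.len() {
        if results[i].is_ok()
            && (0..i).any(|j| results[j].is_ok() && signatures[j] == signatures[i])
        {
            results[i] = Err(TxError::AccountInUse);
        }
    }
}
```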

@2501babe 2501babe added the feature-gate label (Pull Request adds or modifies a runtime feature gate) on Jan 23, 2025
@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from d378d3a to c6d7105 on January 24, 2025 09:04

2501babe commented Jan 24, 2025

recent sample of bench comparisons, master vs this branch with simd83 enabled

bench_lock_accounts/batch_size_1_locks_count_2
                        time:   [188.03 µs 188.11 µs 188.19 µs]
                        thrpt:  [5.4414 Melem/s 5.4438 Melem/s 5.4461 Melem/s]
bench_lock_accounts/batch_size_1_locks_count_2_old
                        time:   [194.10 µs 194.20 µs 194.32 µs]
                        thrpt:  [5.2696 Melem/s 5.2729 Melem/s 5.2757 Melem/s]

bench_lock_accounts/batch_size_32_locks_count_64
                        time:   [5.3985 ms 5.6560 ms 5.9096 ms]
                        thrpt:  [173.28 Kelem/s 181.05 Kelem/s 189.68 Kelem/s]
bench_lock_accounts/batch_size_32_locks_count_64_simd83
                        time:   [3.0115 ms 3.0123 ms 3.0132 ms]
                        thrpt:  [339.84 Kelem/s 339.94 Kelem/s 340.03 Kelem/s]

bench_lock_accounts/batch_size_64_locks_count_64_read_conflicts
                        time:   [2.8057 ms 2.8066 ms 2.8075 ms]
                        thrpt:  [364.74 Kelem/s 364.86 Kelem/s 364.97 Kelem/s]
bench_lock_accounts/batch_size_64_locks_count_64_read_conflicts_simd83
                        time:   [2.8175 ms 2.8181 ms 2.8188 ms]
                        thrpt:  [363.28 Kelem/s 363.36 Kelem/s 363.44 Kelem/s]

in general we perform slightly worse for tiny batches and as well or better for large batches. note these benches call code in Accounts and AccountLocks but not Bank

@2501babe 2501babe marked this pull request as ready for review January 24, 2025 12:19
@2501babe 2501babe requested a review from apfitzge January 24, 2025 12:19
Comment on lines 3179 to 3180
// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect

Reviewer:

we must dedup because we check for already_processed in a batch, then process, then add to the status_cache.

Is that correct summary of why we need to do this now?

@2501babe 2501babe commented Jan 30, 2025

im not fully confident in my understanding of the status cache but i believe the way it works is you execute the batch and the signatures of processed (non-dropped) transactions go in the status cache only after theyve all been run. there are no checks in svm (nor should there be) for duplicate transactions within a batch

if replay is going to single-batch everything i guess it would enforce this as a side effect but it seemed good to do it here, since this code already did enforce this constraint (a malicious block that put the same transaction in one entry multiple times would fail locking in replay, without anything involving status cache, because the transactions would take the same write lock on the fee-payer)

Reviewer:

Yeah we should enforce it for sure


// with simd83 enabled, we must deduplicate transactions by signature
// previously, conflicting account locks would do it as a side effect
let mut batch_signatures = AHashSet::with_capacity(transactions.len());

Reviewer:

let's use message_hash here instead of signature.

status-cache uses both, but signature is only necessary for RPC operation for fast signature lookup.
iirc reason to use message hash is because of signature malleability.

@2501babe 2501babe commented Jan 30, 2025

i used it because message_hash isnt provided by SVMMessage or SVMTransaction. would you like me to add it to SVMTransaction? its available on SanitizedTransaction but providing it from SanitizedMessage would require us to add it to LegacyMessage and v0::LoadedMessage

Reviewer:

We can probably just change the trait bound to TransactionWithMeta on this function, that trait should provide it, and I'm fairly certain the things we actually call this with impl it.
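
A simplified sketch of what the dedup looks like keyed on the message hash; the trait here is a stand-in for the real TransactionWithMeta bound, not the actual agave trait:

```rust
use std::collections::HashSet;

type Hash = [u8; 32];

// stand-in for the real TransactionWithMeta bound, which exposes message_hash()
trait HasMessageHash {
    fn message_hash(&self) -> &Hash;
}

#[derive(Debug, PartialEq)]
enum TxError {
    AccountInUse,
}

fn dedup_by_message_hash<T: HasMessageHash>(
    transactions: &[T],
    results: impl Iterator<Item = Result<(), TxError>>,
) -> Vec<Result<(), TxError>> {
    let mut seen = HashSet::with_capacity(transactions.len());
    results
        .enumerate()
        .map(|(i, result)| match result {
            // a repeated message hash locks out every occurrence after the first
            Ok(()) if !seen.insert(*transactions[i].message_hash()) => {
                Err(TxError::AccountInUse)
            }
            other => other,
        })
        .collect()
}
```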

@2501babe 2501babe commented:

had to squash the past because rebasing was getting ugly but this is bae70c0

Reviewer:

We have the function directly as part of TransactionWithMeta, just call message_hash() on the TransactionWithMeta tx.

as_sanitized_transaction (unless it IS a sanitized transaction) will create a sanitized transaction. and possibly do 100s of allocations.
That fn is only there because of legacy interfaces - we shouldn't use it anywhere it's not strictly necessary (should be only geyser rn)

@2501babe 2501babe commented:

gotcha, i see it now. i only looked at TransactionWithMeta rather than StaticMeta

Reviewer:

There's a comment on that fn. I'll update it to be more clear that basically no one should be using that, except for the couple places we call into geyser.

Reviewer:

Made comments on that trait better here - #4827

@joncinque

This PR contains changes to the solana sdk, which will be moved to a new repo within the next week, when v2.2 is branched from master.

Please merge or close this PR as soon as possible, or re-create the sdk changes when the new repository is ready at https://github.com/anza-xyz/solana-sdk

@2501babe 2501babe force-pushed the 20250103_simd83locking branch from d2cf375 to bae70c0 on February 5, 2025 15:03

2501babe commented Feb 5, 2025

@joncinque this is a core runtime pr, the only sdk change is adding the feature gate. i assume when the new repo is created the procedure is going to be like:

  • pr the new feature gate to sdk which will be accepted and merged without needing to see any outside code
  • keep this pr, rebase on new sdk dependency and remove changes to sdk/feature-set/ which will no longer exist
  • continue with code review as normal

right?

@joncinque

Yep that sounds right. I'm also adding a new FeatureSet constructor so that agave can define its own features in this repo and avoid that annoying back and forth. 90% of PRs with changes to the sdk were just feature additions.

@2501babe 2501babe commented:

holding off on pushing my rebase because it breaks tests until theres a new sdk version to depend on, but the only change is the featureset file is deleted

@apfitzge if everything looks good to you i think it would be good to get a second reviewer since this is critical code. @jstarry would you be interested?


jstarry commented Feb 10, 2025

Yeah I'll take a look now

@jstarry jstarry left a comment

Looks like there is still at least one more thing we need to clean up before SIMD-0083 is activated. PreBalanceInfo is collected for the full batch and assumes that transactions don't have conflicting write locks. So tx-1 and tx-2 might both modify a token balance but they both use the same "pre-balance" state even though tx-2 should be using the post-balance state of tx-1 for its pre-balance.

I'm going to spend a bit more time reading through more code to make sure we address all non-conflicting tx assumptions before merging this feature gate

Comment on lines +55 to +57
available_batch_keys
.into_iter()
.map(|available_keys| available_keys.map(|keys| self.lock_accounts(keys)))
.collect()
Reviewer:

I'm not sure that available_keys is a cheap iterator. We clone validated_keys above before iterating so looks like we will be doing the duplicate-tx-check and the duplicate-tx-account-check twice.

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=93f1ae58f8c376195e41af4ad91872ff
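
A tiny self-contained illustration of the concern, independent of the agave types: cloning a lazy iterator chain and consuming both copies runs the mapped work twice:

```rust
use std::cell::Cell;

fn main() {
    let validations = Cell::new(0usize);
    let keys = [1u32, 2, 3];

    // lazy chain: the closure only runs when the iterator is consumed
    let validated = keys.iter().map(|k| {
        validations.set(validations.get() + 1); // stand-in for the validation work
        k * 2
    });

    // consuming a clone and then the original runs the closure once per key, per pass
    let first_pass: Vec<u32> = validated.clone().collect();
    let second_pass: Vec<u32> = validated.collect();

    assert_eq!(first_pass, second_pass);
    assert_eq!(validations.get(), 2 * keys.len());
}
```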


jstarry commented Feb 10, 2025

It also looks like replay stage rebatching could cause batches with internal conflicting transactions to be rebatched into separate batches which are then processed in parallel

@2501babe 2501babe commented:

appreciate the guidance 🙏 my runtime knowledge is still very localized to svm so i don't know what unknown unknowns are uh unknown

> It also looks like replay stage rebatching could cause batches with internal conflicting transactions to be rebatched into separate batches which are then processed in parallel

is this the unified scheduler or the one we have been saying we would remove? i was under the impression replay went in ledger order unless a priograph said there were dependencies... does it actually use the assumption that entries are fully parallelizable?


jstarry commented Feb 10, 2025

> is this the unified scheduler or the one we have been saying we would remove? i was under the impression replay went in ledger order unless a priograph said there were dependencies... does it actually use the assumption that entries are fully parallelizable?

Going to give more context than you asked for to make sure we're all on the same page. Replay stage always goes through process_entries regardless of the scheduler used, it internally uses queue_batches_with_lock_retry to collect a sequential number of batches that don't conflict with each other. Then replay stage calls process_batches with the list of batches which it knows it can process in parallel because they don't conflict with each other. Inside process_batches is where the scheduler choice comes into play.

For the unified scheduler, all transactions are scheduled along with a sequential index so even if a batch has inner conflicting transactions, I believe the scheduler is smart enough to keep the original tx order of dependent transactions and also respect lock conflicts in a way that no races happen. So I think it's totally unaffected by this PR but I've never looked closely at the unified scheduler code.

The old scheduler is the one that does the rebatching and if it gets removed, then yeah we can just not worry about fixing it for SIMD-0083. If we plan on removing it, I would prefer it gets removed before this PR is merged though. If we want to keep support for it, the issue that we would need to fix is how the rebatching works. Inside rebatch_and_execute_batches, we flatten all the batch transactions because we assume that from before that each batch in the passed list of batches doesn't have lock conflicts with other batches in the list and none of the batches have inner conflicts. It then chunks up the flattened list of transactions into new batches inside rebatch_transactions and processes those new batches in parallel inside execute_batches_internal.

@apfitzge

> [quotes 2501babe's question and jstarry's reply above in full]

WRT unified-scheduler: it handles the conflicts internally by tracking dependencies.
WRT blockstore processor:


2501babe commented Feb 13, 2025

the feature gate for this pr is now in the new sdk repo, we can rebase once we have a new release. going to look at PreBalanceInfo now

@joncinque do we have a process for how to handle next steps? ie, can we request new sdk releases from master at any time, and if so who handles this? there isnt a way to move forward on this pr without the feature gate in a crates.io release

edit: i asked in releng after discovering sdk repo crates have independent versioning, so in theory we can release solana-feature-set on its own without having to worry about rolling full sdk versions

@apfitzge im on board with either approach. 2.2 has been cut so we are now running on backport time

@joncinque

> @joncinque do we have a process for how to handle next steps? ie, can we request new sdk releases from master at any time, and if so who handles this? there isnt a way to move forward on this pr without the feature gate in a crates.io release

The idea is that you can just publish the crate that you need using the "Publish crate" job: https://github.com/anza-xyz/solana-sdk/actions/workflows/publish-rust.yml

In your case, solana-feature-set v2.2.2 has the one you want, so you can just bump the dependency to v2.2.2 in Agave

@2501babe 2501babe force-pushed the 20250103_simd83locking branch 2 times, most recently from 1ba6d3f to c16b899 on February 14, 2025 08:03
@2501babe 2501babe force-pushed the 20250103_simd83locking branch from c16b899 to 4e0c879 on February 24, 2025 22:39

2501babe commented Feb 26, 2025

after reviewing the relevant code for pre- and post-balances in detail i think theres two plausible ways forward. my initial hope was we could change TransactionBalancesSet and TransactionTokenBalancesSet to carry HashMap<Pubkey, u64> for pre- and post-balances for the whole batch, if they were only used to incrementally update the balance-fetching rpc calls. but it looks like (@jstarry please correct me if im wrong, because this would be the easiest fix and also an optimization) this information goes out to geyser which would mean we cant do this

that leaves us with:

  1. construct the pre/post-tx balance info in the svm transaction processing loop and stick it in the transaction result. im strongly opposed to this option because it would introduce a token22 dependency into svm
  2. get the starting balances from accounts-db in banking stage and store them in a hashmap. then, before or after committing, step through each transaction result: create the pre-balances from the hashmap and any relevant post-balances from the transaction result, and update the hashmap with the new balances, which become the pre-balances going forward (rough sketch below)

unless anyone has objections i will do 2, in a new pr since it will take some refactoring and can stand alone (since the new behavior will be identical to the old behavior in cases with no conflicts)
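
A rough sketch of option 2 using plain lamport balances (token balances would follow the same pattern); the names and types are illustrative, not the real banking-stage structures:

```rust
use std::collections::HashMap;

type Pubkey = [u8; 32];

// illustrative: the accounts a committed transaction wrote, with their
// post-execution lamport balances
struct CommittedTx {
    post_balances: Vec<(Pubkey, u64)>,
}

// running view of balances across the batch: seeded from accounts-db once,
// then updated after each transaction so later transactions in the same batch
// see the writes of earlier ones as their pre-balances
fn batch_pre_post_balances(
    starting_balances: HashMap<Pubkey, u64>,
    batch: &[CommittedTx],
) -> Vec<(Vec<u64>, Vec<u64>)> {
    let mut running = starting_balances;
    batch
        .iter()
        .map(|tx| {
            let pre: Vec<u64> = tx
                .post_balances
                .iter()
                .map(|(key, _)| *running.get(key).unwrap_or(&0))
                .collect();
            let post: Vec<u64> = tx.post_balances.iter().map(|(_, bal)| *bal).collect();
            // fold this transaction's writes into the running view
            for (key, bal) in &tx.post_balances {
                running.insert(*key, *bal);
            }
            (pre, post)
        })
        .collect()
}
```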

@apfitzge

> construct the pre/post-tx balance info in the svm transaction processing loop and stick it in the transaction result. im strongly opposed to this option because it would introduce a token22 dependency into svm

can we not introduce an arbitrary inspector closure? something along the lines of this.

allow arbitrary inspection before/after execution (of tx + account data references). for our normal use-case this would just be pushing into the balance sets. and because it's arbitrary fn there's no dependency on token22 added to svm.

I think we effectively already do this with the interface we provide to SVM, TransactionProcessingCallback. Could potentially just add a fn there that we call inside the processing loop before/after execution.
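
A hypothetical shape for such a hook, sketched with stand-in types; the method name and signature are invented for illustration and are not part of the real TransactionProcessingCallback:

```rust
type Pubkey = [u8; 32];

// stand-ins for the real account and transaction types
struct AccountState {
    lamports: u64,
}
struct Transaction;

// hypothetical default-empty hook the processing loop could call around execution,
// so callers can record pre/post balances without svm depending on token-22
trait ProcessingInspector {
    fn inspect_accounts(
        &mut self,
        _tx: &Transaction,
        _accounts: &[(Pubkey, AccountState)],
        _is_post_execution: bool,
    ) {
    }
}

// example caller-side implementation: push lamport balances into per-tx vecs
#[derive(Default)]
struct BalanceRecorder {
    pre: Vec<Vec<u64>>,
    post: Vec<Vec<u64>>,
}

impl ProcessingInspector for BalanceRecorder {
    fn inspect_accounts(
        &mut self,
        _tx: &Transaction,
        accounts: &[(Pubkey, AccountState)],
        is_post_execution: bool,
    ) {
        let balances = accounts.iter().map(|(_, a)| a.lamports).collect();
        if is_post_execution {
            self.post.push(balances);
        } else {
            self.pre.push(balances);
        }
    }
}
```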

Labels
feature-gate (Pull Request adds or modifies a runtime feature gate)
5 participants