ClientActor ingress saturates at ~5K TPS #12963

Open
ssavenko-near opened this issue Feb 20, 2025 · 0 comments

Issue

Running the native transaction benchmarks in the single-node setup saturates at roughly 5K TPS (the exact figure depends on the machine). This happens regardless of whether the transactions are injected via RPC (benchmarks/synth-bm) or fed directly to ClientActor (benchmarks/transactions-generator).

Analysis of the perf profile of the RPC-based load (permalink; data produced by @Trisfald) indicates:

  • process_tx (which pushes transactions into the pool) and produce_chunk run on the same thread. Together they account for almost 100% of handle() activity, which in turn accounts for 74% out of the 81% CPU used by the thread
  • process_tx interleaves with produce_chunk, and only ~1/3 of the time is available to process_tx
  • process_tx is CPU-bound during the time spans that are available to it

This means the bottleneck is process_tx: it is CPU-bound and gets only ~1/3 of the thread's time because it shares the thread with produce_chunk.
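
As a rough back-of-the-envelope check, the time-share observation can be turned into a projected ceiling. The sketch below assumes that ingress throughput scales linearly with the share of thread time given to process_tx, which is a simplification; the function name `projected_ceiling_tps` is hypothetical and not part of nearcore.

```rust
/// Projected ingress ceiling if `process_tx` had the whole thread to itself.
///
/// Assumption: throughput is CPU-bound and scales linearly with the
/// fraction of thread time (`time_share`) available to `process_tx`.
fn projected_ceiling_tps(observed_tps: f64, time_share: f64) -> f64 {
    assert!(time_share > 0.0 && time_share <= 1.0, "share must be in (0, 1]");
    observed_tps / time_share
}
```

Under this model, an observed ~5K TPS at a ~1/3 time share corresponds to a ceiling of roughly 15K TPS for process_tx running on a dedicated thread.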

Possible solutions:

The possible ways to overcome this bottleneck would be:

  • enable process_tx to run in parallel with produce_chunk. That should roughly triple the max injection rate
    • this turns out to be non-trivial:
      • ShardedPool is not thread safe
      • prepare_transactions() pushes the transactions back to the pool, only to remove them again once they are included in the block. In the current setup this effectively means process_tx() and produce_chunk() must run sequentially.
  • run validate_tx (75% of process_tx) on a thread pool. A pool size of 3 can be expected to roughly double the max injection rate
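
The second option could look roughly like the sketch below: validation fans out to a small set of worker threads, while insertion into the pool stays on a single thread, so the pool itself does not need to be thread safe. This is a minimal illustration using only the Rust standard library. `Tx`, `Pool`, `validate_tx` and `validate_on_pool` here are hypothetical stand-ins, not the nearcore types or signatures.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a signed transaction.
#[derive(Debug, Clone, PartialEq)]
struct Tx {
    nonce: u64,
    payload: Vec<u8>,
}

/// Hypothetical stand-in for the CPU-heavy validation step.
/// Placeholder rule: reject transactions with an empty payload.
fn validate_tx(tx: &Tx) -> bool {
    !tx.payload.is_empty()
}

/// Hypothetical pool that is NOT thread safe; only one thread touches it.
#[derive(Default)]
struct Pool {
    txs: Vec<Tx>,
}

impl Pool {
    fn insert(&mut self, tx: Tx) {
        self.txs.push(tx);
    }
}

/// Validates `txs` on `workers` threads and inserts the valid ones into a
/// pool owned by the calling thread. Insertion order is not deterministic.
fn validate_on_pool(txs: Vec<Tx>, workers: usize) -> Pool {
    let (work_tx, work_rx) = mpsc::channel::<Tx>();
    let (done_tx, done_rx) = mpsc::channel::<Tx>();
    // std's Receiver cannot be cloned, so workers share it behind a mutex.
    let work_rx = Arc::new(Mutex::new(work_rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is held while waiting for the next item and is
                // released at the end of this statement, before validation.
                let next = work_rx.lock().unwrap().recv();
                match next {
                    Ok(tx) => {
                        if validate_tx(&tx) {
                            // Ignore the error if the consumer is gone.
                            let _ = done_tx.send(tx);
                        }
                    }
                    // The work channel is closed and drained: stop.
                    Err(_) => break,
                }
            })
        })
        .collect();
    // Drop the original sender so `done_rx` ends once all workers exit.
    drop(done_tx);

    for tx in txs {
        work_tx.send(tx).expect("work receiver is alive");
    }
    // Closing the work channel lets the workers terminate.
    drop(work_tx);

    // Single-threaded section: only this thread mutates the pool.
    let mut pool = Pool::default();
    for tx in done_rx {
        pool.insert(tx);
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    pool
}
```

Calling `validate_on_pool(txs, 3)` mirrors the pool size of 3 mentioned above. Under an idealized model where the 75% spent in validation is spread evenly over 3 workers and the remaining 25% stays serial, the per-transaction cost on the ingress thread becomes 0.25 + 0.75/3 = 0.5 of the original, which is consistent with the 2x estimate. Real gains would also depend on channel overhead and on contention with produce_chunk.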