Running the native transaction benchmarks in the single-node setup saturates at roughly 5K transactions per second (the exact figure depends on the machine). This happens regardless of whether the transactions are injected via RPC (benchmarks/synth-bm) or fed directly to ClientActor (benchmarks/transactions-generator).
Analysis of the perf profile of the RPC-based load (permalink, data produced by @Trisfald) indicates:
process_tx (which pushes the transactions into the pool) and produce_chunk run on the same thread. Together they account for almost 100% of the handle() activity, which itself is responsible for 74% out of the 81% CPU used by the thread.
process_tx interleaves with produce_chunk and gets only ~1/3 of the thread's time.
process_tx saturates the CPU during the time spans that are available to it.
In other words, the bottleneck is process_tx: it is CPU-bound and gets only a ~1/3 time slice because it shares a thread with produce_chunk.
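The reasoning above can be written down as a back-of-envelope model. This is only a sketch under two assumptions that are not measured here: process_tx is purely CPU-bound, and each call has a fixed cost. The function name `max_injection_rate` and the per-transaction cost parameter are hypothetical and are used only for illustration.

```rust
/// Rough upper bound on the injection rate (transactions per second) when
/// process_tx gets only a fraction of one thread's time.
///
/// `per_tx_cost_us` is an assumed, fixed CPU cost of one process_tx call in
/// microseconds (hypothetical, not taken from the profile).
/// `time_share` is the fraction of the thread available to process_tx.
pub fn max_injection_rate(per_tx_cost_us: f64, time_share: f64) -> f64 {
    assert!(per_tx_cost_us > 0.0, "cost must be positive");
    assert!(
        (0.0..=1.0).contains(&time_share),
        "time share must be within 0..=1"
    );
    // One second of thread time, scaled by the share, divided by the cost
    // of a single transaction.
    time_share * 1_000_000.0 / per_tx_cost_us
}
```

Under this model the per-transaction cost cancels out when comparing setups: raising the time share from 1/3 to 1 triples the bound, whatever the actual cost is.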
Possible solutions:
There are two possible ways to overcome this bottleneck:
Option 1: enable process_tx to run in parallel with produce_chunk. That should roughly triple the max injection rate.
This turns out not to be easy, for two reasons:
ShardedPool is not thread-safe.
prepare_transactions() pushes the transactions back into the pool, and they are removed again only once they are included in a block. This means process_tx() and produce_chunk() must run sequentially in the current setup.
Option 2: run validate_tx (75% of process_tx) on a thread pool. A pool of size 3 can be expected to roughly double the max injection rate.
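The thread pool idea can be sketched as follows. This is a minimal illustration, not the nearcore implementation: `Tx`, `Pool`, `validate_tx`, `validate_parallel` and `process_batch` are hypothetical stand-ins, and the sketch assumes validation is CPU-bound and needs no shared mutable state. Validation fans out to worker threads, while insertion into the pool stays on the calling thread, so the pool does not have to be thread-safe. The helper `amdahl_speedup` shows that the 2x estimate is consistent with Amdahl's law if only the 75% spent in validate_tx is parallelized.

```rust
use std::thread;

/// Hypothetical stand-in for a transaction (not the nearcore type).
#[derive(Debug, Clone, PartialEq)]
pub struct Tx {
    pub id: u64,
    pub payload: Vec<u8>,
}

/// Hypothetical stand-in for the non-thread-safe transaction pool.
#[derive(Default)]
pub struct Pool {
    txs: Vec<Tx>,
}

impl Pool {
    pub fn insert(&mut self, tx: Tx) {
        self.txs.push(tx);
    }

    pub fn len(&self) -> usize {
        self.txs.len()
    }
}

/// Hypothetical stand-in for validate_tx: assumed CPU-bound, no shared state.
pub fn validate_tx(tx: &Tx) -> bool {
    !tx.payload.is_empty()
}

/// Validate `txs` on up to `workers` scoped threads and return the valid
/// ones in their original order.
pub fn validate_parallel(txs: &[Tx], workers: usize) -> Vec<Tx> {
    assert!(workers > 0, "need at least one worker");
    // Ceiling division; at least 1 because chunks(0) would panic.
    let chunk_len = ((txs.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = txs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter(|tx| validate_tx(tx))
                        .cloned()
                        .collect::<Vec<Tx>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the transactions ordered.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("validation worker panicked"))
            .collect()
    })
}

/// Validate in parallel, then insert sequentially on the calling thread.
/// Returns the number of accepted transactions.
pub fn process_batch(pool: &mut Pool, txs: &[Tx], workers: usize) -> usize {
    let valid = validate_parallel(txs, workers);
    let accepted = valid.len();
    for tx in valid {
        pool.insert(tx);
    }
    accepted
}

/// Amdahl's law: speedup when `parallel_fraction` of the work is spread
/// over `workers` threads and the rest stays serial.
pub fn amdahl_speedup(parallel_fraction: f64, workers: usize) -> f64 {
    1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers as f64)
}
```

With a parallel fraction of 0.75 and 3 workers, the serial part is 0.25 and the parallel part shrinks to 0.25, giving 1 / 0.5 = 2. A real implementation would also have to account for hand-off overhead and for how batches are formed, which this sketch ignores.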