Reduce num default tvu threads from 8 to 1 #5134
Refresh of #998
Problem
We currently create 8 threads that solely try to pull packets out of the sockets associated with the turbine port. Multiple threads were originally added to mitigate receive-buffer errors. With subsequent software improvements, including the use of recvmmsg, 8 threads is overkill; a single thread can read from this port plenty fast.
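As a rough illustration of the receive path being exercised here, below is a minimal sketch of a single-threaded drain loop on a UDP socket. This is not the validator's actual code: the real implementation uses recvmmsg to pull many datagrams per syscall, while this sketch approximates batching with repeated nonblocking recv_from calls, and `drain_socket` is a hypothetical name.

```rust
use std::net::UdpSocket;
use std::time::Duration;

// Hypothetical sketch: drain up to `batch` datagrams from a nonblocking
// socket in one pass. The real validator batches with recvmmsg() instead
// of looping on recv_from().
fn drain_socket(socket: &UdpSocket, batch: usize) -> usize {
    let mut buf = [0u8; 1500];
    let mut received = 0;
    for _ in 0..batch {
        match socket.recv_from(&mut buf) {
            Ok(_) => received += 1,
            Err(_) => break, // would-block: socket buffer is empty
        }
    }
    received
}

fn main() {
    // Self-test on loopback: send a few packets, then drain them.
    let receiver = UdpSocket::bind("127.0.0.1:0").unwrap();
    receiver.set_nonblocking(true).unwrap();
    let addr = receiver.local_addr().unwrap();

    let sender = UdpSocket::bind("127.0.0.1:0").unwrap();
    for _ in 0..32 {
        sender.send_to(&[0u8; 64], addr).unwrap();
    }
    std::thread::sleep(Duration::from_millis(50));

    let n = drain_socket(&receiver, 64);
    println!("drained {n} packets");
}
```

The point of the change is that one such loop, batched properly, keeps up with turbine traffic without seven sibling threads contending on the same sockets.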
Summary of Changes
The value was already configurable with a hidden CLI arg; this change simply decreases the default from 8 to 1:
agave/validator/src/cli/thread_args.rs
Line 308 in 18b49da
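For illustration, a minimal sketch of the shape of the change (the real validator parses this with clap in thread_args.rs; the flag name and parsing here are hypothetical stand-ins, only the default value matters):

```rust
use std::env;

// The receive-thread count stays configurable; only the default changes.
const DEFAULT_TVU_RECEIVE_THREADS: usize = 1; // was 8

// Hypothetical hand-rolled lookup of a "--tvu-receive-threads N" arg,
// falling back to the new default when the flag is absent or malformed.
fn tvu_receive_threads(args: &[String]) -> usize {
    args.iter()
        .position(|a| a == "--tvu-receive-threads")
        .and_then(|i| args.get(i + 1))
        .and_then(|v| v.parse().ok())
        .unwrap_or(DEFAULT_TVU_RECEIVE_THREADS)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    println!("tvu receive threads: {}", tvu_receive_threads(&args));
}
```

Operators who saw value in the old behavior can still opt back into 8 threads explicitly.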
Testing
For a basic sanity check, I ran `bench-streamer`. With the default settings of 4 producers / 1 receiver, I see that the receiver can pull > 900k packets / second.

With this known, I then set up my node to generate additional load against itself on the TVU port. Since we're only exercising the node's ability to pull packets out of the socket buffer, I crafted the packets such that the shred sigverify pipeline would throw them out prior to doing an actual sigverify. The graph below shows the following:
- `shred_sigverify.num_packets` - divided by two to get packets / second (2 second metric interval)
- `shred_sigverify.num_discards_pre` - also divided by two
- `net-stats-validator.rcvbuf_errors_delta` - multiplied by 100k

So, my node is receiving ~375k packets per second at this port with 0 dropped packets. The max number of unique shreds per second can be derived from the max number of shreds per block:
My guess is that the node can handle higher rates too; I'll push it a bit more tomorrow. Lastly, it should be noted that I'm doing the load gen on the same machine, so the load gen is "stealing resources" from the validator in some sense.
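The conversion behind those graph series can be sketched in a few lines (illustrative numbers, not the actual metric samples): counters are reported every 2 seconds, so per-second rates are the reported deltas divided by two.

```rust
// Metrics are emitted on a 2-second interval, so a counter delta must be
// halved to get a per-second rate.
const METRIC_INTERVAL_SECS: u64 = 2;

fn per_second(counter_delta: u64) -> u64 {
    counter_delta / METRIC_INTERVAL_SECS
}

fn main() {
    // e.g. a shred_sigverify.num_packets delta of 750_000 over one
    // interval corresponds to 375k packets / second, matching the ~375k
    // figure above.
    println!("{} packets/second", per_second(750_000));
    // rcvbuf_errors_delta stayed at 0; scaling by 100k (done in the
    // graph purely for visibility) keeps it at 0, i.e. no drops.
    println!("{} scaled rcvbuf errors", 0u64 * 100_000);
}
```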
Performance Gains
TODO