Reduce num default tvu threads from 8 to 1 #5134

Open

steviez wants to merge 1 commit into master
Conversation

@steviez commented Mar 4, 2025

Refresh of #998

Problem

We currently create 8 threads that solely try to pull packets out of sockets associated with the turbine port. Multiple threads were added to mitigate buffer receive errors. With improvements in the software, including the use of recvmmsg, 8 threads is overkill; a single thread can read from this port plenty fast.
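
The single-thread claim rests on batched receives: one recvmmsg call drains many packets per syscall. As a rough illustration (not the actual agave streamer code), a minimal single-threaded batched receive loop might look like the sketch below; it assumes Linux with the `libc` crate, and the port, batch size, and buffer size are illustrative.

```rust
// Sketch only: single thread draining a UDP socket in batches via recvmmsg.
use std::mem::MaybeUninit;
use std::net::UdpSocket;
use std::os::unix::io::AsRawFd;

const BATCH: usize = 64; // packets pulled per syscall (illustrative)
const PKT_SIZE: usize = 1280; // per-packet buffer size (illustrative)

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:8002")?; // illustrative port
    let fd = socket.as_raw_fd();

    let mut bufs = vec![[0u8; PKT_SIZE]; BATCH];
    let mut iovecs: Vec<libc::iovec> = bufs
        .iter_mut()
        .map(|buf| libc::iovec {
            iov_base: buf.as_mut_ptr().cast(),
            iov_len: PKT_SIZE,
        })
        .collect();
    let mut hdrs: Vec<libc::mmsghdr> = iovecs
        .iter_mut()
        .map(|iov| {
            // Zero-init the header, then point it at one buffer.
            let mut hdr: libc::mmsghdr = unsafe { MaybeUninit::zeroed().assume_init() };
            hdr.msg_hdr.msg_iov = iov as *mut libc::iovec;
            hdr.msg_hdr.msg_iovlen = 1;
            hdr
        })
        .collect();

    loop {
        // Block until at least one packet arrives, then grab up to BATCH at once.
        let n = unsafe {
            libc::recvmmsg(
                fd,
                hdrs.as_mut_ptr(),
                BATCH as u32,
                libc::MSG_WAITFORONE,
                std::ptr::null_mut(),
            )
        };
        if n < 0 {
            return Err(std::io::Error::last_os_error());
        }
        for i in 0..n as usize {
            let len = hdrs[i].msg_len as usize;
            let _packet = &bufs[i][..len];
            // hand the packet off to the rest of the pipeline here
        }
    }
}
```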

Summary of Changes

The value was already configurable via a hidden CLI arg; this change simply decreases the default from 8 to 1:

solana_gossip::cluster_info::DEFAULT_NUM_TVU_SOCKETS.get()
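
For context, a rough sketch of how such a NonZeroUsize default might be defined and overridden is below; only DEFAULT_NUM_TVU_SOCKETS is a real name from the codebase, and the argument handling here is a stand-in for the actual hidden CLI arg.

```rust
use std::num::NonZeroUsize;

// Illustrative stand-in for solana_gossip::cluster_info::DEFAULT_NUM_TVU_SOCKETS;
// the real constant lives in the gossip crate. NonZeroUsize::MIN == 1 (the old default was 8).
pub const DEFAULT_NUM_TVU_SOCKETS: NonZeroUsize = NonZeroUsize::MIN;

fn main() {
    // A hidden CLI arg (handling illustrative) can still override the default.
    let num_tvu_sockets: NonZeroUsize = std::env::args()
        .nth(1)
        .and_then(|arg| arg.parse().ok())
        .unwrap_or(DEFAULT_NUM_TVU_SOCKETS);
    println!("spawning {num_tvu_sockets} turbine receive socket(s)/thread(s)");
}
```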

Testing

For a basic sanity check, I ran bench-streamer. With the default settings of 4 producers / 1 receiver, I see that the receiver can pull > 900k packets / second.

With this known, I then set up my node to generate additional load against itself on the TVU port. Since we're only exercising our node's ability to pull packets out of the socket buffer, I crafted the packets such that the shred sigverify pipeline would discard them before doing an actual sigverify. The graph below shows the following:

  • Orange - shred_sigverify.num_packets - divided by two to get packets / second (2 second metric interval)
  • Red - shred_sigverify.num_discards_pre - also divided by two
  • Blue - net-stats-validator.rcvbuf_errors_delta - multiplied by 100k
[graph: shred_sigverify packet and discard rates vs. net-stats-validator rcvbuf errors]

So, my node is receiving ~375k packets per second on this port with 0 dropped packets. The max number of unique shreds per second can be derived from the max number of shreds per block:

(32_768 data_shreds_per_block + 32_768 coding_shreds_per_block) * 2.5 blocks_per_second = 163_840 shreds_per_second

My guess is that the node can handle more, too; I'll push it a bit harder tomorrow. Lastly, it should be noted that I'm doing the load gen on the same machine, so the load gen is "stealing resources" from the validator in some sense.
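
For reference, a minimal sketch of the sort of self-targeted load generator described above might look like the following; the port and payload size are illustrative, and the real test crafted packets that the shred sigverify stage drops at its pre-verify discard step.

```rust
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    // Illustrative target; the real test pointed this at the node's own TVU (turbine) port.
    let target = "127.0.0.1:8002";
    let socket = UdpSocket::bind("0.0.0.0:0")?;

    // Roughly shred-sized but intentionally bogus payload, so the shred sigverify
    // stage discards it before doing any actual signature verification.
    let payload = [0u8; 1228];

    loop {
        socket.send_to(&payload, target)?;
    }
}
```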

Performance Gains

TODO
