Typical Throughput on 10Gbps LAN? #5372

mechaniputer · 2024-10-31T14:05:13Z


Hi everyone,

I have written a simple round-trip FastDDS app that communicates between two machines directly connected to each other (no router or switch) via 10Gbps ethernet.

I based it on the example described in [1]. I am using FastDDS 2.14.0. I know it is old, but I just want to understand the performance of what I currently have before changing things too much.

Things I already tried to improve performance:
I am using the RELIABLE_RELIABILITY_QOS. I am specifically interested in the performance of this setting, and I already know that other settings can improve throughput.

I tried the following with the default 1 second heartbeat period for all writers, and then I tried it with a 500 ns period as described in [2]. In both cases the results were similar. It is possible that there is a sweet spot and that I just need to do more tuning, but these results seem worse than I expected either way.

I also tried increasing the max reader+writer socket sizes, also as described in [2]. It likewise had no measurable effect.

Finally, I tried increasing the txqueuelen to 10000 as described in [2], again with no measurable effect.

Here is what my code is doing:
On the first machine, my "driver" app publishes a single sample of some configured size to topic A, and immediately calls DataReader::wait_for_unread_message() on topic B to wait for a response from the server. Once it receives this response, it takes the sample with DataReader::take_next_sample() and then loops again, sending another sample.

I also use std::chrono::steady_clock::now() before sending a sample and after receiving the response, in order to measure the round trip latency.

On the other machine, the "server" app calls DataReader::wait_for_unread_message() on topic A, takes the sample with DataReader::take_next_sample(), immediately sends the same received data back on topic B and re-loops.
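For readers following along, the timing described above can be sketched roughly as follows. This is only a minimal sketch, not the code from the actual app: `time_round_trip`, `send`, and `wait_for_reply` are hypothetical names, where `send` stands in for the write on topic A and `wait_for_reply` stands in for the wait_for_unread_message() plus take_next_sample() calls on topic B.

```cpp
#include <chrono>

// Hypothetical helper: times one ping-pong iteration.
// `send` is assumed to publish one sample; `wait_for_reply` is assumed to
// block until the echoed sample has been received and taken.
template <typename Send, typename WaitReply>
std::chrono::nanoseconds time_round_trip(Send&& send, WaitReply&& wait_for_reply)
{
    // steady_clock is monotonic, so wall-clock adjustments cannot skew the result.
    const auto start = std::chrono::steady_clock::now();
    send();
    wait_for_reply();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
}
```

Timing the write and the wait separately as well would show which of the two calls a long iteration is spent in.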

What I am wondering:
What kind of throughput should I reasonably expect for various message sizes? At large sizes it appears remarkably slower than I expected. It runs at around 1 sample per second at a size of ~80 kilobytes. Beyond this size, it appears that packet loss becomes too extreme for meaningful results. What am I doing wrong? I'm wondering if my usage of the API is sub-optimal in some way.
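To put the numbers above in perspective, here is a small sketch of the arithmetic. It assumes "~80 kilobytes" means about 80,000 bytes and that one round trip carries one sample in each direction; `per_direction_throughput_bps` is a hypothetical helper name, not something from the app.

```cpp
#include <cstddef>

// Payload throughput seen by each direction of the link in a ping-pong test,
// in bits per second. Only one sample is in flight at a time, so the result
// is bounded by round-trip latency rather than by link bandwidth.
double per_direction_throughput_bps(std::size_t sample_bytes, double round_trip_seconds)
{
    return 8.0 * static_cast<double>(sample_bytes) / round_trip_seconds;
}
```

With 80,000 bytes and a 1 second round trip this gives 640,000 bit/s per direction, which is a tiny fraction of the 10 Gbit/s the link can carry in each direction.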

Thanks in advance for any tips.

References:
[1] https://fast-dds.docs.eprosima.com/en/latest/fastddsgen/pubsub_app/pubsub_app.html#fastddsgen-pubsub-app
[2] https://fast-dds.docs.eprosima.com/en/latest/fastdds/use_cases/large_data/large_data.html

MiguelCompany · 2024-10-31T14:37:45Z

Maintainer

@mechaniputer

If the data type you are using is plain (i.e. POD), the first thing I would try is using take instead of take_next_sample (see here for reference).

The take_next_sample method makes a copy of the data. The take method returns a pointer to the internal Fast DDS memory (in case of plain types).

After changing the reading side, you could also make a similar change on the writer side. See documentation here.
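The difference between a copying read and a loaning read can be illustrated with a toy model. Note that this is not the Fast DDS API: `ToyReader`, `take_copy`, `take_loan`, and `internal` are hypothetical names made up for the sketch, and it only mirrors the idea that take_next_sample fills a caller-owned sample while take hands out a pointer into internal memory. In Fast DDS itself, a loan obtained through take is handed back with DataReader::return_loan().

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Plain (POD) sample type, as in the advice above.
struct Sample
{
    char payload[64];
};

// Hypothetical stand-in for a reader's internal history. NOT the Fast DDS API.
class ToyReader
{
public:
    explicit ToyReader(std::size_t count) : history_(count) {}

    // Copy-style read: the payload is copied into the caller's sample.
    bool take_copy(Sample& out)
    {
        if (next_ >= history_.size()) { return false; }
        std::memcpy(&out, &history_[next_++], sizeof(Sample));
        return true;
    }

    // Loan-style read: the caller gets a pointer into internal storage,
    // so no per-sample copy is made.
    const Sample* take_loan()
    {
        if (next_ >= history_.size()) { return nullptr; }
        return &history_[next_++];
    }

    // Exposes internal addresses so the two styles can be compared.
    const Sample* internal(std::size_t index) const { return &history_[index]; }

private:
    std::vector<Sample> history_;
    std::size_t next_ = 0;
};
```

The saving from loaning grows with the sample size, so it should matter more for the ~80 kilobyte case than for 64 byte samples.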

10 replies

mechaniputer Nov 7, 2024
Author

Is there anything wrong with the above QoS? What else might be the problem?

mechaniputer Nov 12, 2024
Author

Please, any advice would be appreciated. Please tell me if the QoS above is missing something that could be causing this or if it's something else. I am not an expert on using DDS. However, my application is very basic and should not be performing strangely. At times the duration of the pauses is 3 full seconds, on a single 64 byte sample write() invocation. I urgently need to collect realistic performance data and obviously something is wrong.

Edit: Here is the source repo for my code. Note that it is closely based on the HelloWorld example app. https://github.com/mechaniputer/LatencyBench-FastDDS

MiguelCompany Nov 12, 2024
Maintainer

@mechaniputer Thanks for sharing the reproducer. We'll try to reproduce and send you some feedback

mechaniputer Nov 23, 2024
Author

Any news? I am still having this problem. On either the "server" or "driver" side, the call to write a sample sometimes takes a very long time.

mechaniputer Nov 24, 2024
Author

The problem seems to go away if I do not use initial peers and manually-set static IPs on the hosts. I wonder if something strange is going on with my networking when I set the static IPs.
