# Improving Zenoh-Pico performance
Last year, after the long-awaited release of Zenoh 1.0, which included a unified C API across Zenoh-C and Zenoh-Pico, we decided to dedicate some time to measuring and improving the performance and efficiency of Zenoh-Pico. These modifications were released with Zenoh 1.1 earlier this year, and we present the results in this blog post.
## What is Zenoh-Pico?
Zenoh-Pico is the lightweight, native C implementation of the [Eclipse Zenoh](https://zenoh.io/) protocol.
To measure performance, we have a standardized throughput test and a latency test, which we run on a standardized machine (Intel Xeon E3-1275 @ 3.6GHz, 32GB DDR4, Ubuntu 22.04). For embedded measurements, we ran those tests on an ESP32-WROOM-32 dev board.
These tests produce roughly a thousand measurements per payload size, from which we compute the median values shown in the graphs below (note that the y-axis is log scale).
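As a rough sketch of that aggregation step (our own illustration, not the actual benchmark harness), the per-payload-size median can be computed like this:

```c
#include <stdlib.h>

// Comparison function for qsort over double-valued samples.
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

// Median of n throughput (or latency) samples for one payload size.
// Sorts in place, then takes the middle value (or the average of the
// two middle values when n is even).
static double median(double *samples, size_t n) {
    qsort(samples, n, sizeof(double), cmp_double);
    return (n % 2) ? samples[n / 2]
                   : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```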
### PC throughput client, TCP:
class="figure-inline"
29
29
alt="Client throughput" >}}
We see a massive (up to 100x) improvement in throughput for payloads over 32kiB. This is because packets of these sizes are fragmented on the network, and we had an issue where their data was serialized byte by byte.
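To give an idea of why this matters (a simplified illustration, not the actual Zenoh-Pico code): writing a large fragment one byte at a time pays a bounds check and store per byte, while a single bulk `memcpy` moves the whole fragment in one go and lets libc use vectorized copies.

```c
#include <stdint.h>
#include <string.h>

// Simplified illustration of the two serialization strategies
// (not the actual Zenoh-Pico code).

// Slow path: one bounds check and store per byte.
static size_t serialize_byte_per_byte(uint8_t *dst, size_t cap,
                                      const uint8_t *src, size_t len) {
    size_t written = 0;
    while (written < len && written < cap) {
        dst[written] = src[written];
        written++;
    }
    return written;
}

// Fast path: a single bulk copy of the whole fragment.
static size_t serialize_bulk(uint8_t *dst, size_t cap,
                             const uint8_t *src, size_t len) {
    size_t n = len < cap ? len : cap;
    memcpy(dst, src, n);
    return n;
}
```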
We also see a >10x improvement in throughput for smaller payloads when using manual batching (more info below), also introduced in 1.1.
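For reference, here is a minimal sketch of what using manual batching could look like. The `zp_batch_start()`/`zp_batch_stop()` names and the surrounding calls are assumptions based on the Zenoh-Pico 1.x unified C API, not an excerpt from this post; check the Zenoh-Pico documentation for the exact API.

```c
#include <zenoh-pico.h>

int main(void) {
    // Assumed Zenoh-Pico 1.x unified-API calls; see the docs for exact signatures.
    z_owned_config_t config;
    z_config_default(&config);

    z_owned_session_t s;
    if (z_open(&s, z_move(config), NULL) < 0) {
        return -1;
    }

    z_view_keyexpr_t ke;
    z_view_keyexpr_from_str(&ke, "demo/thr");

    // Assumed batching entry point: subsequent messages are accumulated
    // into a single network batch instead of one packet each.
    zp_batch_start(z_loan(s));
    for (int i = 0; i < 100; i++) {
        z_owned_bytes_t payload;
        z_bytes_copy_from_str(&payload, "small message");
        z_put(z_loan(s), z_loan(ke), z_move(payload), NULL);
    }
    // Assumed batching exit point: flushes the accumulated batch on the wire.
    zp_batch_stop(z_loan(s));

    z_drop(z_move(s));
    return 0;
}
```

The trade-off is the usual one for batching: messages wait in the batch until it is flushed, so throughput improves at the cost of per-message latency.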
Other than that, there are no significant changes, because client performance is limited by the router.
*[Figure: Peer throughput]*
Since peer-to-peer communication is not limited by router performance, we observe an even bigger improvement on smaller payloads with batching (>20x), but a smaller one (>10x) for fragmented packets (above 2kiB), because of UDP's smaller packet size.
In addition, we observe a 60% throughput increase for the other payload sizes, which results from the many improvements we implemented and detail below.
## How performance was improved
To improve Zenoh-Pico performance, we traced it on PC using [samply](https://github.com/mstange/samply) and the Firefox Profiler to visualize the traces. That allowed us to detect choke points and parts of the code that could be improved.
As stated earlier, the changes with by far the biggest impact were fixing the byte-by-byte copy issue for fragmented packets and introducing the manual batching mechanism.
Besides that, we also streamlined how the stack creates, uses, and destroys data, to avoid redundant operations and needless data copies. We also rationalized heap memory usage and fragmentation, although these changes were not quantified.