Commit 833a64e

Merge pull request #102 from jean-roland/ft_zpperf_post
Fix more typos on zenoh-pico performance blog
2 parents: 5493b1b + 0a80088

1 file changed: +27 -27 lines changed

content/blog/2025-04-09-Zenoh-Pico-Performance.md

Lines changed: 27 additions & 27 deletions
@@ -7,54 +7,54 @@ description: "April 30th, 2025 -- Paris."
 draft: false
 ---
 
-# Improving Zenoh-Pico performances
+# Improving Zenoh-Pico performance
 
-Last year, after the long awaited release of Zenoh 1.0 which included a unified C API with Zenoh-C and Zenoh-Pico, we decided to dedicate some time to measure and improve the performance & efficiency of Zenoh-Pico. These modifications were released with Zenoh 1.1 earlier this year and we're presenting you the results with this blog post.
+Last year, after the long-awaited release of Zenoh 1.0 which included a unified C API with Zenoh-C and Zenoh-Pico, we decided to dedicate some time to measure and improve the performance and efficiency of Zenoh-Pico. These modifications were released with Zenoh 1.1 earlier this year and we present the results to you with this blog post.
 
 ## What is Zenoh-Pico?
 
 Zenoh-Pico is the lightweight, native C implementation of the[ Eclipse Zenoh](http://zenoh.io) protocol, designed specifically for constrained devices. It provides a streamlined, low-resource API while maintaining compatibility with the main[ Rust Zenoh implementation](https://github.com/eclipse-zenoh/zenoh). Zenoh-Pico already supports a broad range of platforms and protocols, making it a versatile choice for embedded systems development.
 
 ## The results
 
-To measure performance, we have a standardized throughput test and a latency test which we run on a standardized machine (Intel Xeon E3-1275 @3.6GHzn 32GB DDR4, Ubuntu 22.04). For embedded measurements, we ran those tests on an esp32-wroom32 dev board.
+To measure performance, we have a standardized throughput test and a latency test which we run on a standardized machine (Intel Xeon E3-1275 @3.6GHz, 32GB DDR4, Ubuntu 22.04). For embedded measurements, we ran those tests on an ESP32-WROOM-32 dev board.
 
-These tests produce a thousand measurements or so per payload size that we use to calculate the median value to then get the following graphs (note that the y axis is log scale):
+These tests produce a thousand measurements or so per payload size that we use to calculate the median value to then get the following graphs (note that the y-axis is log scale):
 
-### PC throughput client, tcp:
+### PC throughput client, TCP:
 
 {{< figure-inline
 src="../../img/20250430-Zenoh-Pico-Performance/zpperf1.png"
 class="figure-inline"
 alt="Client throughput" >}}
 
-We see a massive (up to 100x) improvement in throughput for payloads over 32kiB, this is because packets of these sizes are fragmented on the network and we had an issue where their data was serialized byte per byte.
+We see a massive (up to 100x) improvement in throughput for payloads over 32KiB, this is because packets of these sizes are fragmented on the network and we had an issue where their data was serialized byte-by-byte.
 
 We also see a >10x improvement in throughput for smaller payloads when using manual batching (more info below) introduced in 1.1 as well.
 
 Other than that there are no significant changes because client performance is limited by the router.
 
-### PC throughput peer to peer, udp multicast:
+### PC throughput peer to peer, UDP multicast:
 
 {{< figure-inline
 src="../../img/20250430-Zenoh-Pico-Performance/zpperf2.png"
 class="figure-inline"
 alt="Peer throughput" >}}
 
-Peer to peer being not limited by router performance, we observe a bigger improvement on smaller payloads with batching (>20x), but a smaller one (>10x) for fragmented packets (>2kiB) because of UDP's smaller packet size.
+Peer to peer being not limited by router performance, we observe a bigger improvement on smaller payloads with batching (>20x), but a smaller one (>10x) for fragmented packets (>2KiB) because of UDP's smaller packet size.
 
-In addition, we observe a 60% throughput increase for the other payload sizes, that results from the many improvements we implemented and that we detail below.
+In addition, we observe a 60% throughput increase for the other payload sizes, that results from the general library optimization.
 
 ### PC latency:
 
 {{< figure-inline
 src="../../img/20250430-Zenoh-Pico-Performance/zpperf3.png"
 class="figure-inline"
-alt="Peer throughput" >}}
+alt="PC latency" >}}
 
-This plot let us see a >50x improvement on fragmented packets latency, again due to data copy optimization, but also a 35% improvement across the board from the general library optimization.
+This plot shows a >50x enhancement on fragmented packets latency, again due to data copy improvement, but also a 35% boost across the board from the general library optimization.
 
-Note that a big chunk of the latency value is due the router (node to router hop + time to route the packet + router to node hop), and this value could be much lower using peer to peer tcp unicast.
+Note that a big chunk of the latency value is due to the router (node to router hop + time to route the packet + router to node hop), and this value could be much lower using peer to peer TCP unicast.
 
 ### Performance limitations/regime:
 
@@ -67,7 +67,7 @@ Before going into embedded results, let's spend some time in understanding what
 
 For throughput there are 3 distinctive regions:
 * Region 1 is limited by network and syscalls, with `send` and `recv` taking more than 90% of the execution time.
-* Region 2 is limited by cpu speed / zenoh-pico performance, with tx taking slightly more cpu power than rx.
+* Region 2 is limited by CPU speed / Zenoh-Pico performance, with TX taking slightly more CPU power than RX.
 * Region 3 is limited by memory bandwidth, with `memcpy` taking more and more of the execution time as payload size grows.
 
 {{< figure-inline
@@ -76,19 +76,19 @@ For throughput there are 3 distinctive regions:
 alt="Latency limitations" >}}
 
 For latency there are 2 regions:
-* Region 1 is limited by cpu speed / Zenoh-Pico performance.
+* Region 1 is limited by CPU speed / Zenoh-Pico performance.
 * Region 2 is limited by memory bandwidth, similarly to throughput.
 
 ### Embedded throughput:
 
-Embedded systems being limited memory wise, we limited payload sizes to 4kiB maximum which is still enough to observe fragmented packets behavior for 2kiB and 4kiB sizes.
+Embedded systems being limited memory-wise, we limited payload sizes to 4KiB maximum which is still enough to observe fragmented packets behavior for 2KiB and 4KiB sizes.
 
 {{< figure-inline
 src="../../img/20250430-Zenoh-Pico-Performance/zpperf4.png"
 class="figure-inline"
 alt="Peer throughput" >}}
 
-The esp32 really benefits from batching with a >50x increase in throughput, which seems fair since we're going through a much slower wifi interface compared to loopback that uses unix pipe.
+The ESP32 really benefits from batching with a >50x increase in throughput, which seems fair since we're going through a much slower Wi-Fi interface compared to loopback that uses unix pipe.
 
 ### Embedded latency:
 
@@ -97,26 +97,26 @@ The esp32 really benefits from batching with a >50x increase in throughput, whic
 class="figure-inline"
 alt="Peer throughput" >}}
 
-Latency values are in the ~10ms range mostly because wifi itself is slow as demonstrated by the ~4ms value observed on Zenoh-Pico PC latency measured on the same wifi network.
+Latency values are in the ~10ms range mostly because Wi-Fi itself is slow as demonstrated by the ~4ms value observed on Zenoh-Pico PC latency measured on the same Wi-Fi network.
 
-We do observe a big impact on latency when trying to send fragmented packets which should come from both wifi and esp32 bandwidth limitation.
+We do observe a big impact on latency when trying to send fragmented packets which should come from both Wi-Fi and ESP32 bandwidth limitation.
 
 ## How performance was improved
 
-To improve Zenoh-Pico performance, we traced it on PC using [samply](https://github.com/mstange/samply) and the Firefox debugger to visualize the traces. That allowed us to detect choke points and part of the code that could be improved.
+To improve Zenoh-Pico performance, we traced it on PC using [samply](https://github.com/mstange/samply) and the Firefox debugger to visualize the traces. That allowed us to detect choke points and parts of the code that could be improved.
 
-As stated earlier, the most impactful changes were solving the byte by byte copy issue for fragmented packets and the introduction of the manual batching mechanism.
+As stated earlier, the most impactful changes were solving the byte-by-byte copy issue for fragmented packets and the introduction of the manual batching mechanism.
 
-Beside that, we also streamlined a lot how the stack created, used and destroyed data to avoid redundant operations or needless data copies. We also rationalized heap memory usage and fragmentation although these changes were not quantified.
+Besides that, we also streamlined a lot how the stack created, used and destroyed data to avoid redundant operations or unnecessary data copies. We also rationalized heap memory usage and fragmentation although these changes were not quantified.
 
 ## Manual Batching
 
 If you want to use Zenoh-Pico recently introduced manual batching you only have 3 things to know about:
-* `zp_batch_start`: Activate the batching mechanism, any message that would have been sent on the network by a subsequent api call (e.g `z_put`, `z_get`) will be instead stored until either: the batch is full, flushed or batching is stopped
-* `zp_batch_stop`: Deactivate the batching mechanism and send the currently batched on the network.
+* `zp_batch_start`: Activate the batching mechanism, any message that would have been sent on the network by a subsequent API call (e.g `z_put`, `z_get`) will be instead stored until either the batch is full, flushed or batching is stopped.
+* `zp_batch_stop`: Deactivate the batching mechanism and send the currently batched messages on the network.
 * `zp_batch_flush`: Send the currently batched messages on the network.
 
-Note that there are also cases where a batch will be sent if a message needs to be sent immediately, like when sending keep alive messages or if the api pushes a message with the `is_express` qos.
+Note that there are also cases where a batch will be sent if a message needs to be sent immediately, like when sending keep-alive messages or if the API pushes a message with the `is_express` QOS.
 
 ### Examples:
 
@@ -151,12 +151,12 @@ In this second example, another thread is responsible for sending messages and w
 zp_batch_stop(z_loan(session));
 ```
 
-## Wrapping-up
+## Wrapping up
 
 As you saw, we improved throughput and latency across the board, in some cases reaching a 100x increase.
 
 We also introduced manual batching which, beside improving throughput of small messages, can be used to reduce power consumption in embedded devices by reducing network transmissions.
 
-Now let's talk briefly of our next big feature. As we hinted above, we are limited in client by the router both in throughput and latency, but client mode is currently the only way to use tcp links in Zenoh-Pico...
+Now let's talk briefly of our next big feature. As we hinted above, we are limited in client mode by the router both in throughput and latency, but client mode is currently the only way to use TCP links in Zenoh-Pico...
 
-That was true until the newly introduced peer to peer unicast mode that we will present in a future blogpost!
+That was true until the newly introduced peer-to-peer unicast mode that we will present in a future blog post!
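
For reference, here is a minimal, hedged sketch of how the three batching calls touched by this diff could wrap a burst of small publications. It is not taken from the post's own examples: the connect locator, key expression, loop count, and the `NULL` option arguments are placeholder assumptions for a Zenoh-Pico 1.1 build with the unified C API; only `zp_batch_stop(z_loan(session))` appears verbatim in the changed file.

```c
#include <zenoh-pico.h>

int main(void) {
    // Placeholder configuration: connect to a local router over TCP.
    z_owned_config_t config;
    z_config_default(&config);
    zp_config_insert(z_loan_mut(config), Z_CONFIG_CONNECT_KEY, "tcp/127.0.0.1:7447");

    z_owned_session_t session;
    if (z_open(&session, z_move(config), NULL) < 0) {
        return -1;
    }
    // Background tasks handle keep-alives and incoming traffic.
    zp_start_read_task(z_loan_mut(session), NULL);
    zp_start_lease_task(z_loan_mut(session), NULL);

    z_view_keyexpr_t ke;
    z_view_keyexpr_from_str(&ke, "demo/example/batch");

    zp_batch_start(z_loan(session));   // subsequent puts are buffered instead of sent
    for (int i = 0; i < 16; i++) {
        z_owned_bytes_t payload;
        z_bytes_copy_from_str(&payload, "small message");
        z_put(z_loan(session), z_loan(ke), z_move(payload), NULL);
    }
    zp_batch_flush(z_loan(session));   // push everything batched so far on the network
    zp_batch_stop(z_loan(session));    // resume per-message transmission

    z_drop(z_move(session));
    return 0;
}
```

Grouping the puts this way is what the post credits for the >10x small-payload throughput gain, and fewer network transmissions is also why it highlights batching as a way to reduce power consumption on embedded devices.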
