# Improving Zenoh-Pico performance
Last year, after the long-awaited release of Zenoh 1.0, which included a unified C API across Zenoh-C and Zenoh-Pico, we decided to dedicate some time to measuring and improving the performance and efficiency of Zenoh-Pico. These modifications were released with Zenoh 1.1 earlier this year, and we present the results in this blog post.
## What is Zenoh-Pico?
Zenoh-Pico is the lightweight, native C implementation of the [Eclipse Zenoh](https://zenoh.io/) protocol.
To measure performance, we have a standardized throughput test and a latency test, which we run on a standardized machine (Intel Xeon E3-1275 @ 3.6GHz, 32GB DDR4, Ubuntu 22.04). For embedded measurements, we ran those tests on an ESP32-WROOM-32 dev board.
These tests produce roughly a thousand measurements per payload size, from which we compute the median values shown in the graphs below (note that the y-axis is log scale).
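As a rough sketch of that aggregation step (our own illustration, not the actual benchmark harness), the per-payload-size median can be computed like this:

```c
#include <stdlib.h>

// Comparison function for qsort over double-valued samples.
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

// Median of n throughput (or latency) samples for one payload size.
// Sorts in place, then takes the middle value (or the average of the
// two middle values when n is even).
static double median(double *samples, size_t n) {
    qsort(samples, n, sizeof(double), cmp_double);
    return (n % 2) ? samples[n / 2]
                   : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```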
### PC throughput client, TCP:
class="figure-inline"
29
29
alt="Client throughput" >}}
We see a massive (up to 100x) improvement in throughput for payloads over 32kiB. This is because packets of these sizes are fragmented on the network, and we had an issue where their data was serialized byte by byte.
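To give an idea of why this matters (a simplified illustration, not the actual Zenoh-Pico code): writing a large fragment one byte at a time pays a bounds check and store per byte, while a single bulk `memcpy` moves the whole fragment in one go and lets libc use vectorized copies.

```c
#include <stdint.h>
#include <string.h>

// Simplified illustration of the two serialization strategies
// (not the actual Zenoh-Pico code).

// Slow path: one bounds check and store per byte.
static size_t serialize_byte_per_byte(uint8_t *dst, size_t cap,
                                      const uint8_t *src, size_t len) {
    size_t written = 0;
    while (written < len && written < cap) {
        dst[written] = src[written];
        written++;
    }
    return written;
}

// Fast path: a single bulk copy of the whole fragment.
static size_t serialize_bulk(uint8_t *dst, size_t cap,
                             const uint8_t *src, size_t len) {
    size_t n = len < cap ? len : cap;
    memcpy(dst, src, n);
    return n;
}
```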
We also see a >10x improvement in throughput for smaller payloads when using manual batching (more info below), also introduced in 1.1.
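For reference, here is a minimal sketch of what using manual batching could look like. The `zp_batch_start()`/`zp_batch_stop()` names and the surrounding calls are assumptions based on the Zenoh-Pico 1.x unified C API, not an excerpt from this post; check the Zenoh-Pico documentation for the exact API.

```c
#include <zenoh-pico.h>

int main(void) {
    // Assumed Zenoh-Pico 1.x unified-API calls; see the docs for exact signatures.
    z_owned_config_t config;
    z_config_default(&config);

    z_owned_session_t s;
    if (z_open(&s, z_move(config), NULL) < 0) {
        return -1;
    }

    z_view_keyexpr_t ke;
    z_view_keyexpr_from_str(&ke, "demo/thr");

    // Assumed batching entry point: subsequent messages are accumulated
    // into a single network batch instead of one packet each.
    zp_batch_start(z_loan(s));
    for (int i = 0; i < 100; i++) {
        z_owned_bytes_t payload;
        z_bytes_copy_from_str(&payload, "small message");
        z_put(z_loan(s), z_loan(ke), z_move(payload), NULL);
    }
    // Assumed batching exit point: flushes the accumulated batch on the wire.
    zp_batch_stop(z_loan(s));

    z_drop(z_move(s));
    return 0;
}
```

The trade-off is the usual one for batching: messages wait in the batch until it is flushed, so throughput improves at the cost of per-message latency.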
Other than that, there are no significant changes, because client performance is limited by the router.
*[Figure: Peer throughput]*
Since peer-to-peer communication is not limited by router performance, we observe an even bigger improvement on smaller payloads with batching (>20x), but a smaller one (>10x) for fragmented packets (above 2kiB), because of UDP's smaller packet size.
In addition, we observe a 60% throughput increase for the other payload sizes, which results from the many improvements we implemented and detail below.
## How performance was improved
To improve Zenoh-Pico performance, we traced it on PC using [samply](https://github.com/mstange/samply) and the Firefox Profiler to visualize the traces. That allowed us to detect choke points and parts of the code that could be improved.
As stated earlier, the changes with by far the biggest impact were fixing the byte-by-byte copy issue for fragmented packets and introducing the manual batching mechanism.
Besides that, we also streamlined how the stack creates, uses, and destroys data, to avoid redundant operations and needless data copies. We also rationalized heap memory usage and fragmentation, although these changes were not quantified.