---
title: "Zenoh-Pico peer to peer unicast mode"
date: 2025-07-11
menu: "blog"
weight: 20250630
description: "July 11th, 2025 -- Paris."
draft: false
---

# Introduction

As hinted at in our blog post about Zenoh-Pico performance improvements, we’ve now introduced a long-requested peer-to-peer unicast mode for Zenoh-Pico! Let's dive into how it works.

## What is Zenoh-Pico?

Zenoh-Pico is the lightweight, native C implementation of the [Eclipse Zenoh](http://zenoh.io) protocol, designed specifically for constrained devices. It provides a streamlined, low-resource API while supporting all abstractions from [Rust Zenoh](https://github.com/eclipse-zenoh/zenoh): pub, sub and query. Zenoh-Pico already supports a broad range of platforms and protocols, making it a versatile choice for embedded systems development.
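
For readers new to the library, here is a minimal sketch of what publishing looks like with the Zenoh-Pico C API, loosely modelled on the `z_pub` example shipped with the library. Exact signatures and helper names (e.g. `z_bytes_copy_from_str`) vary between releases, so treat it as an illustration rather than a drop-in snippet:

```c
#include <stdio.h>
#include <zenoh-pico.h>

int main(void) {
    // Default configuration: the node will scout for a router or peers.
    z_owned_config_t config;
    z_config_default(&config);

    // Open the session and start the background read and lease tasks.
    z_owned_session_t session;
    if (z_open(&session, z_move(config), NULL) < 0) {
        printf("Unable to open session\n");
        return -1;
    }
    zp_start_read_task(z_loan_mut(session), NULL);
    zp_start_lease_task(z_loan_mut(session), NULL);

    // Declare a publisher on a key expression and put a single payload.
    z_view_keyexpr_t keyexpr;
    z_view_keyexpr_from_str(&keyexpr, "demo/example/hello");
    z_owned_publisher_t pub;
    if (z_declare_publisher(z_loan(session), &pub, z_loan(keyexpr), NULL) < 0) {
        printf("Unable to declare publisher\n");
        return -1;
    }

    z_owned_bytes_t payload;
    z_bytes_copy_from_str(&payload, "Hello from Zenoh-Pico!");
    z_publisher_put(z_loan(pub), z_move(payload), NULL);

    z_drop(z_move(pub));
    z_drop(z_move(session));
    return 0;
}
```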

# Peer-to-Peer Unicast

Until now, if you didn’t want to run a router with Zenoh-Pico nodes, you had to rely on multicast transport, an option that isn’t always feasible. Additionally, this method was limited to UDP, which lacks reliability.

Now, you can use TCP links to enable unicast peer-to-peer communication and enhance reliability in scenarios without a router. This advancement also improves throughput and latency, which we’ll discuss below.

This feature is supported and has been tested on all platforms, including FreeRTOS, ESP-IDF, Raspberry Pi Pico, Zephyr, Linux, and Windows. It is currently limited to TCP links, but may be extended to include UDP or Serial if there’s demand.

Architecture-wise, we use non-blocking sockets and I/O multiplexing to handle all connections on a single RX thread, plus an additional thread that listens on a socket and accepts incoming connections. For resource-efficiency reasons, peer-unicast nodes do not route traffic: every message received from a connected peer is delivered directly to the local API callbacks, and every message created via our API is sent to all connected peers. This design allows for a single TX and a single RX buffer.
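
To make the single-RX-thread design more concrete, here is a generic POSIX sketch of the pattern described above. This is not Zenoh-Pico's actual implementation, just an illustration of how one thread can multiplex several non-blocking peer sockets over a single RX buffer using `poll()`:

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_PEERS 8

// One RX thread polls every peer socket and drains whichever ones are
// readable, so a single RX buffer is enough for all connections.
// Assumes peer_count <= MAX_PEERS.
static void rx_loop(const int peer_fds[], size_t peer_count, const bool *running,
                    unsigned char *rx_buf, size_t rx_len) {
    struct pollfd pfds[MAX_PEERS];
    while (*running) {
        for (size_t i = 0; i < peer_count; i++) {
            pfds[i].fd = peer_fds[i];
            pfds[i].events = POLLIN;
            pfds[i].revents = 0;
        }
        if (poll(pfds, (nfds_t)peer_count, 100) <= 0) {
            continue; // timeout or error: re-check the running flag and poll again
        }
        for (size_t i = 0; i < peer_count; i++) {
            if (pfds[i].revents & POLLIN) {
                ssize_t n = recv(pfds[i].fd, rx_buf, rx_len, 0);
                if (n > 0) {
                    // decode the received batch and trigger matching callbacks here
                }
            }
        }
    }
}
```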

## Examples

Here is an example showing how to implement a 1:N (or N:1) communication graph:

{{< figure-inline
    src="../../img/20250630-Zenoh-Pico-peer-to-peer-unicast/1-n.png"
    class="figure-inline"
    alt="1:N diagram" >}}

If we assume a single publisher connected to 3 subscribers, here’s how we could configure it:

```bash
./build/example/z_pub -l tcp/127.0.0.1:7447
./build/example/z_sub -e tcp/127.0.0.1:7447
./build/example/z_sub -e tcp/127.0.0.1:7447
./build/example/z_sub -e tcp/127.0.0.1:7447
```
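
The `-l` flag on the publisher corresponds to running the session in peer mode with a listen endpoint. Here is a rough sketch of the same setup in code, assuming the usual `zp_config_insert` keys (`Z_CONFIG_MODE_KEY`, `Z_CONFIG_LISTEN_KEY`) accept these values; check the examples bundled with your release for the exact calls:

```c
#include <zenoh-pico.h>

// Open a peer-mode session that listens for incoming TCP connections,
// i.e. the programmatic equivalent of `z_pub -l tcp/127.0.0.1:7447`.
static int open_listening_peer(z_owned_session_t *session) {
    z_owned_config_t config;
    z_config_default(&config);
    zp_config_insert(z_loan_mut(config), Z_CONFIG_MODE_KEY, "peer");
    zp_config_insert(z_loan_mut(config), Z_CONFIG_LISTEN_KEY, "tcp/127.0.0.1:7447");
    return z_open(session, z_move(config), NULL);
}
```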

To implement an N:N graph:

{{< figure-inline
    src="../../img/20250630-Zenoh-Pico-peer-to-peer-unicast/n-n.png"
    class="figure-inline"
    alt="N:N diagram" >}}

```bash
./build/example/z_pub -l tcp/127.0.0.1:7447
./build/example/z_sub -l tcp/127.0.0.1:7448 -e tcp/127.0.0.1:7447
./build/example/z_sub -l tcp/127.0.0.1:7449 -e tcp/127.0.0.1:7447 -e tcp/127.0.0.1:7448
./build/example/z_sub -e tcp/127.0.0.1:7447 -e tcp/127.0.0.1:7448 -e tcp/127.0.0.1:7449
```
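
A node that both accepts connections and dials out, like the second subscriber above, simply combines a listen endpoint with a connect endpoint. The same caveats as the previous sketch apply, and note that the way several connect endpoints are expressed in one configuration may differ between releases:

```c
#include <zenoh-pico.h>

// Peer that listens on port 7448 and also connects to the node on port 7447,
// mirroring `z_sub -l tcp/127.0.0.1:7448 -e tcp/127.0.0.1:7447` above.
static int open_linked_peer(z_owned_session_t *session) {
    z_owned_config_t config;
    z_config_default(&config);
    zp_config_insert(z_loan_mut(config), Z_CONFIG_MODE_KEY, "peer");
    zp_config_insert(z_loan_mut(config), Z_CONFIG_LISTEN_KEY, "tcp/127.0.0.1:7448");
    zp_config_insert(z_loan_mut(config), Z_CONFIG_CONNECT_KEY, "tcp/127.0.0.1:7447");
    return z_open(session, z_move(config), NULL);
}
```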

# Performance

## Test Details

In addition to enabling peer-to-peer unicast, we reduced the library’s overall CPU utilization, further improving throughput and latency by approximately 10%. The tests were run on an Ubuntu 22.04 laptop equipped with an AMD Ryzen 7735U and 32 GB of RAM.

## Configuration

Note that the Zenoh-Pico configuration used for testing deviates from the default. Here are the changes (a small snippet to check these values in your own build follows the list):

* `Z_FEATURE_SESSION_CHECK` set to 0 (default 1): Skips the publisher’s session reference upgrade. This is risky if you use the publisher after closing the session.
* `Z_FEATURE_BATCH_TX_MUTEX` set to 1 (default 0): Allows the batching mechanism to hold the mutex, which can prevent the lease task from sending keep-alives and may trigger connection closure.
* `Z_FEATURE_RX_CACHE` set to 1 (default 0): Activates the RX LRU cache. It consumes some memory to store the results of key expressions that trigger callbacks, which is useful in repetitive, high-throughput scenarios.
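
These switches are compile-time options, typically set when building the library (e.g. through the corresponding CMake options). Assuming they are exposed as 0/1 macros through the generated configuration header, like the other `Z_FEATURE_*` flags, a tiny program can confirm which values ended up in your build:

```c
#include <stdio.h>
#include <zenoh-pico.h>

// Print the effective compile-time configuration of this Zenoh-Pico build.
// Assumes the flags below are defined to 0 or 1 by the generated config header.
int main(void) {
    printf("Z_FEATURE_SESSION_CHECK  = %d\n", Z_FEATURE_SESSION_CHECK);
    printf("Z_FEATURE_BATCH_TX_MUTEX = %d\n", Z_FEATURE_BATCH_TX_MUTEX);
    printf("Z_FEATURE_RX_CACHE       = %d\n", Z_FEATURE_RX_CACHE);
    return 0;
}
```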

## Results

{{< figure-inline
    src="../../img/20250630-Zenoh-Pico-peer-to-peer-unicast/perf_lat.png"
    class="figure-inline"
    alt="P2p latency" >}}

The round-trip time for packets below 16 KiB is under 20 µs, meaning a one-way latency of under 10 µs. Peer-to-peer unicast delivers up to **70% lower latency** compared to client mode.

{{< figure-inline
    src="../../img/20250630-Zenoh-Pico-peer-to-peer-unicast/perf_thr.png"
    class="figure-inline"
    alt="P2p throughput" >}}

With up to 20 million messages per second for 8-byte messages, peer-to-peer unicast achieves over **4x the throughput** of client mode for small payloads, and still improves performance by **30% for larger payloads**.

# Multicast Declarations

Alongside peer-to-peer unicast, we’ve implemented a multicast declaration feature. This allows multicast transport to:

* Use declared key expressions, reducing bandwidth usage and improving throughput by up to **30%** for small payloads.
* Implement write filtering, where publishers wait for at least one subscriber before sending messages.

This feature can be enabled by setting `Z_FEATURE_MULTICAST_DECLARATIONS` to 1. It's off by default because, for it to work correctly, all existing nodes must redeclare all key expressions and subscriptions whenever a new node joins the network, which can lead to congestion.

# Memory Allocation Improvements

Previously, we discussed reducing dynamic memory allocations without providing measurements. We've now addressed this by measuring allocations using [heaptrack](https://github.com/KDE/heaptrack). Below are the results from the client throughput test in 1.0:

{{< figure-inline
    src="../../img/20250630-Zenoh-Pico-peer-to-peer-unicast/malloc_1_0.png"
    class="figure-inline"
    alt="1.0 heaptrack" >}}

And here are the results for the current version:

{{< figure-inline
    src="../../img/20250630-Zenoh-Pico-peer-to-peer-unicast/malloc_current.png"
    class="figure-inline"
    alt="current heaptrack" >}}

## Memory Breakdown

Version 1.0:
* Handled 5.8 million messages in 20 seconds (~290k messages/sec)
* Peak memory usage: 1.15 MB
* 64 million allocations, 11 allocations per message
* 600 kB: Two eagerly allocated 300 kB defragmentation buffers
* 100 kB: TX and RX buffers (50 kB each)

Current version:
* Handled 84.4 million messages in 20 seconds (~4.2M messages/sec), a 15x throughput increase
* Peak memory usage: 101 kB, i.e. 91% less memory
* 118 allocations total, no per-message allocations thanks to ownership transfers and the RX cache (99.9998% fewer allocations)
* Defragmentation buffers now allocated on demand (not needed in this test)
* 100 kB: TX and RX buffers (50 kB each)
* ~50% of remaining allocations are from UDP scouting (can be eliminated with a direct router endpoint)

Since this test involved a single subscriber, no message copies were needed. With multiple subscribers, data copies would be required, but only for auxiliary data (like key expressions), as payloads are reference-counted.

# Final Thoughts

This release brings substantial improvements to Zenoh-Pico's flexibility and performance. Peer-to-peer unicast opens the door to more robust, scalable topologies without requiring a central router. And the combined enhancements in memory use, throughput, and latency make it a strong choice for high-performance embedded applications.