p2p robustness and scalability#1222

Draft
gmartin82 wants to merge 3 commits into eclipse-zenoh:main from gmartin82:ZEN-937

gmartin82 commented May 14, 2026

Description

Work in progress toward consistently starting interconnected peer zenoh-pico nodes at larger scales.

What does this PR do?

  • Make the listen connection limit configurable via CMake
  • Add connect timeout support to the C11 z_pub / z_sub examples
  • Update the peer mesh run script to use configurable connect timeout
  • Speed up run script log scanning and report time to full observed connectivity

Why is this change needed?

The peer-to-peer stress script needs to exercise larger interconnected zenoh-pico meshes, ideally up to 50 nodes, without relying on hardcoded connection limits or slow log parsing. This PR starts improving that workflow by making the listen limit configurable, allowing examples to keep retrying peer connects, and making the script report the actual time until full observed connectivity.
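
The connect timeout reaches the examples as a plain string on the command line. A minimal sketch of validating such a value before it is handed to the configuration, assuming that -1 (the default used by run.sh) means "keep retrying" and that other negative values are invalid; both points are assumptions, and parse_timeout_ms is a hypothetical helper, not part of zenoh-pico:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a timeout in milliseconds.
 * Accepts -1 (assumed to mean "retry indefinitely") and non-negative integers. */
bool parse_timeout_ms(const char *text, long *out) {
    assert(text != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0') {
        return false; /* empty string or trailing characters */
    }
    if (errno == ERANGE || value < -1) {
        return false; /* out of range for long, or below the assumed minimum */
    }
    *out = value;
    return true;
}
```

An example could reject a malformed -t argument with this check and only then pass the original string on to the configuration.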

Related Issues


🏷️ Label-Based Checklist

Based on the labels applied to this PR, please complete these additional requirements:

Labels: internal

🏠 Internal Change

This PR is marked as internal (not user-facing):

  • No API changes - Public APIs unchanged
  • No behavior changes - External behavior identical
  • Refactoring/maintenance - Code improvements only
  • Tests still pass - All existing tests pass without modification

Lighter review: Internal changes may have lighter review requirements.

Instructions:

  1. Check off items as you complete them (change - [ ] to - [x])
  2. The PR checklist CI will verify these are completed

This checklist updates automatically when labels change, but preserves your checked boxes.

gmartin82 added the internal label (Changes not included in the changelog) May 14, 2026
Comment thread run.sh
@@ -0,0 +1,79 @@
nb=${1:-5}
Comment thread run.sh
connect_timeout=${3:--1}

echo "" > log
for i in $(seq -f "%03g" 1 1 $nb 2>/dev/null)
Comment thread run.sh
do
connect="$connect -e tcp/127.0.0.1:8$j -e tcp/127.0.0.1:9$j"
done
stdbuf -o0 ./build/examples/z_pub -m peer -l tcp/127.0.0.1:8$i -t $connect_timeout $connect -k demo/example/$i | while read line; do echo "[pub $i][$(date +%s.%N)] $line"; done >> log &
Comment thread run.sh
connect="$connect -e tcp/127.0.0.1:8$j -e tcp/127.0.0.1:9$j"
done
stdbuf -o0 ./build/examples/z_pub -m peer -l tcp/127.0.0.1:8$i -t $connect_timeout $connect -k demo/example/$i | while read line; do echo "[pub $i][$(date +%s.%N)] $line"; done >> log &
stdbuf -o0 ./build/examples/z_sub -m peer -l tcp/127.0.0.1:9$i -t $connect_timeout -e tcp/127.0.0.1:8$i $connect | while read line; do echo "[sub $i][$(date +%s.%N)] $line"; done >> log &
Comment thread run.sh
stdbuf -o0 ./build/examples/z_sub -m peer -l tcp/127.0.0.1:9$i -t $connect_timeout -e tcp/127.0.0.1:8$i $connect | while read line; do echo "[sub $i][$(date +%s.%N)] $line"; done >> log &
done

sleep $duration
Comment thread run.sh

if [[ $stop != "" ]]
then
echo OK $(($stop - $start)) seconds
Comment thread run.sh
then
echo OK $(($stop - $start)) seconds
else
echo KO $failure
Comment thread examples/unix/c11/z_pub.c
  bool *add_matching_listener) {
  int opt;
- while ((opt = getopt(argc, argv, "k:v:e:m:l:n:a")) != -1) {
+ while ((opt = getopt(argc, argv, "k:v:e:m:l:n:at:")) != -1) {
Comment thread examples/unix/c11/z_sub.c
  static int parse_args(int argc, char **argv, z_owned_config_t *config, char **ke, int *n) {
  int opt;
- while ((opt = getopt(argc, argv, "k:e:m:l:n:")) != -1) {
+ while ((opt = getopt(argc, argv, "k:e:m:l:n:t:")) != -1) {
Comment thread examples/unix/c11/z_pub.c
break;
case 't':
#if defined(Z_FEATURE_UNSTABLE_API)
zp_config_insert(z_loan_mut(*config), Z_CONFIG_CONNECT_TIMEOUT_KEY, optarg);
Comment thread examples/unix/c11/z_pub.c
  case '?':
  if (optopt == 'k' || optopt == 'v' || optopt == 'e' || optopt == 'm' || optopt == 'l' ||
- optopt == 'n') {
+ optopt == 'n' || optopt == 't') {
Comment thread examples/unix/c11/z_sub.c
break;
case 't':
#if defined(Z_FEATURE_UNSTABLE_API)
zp_config_insert(z_loan_mut(*config), Z_CONFIG_CONNECT_TIMEOUT_KEY, optarg);
Comment thread examples/unix/c11/z_sub.c
  break;
  case '?':
- if (optopt == 'k' || optopt == 'e' || optopt == 'm' || optopt == 'l' || optopt == 'n') {
+ if (optopt == 'k' || optopt == 'e' || optopt == 'm' || optopt == 'l' || optopt == 'n' ||
Comment thread examples/unix/c11/z_sub.c
  case '?':
- if (optopt == 'k' || optopt == 'e' || optopt == 'm' || optopt == 'l' || optopt == 'n') {
+ if (optopt == 'k' || optopt == 'e' || optopt == 'm' || optopt == 'l' || optopt == 'n' ||
+ optopt == 't') {
Comment thread src/transport/manager.c
- const _z_config_t *config) {
- if (!_z_pending_peers_has_pending(pending_peers)) {
+ const _z_config_t *config, size_t max_attempts) {
+ size_t pending_count = _z_pending_peers_count_pending(pending_peers);
Comment thread src/transport/manager.c

for (size_t attempt = 0; attempt < loop_count; attempt++) {
size_t i = 0;
if (!_z_pending_peers_next_pending_idx(pending_peers, &i)) {
gmartin82 added 2 commits May 15, 2026 11:25
In single-thread mode, spin the async-opened session from only one test thread while waiting for peer discovery, avoiding concurrent zp_spin_once() calls on the same session.
Limit the background add-peers task to a small number of peer attempts per executor tick, rotating through pending locators before applying backoff. This prevents peer retry work from monopolizing single-thread runtimes while still retrying the full pending set before sleeping.
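
The bounded, rotating retry described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/transport/manager.c: pending_set_t, connect_fn, pending_set_next, pending_set_tick and connect_only_slot are hypothetical names, and the fixed-size array stands in for whatever structure the transport really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8 /* illustrative capacity */

/* Hypothetical pending-peer set; not the zenoh-pico data structure. */
typedef struct {
    bool pending[MAX_PENDING]; /* true while a locator still needs a connection */
    size_t cursor;             /* slot where the next scan starts */
} pending_set_t;

/* Hypothetical connect hook: returns true if the peer at idx was reached. */
typedef bool (*connect_fn)(size_t idx, void *ctx);

/* Find the next pending slot at or after the cursor, wrapping around once. */
bool pending_set_next(pending_set_t *set, size_t *idx) {
    assert(set != NULL && idx != NULL);
    for (size_t scanned = 0; scanned < MAX_PENDING; scanned++) {
        size_t i = (set->cursor + scanned) % MAX_PENDING;
        if (set->pending[i]) {
            set->cursor = (i + 1) % MAX_PENDING; /* resume after this slot next time */
            *idx = i;
            return true;
        }
    }
    return false;
}

/* One executor tick: try at most max_attempts pending peers, then yield.
 * Returns the number of attempts actually made. */
size_t pending_set_tick(pending_set_t *set, size_t max_attempts, connect_fn try_connect, void *ctx) {
    size_t attempts = 0;
    size_t idx = 0;
    while (attempts < max_attempts && pending_set_next(set, &idx)) {
        attempts++;
        if (try_connect(idx, ctx)) {
            set->pending[idx] = false; /* connected: drop from the pending set */
        }
    }
    return attempts;
}

/* Demo hook: succeeds only for the slot number stored in ctx. */
bool connect_only_slot(size_t idx, void *ctx) { return idx == *(const size_t *)ctx; }
```

With max_attempts set to 1, each tick touches a single locator and the cursor ensures that a failing peer does not starve the ones behind it. Backoff is left out of the sketch.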
Comment thread src/transport/manager.c
}

#if Z_FEATURE_UNICAST_PEER == 1
#define _Z_ADD_PEERS_ALL_PENDING 0
Comment thread src/transport/manager.c

#if Z_FEATURE_UNICAST_PEER == 1
#define _Z_ADD_PEERS_ALL_PENDING 0
#define _Z_ADD_PEERS_TASK_MAX_ATTEMPTS 1
Comment thread src/transport/manager.c
// Non-retryable error
bool has_pending = _z_pending_peers_has_pending(pending_peers);

if (result._last_non_retryable_ret != _Z_RES_OK) {
Comment thread src/transport/manager.c

if (result._last_non_retryable_ret != _Z_RES_OK) {
// Non-retryable error.
if (exit_on_failure) {
Comment thread src/transport/manager.c
_z_pending_peers_clear(pending_peers);
return _Z_RES_OK;
}

Comment thread src/transport/manager.c
if (pending_peers->_remaining_attempts == 0) {
pending_peers->_remaining_attempts = _z_pending_peers_count_pending(pending_peers);
}

Comment thread src/transport/manager.c
if (pending_peers->_remaining_attempts > 0) {
return _z_fut_fn_result_continue();
}

gmartin82 commented May 15, 2026

Investigation update: 50-node peer mesh startup

Current testing still shows that a 50-process peer mesh (25 publishers + 25 subscribers) does not reliably converge.

Latest local run:

./run.sh 25 300

KO Sub 8 didn't receive from Pub 17

ZP_PEER_ADD_BACKGROUND: 0
ZP_PEER_ESTABLISHED:    2450
ZP_PEER_EXPIRED:        603
accept handshake failed: 229

first peer add:     +1.0s
last peer add:      +138.4s
first peer expiry:  +20.2s
last peer expiry:   +139.7s
pre-kill marker:    +300.2s

This suggests the mesh is not simply failing to attempt peer connections. Connections are being established, but peers begin expiring while peer addition is still ongoing. In this run, lease expiry starts around +20s, while peer establishment continues until around +138s.
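
As a sanity check on the counters above, here is a minimal sketch of the full-mesh arithmetic. It assumes that ./run.sh 25 starts one publisher and one subscriber per index, and that each process logs one ZP_PEER_ESTABLISHED per peer it connects to; the second point is an assumption about the log format, not something shown in this thread. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Processes started for nb indexes: one publisher and one subscriber each. */
size_t mesh_process_count(size_t nb) {
    assert(nb <= SIZE_MAX / 2); /* guard against overflow */
    return 2 * nb;
}

/* Peer entries in a full mesh when every process counts each of its peers once. */
size_t full_mesh_peer_entries(size_t processes) {
    if (processes < 2) {
        return 0; /* a single process has no peers */
    }
    assert(processes <= SIZE_MAX / (processes - 1)); /* guard against overflow */
    return processes * (processes - 1);
}
```

Under those assumptions, 50 processes give 50 × 49 = 2450 entries, which equals the ZP_PEER_ESTABLISHED count reported above. The match does not show that the mesh was ever fully connected at a single instant, since expiries began while additions were still in progress.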

The key implication is that peer connection establishment and lease/keepalive management need to be aligned. As soon as a peer connection is established, the transport must have the lease/keepalive machinery in place to service it. Long synchronous peer-add work during startup can leave already-established peers live before the background tasks that maintain their leases are able to run.

Older behavior around zenoh-pico 1.9.0 was also checked. The same broad design existed there: peer lease expiry removes peers from the peer transport, but expired configured peers are not requeued as desired peer connections. So this appears to be an existing limitation rather than a regression introduced by the recent retry work.

Current understanding

  • The add-peers path can contribute startup pressure, especially on single-thread or constrained executors.
  • The deeper issue appears to be peer lease handling/recovery: established configured peers can expire and be removed without being restored.
  • Moving peer addition into the background earlier reduced startup blocking, but it also introduced (or at least confirmed) connection storms and handshake pressure, so backgrounding alone is not a complete fix.
  • Peer keepalive likely needs to be handled on a per-peer basis. A naive transport-wide keepalive experiment reduced expiries, but was backed out because it did not properly model individual peer lease state.

Potential next steps

  • Ensure lease/keepalive handling is active as soon as peer connections are established.
  • Re-establish expired peer connections, so peer lease expiry does not permanently remove a configured peer from the mesh.
  • Investigate accept-side handshake failures (_Z_ERR_TRANSPORT_RX_FAILED, -99) under connection storms.
  • Consider a multi-threaded executor enhancement, such as a thread-pool style runtime, so platforms that support threads can parallelize peer-add, lease, read, and keepalive work. The implementation still needs to remain cooperative and bounded for single-threaded environments.
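
The second step above (re-establishing expired peers) could take roughly this shape. It is a sketch only: peer_slot_t, peer_state_t and both functions are hypothetical, and the real change would have to hook into the existing pending-peers handling rather than a standalone table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PEER_UNUSED, PEER_PENDING, PEER_ESTABLISHED } peer_state_t;

/* Hypothetical peer table entry. */
typedef struct {
    peer_state_t state;
    bool configured; /* true if the locator came from the connect configuration */
} peer_slot_t;

/* Handle lease expiry for one slot. Returns true if the peer was requeued. */
bool peer_on_lease_expired(peer_slot_t *slot) {
    assert(slot != NULL);
    if (slot->state != PEER_ESTABLISHED) {
        return false; /* nothing to expire */
    }
    if (slot->configured) {
        slot->state = PEER_PENDING; /* configured peer: schedule a reconnect */
        return true;
    }
    slot->state = PEER_UNUSED; /* accepted-only peer: just drop it */
    return false;
}

/* Count the slots that still need a connection attempt. */
size_t peer_count_pending(const peer_slot_t *slots, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (slots[i].state == PEER_PENDING) {
            count++;
        }
    }
    return count;
}
```

The distinction between configured and accepted-only peers matters here: only the side that was told to connect should redial, otherwise both ends could retry at once and add to the connection storm.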
