[Bug] zenoh-bridge-ros2dds TF route drops when multiple DDS participants join simultaneously #690

@ahirsurbhi17

Describe the bug

Environment:

zenoh-bridge-ros2dds version: 1.5.0 (laptop) / 1.8.0 (robot)
ROS 2 Jazzy
OS: Ubuntu 24.04 (robot Docker) / WSL2 Ubuntu (laptop)
Network: WiFi (same subnet)

Setup:

Robot runs zenoh-bridge-ros2dds in peer mode, listening on tcp/0.0.0.0:7447
Laptop runs zenoh-bridge-ros2dds connecting to the robot at tcp/192.168.0.132:7447, with --no-multicast-scouting
Robot namespace: /sr1
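
For anyone trying to reproduce the setup, here is a rough sketch of the two bridge invocations as argument lists. The flag spellings (-m, -l, -e, -n) are my reading of the bridge's command-line help and should be checked against `zenoh-bridge-ros2dds --help` for 1.5.0 and 1.8.0. Passing the /sr1 namespace on the robot-side bridge is an assumption; the setup above only states that the robot namespace is /sr1. The helper names are made up for illustration.

```python
import shlex

# Endpoints taken from the setup described above.
ROBOT_LISTEN = "tcp/0.0.0.0:7447"
ROBOT_ENDPOINT = "tcp/192.168.0.132:7447"


def robot_bridge_cmd(namespace="/sr1"):
    """Robot side: peer mode, listening for incoming Zenoh sessions.

    Assumption: the /sr1 namespace is applied on the robot bridge via -n.
    """
    return ["zenoh-bridge-ros2dds", "-m", "peer",
            "-l", ROBOT_LISTEN, "-n", namespace]


def laptop_bridge_cmd(endpoint=ROBOT_ENDPOINT):
    """Laptop side: connect to the robot, multicast scouting disabled."""
    return ["zenoh-bridge-ros2dds", "-e", endpoint,
            "--no-multicast-scouting"]


if __name__ == "__main__":
    # Print copy-pasteable command lines; pass the lists to
    # subprocess.Popen instead to launch the bridges from a script.
    print(shlex.join(robot_bridge_cmd()))
    print(shlex.join(laptop_bridge_cmd()))
```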

Problem:
When multiple ROS 2 nodes join the DDS network simultaneously (in our case 13 RMF nodes starting together via a launch file), the /sr1/tf topic route on the laptop side stops forwarding data and never recovers without restarting the bridge.

Observed behavior:

/sr1/tf reports Publisher count: 1 (zenoh_bridge_ros2dds), but no messages arrive
Robot side /tf continues publishing at 200 Hz normally
Both Zenoh bridge processes remain alive, no crash
Bridge logs show nothing unusual at the moment TF drops
The only fix is restarting the laptop-side Zenoh bridge
Issue occurs regardless of --queries-timeout-default value (tested 0.5, 1.0, 3.0, 5.0)
Issue occurs regardless of --watchdog flag
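
Because the bridge stays alive and logs nothing, the stall has to be detected from the subscriber side. Below is a minimal sketch of a stall detector that could be used to timestamp the drop. It is plain Python with no ROS dependency; the class name and the timeout value are hypothetical and not part of the bridge or of ROS 2.

```python
import time
from typing import Callable, Optional


class TopicStallDetector:
    """Flags a stall when no message has arrived within timeout_s seconds."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so it can be tested offline
        self._last_msg: Optional[float] = None
        self.count = 0               # total messages seen so far

    def on_message(self, _msg=None) -> None:
        """Call from the subscription callback for every received message."""
        self._last_msg = self._clock()
        self.count += 1

    def is_stalled(self) -> bool:
        """True only if data flowed at least once and then stopped."""
        if self._last_msg is None:
            return False
        return self._clock() - self._last_msg > self._timeout_s
```

In a ROS 2 node, on_message would be the callback given to create_subscription for /sr1/tf (message type tf2_msgs/msg/TFMessage), and is_stalled would be polled from a timer. At 200 Hz a timeout of around one second should be generous, though that value is a guess.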

Expected behavior:

/sr1/tf should continue forwarding even when new DDS participants join
Route should be resilient to DDS participant churn

Root cause hypothesis:
When 13 DDS participants join simultaneously, Zenoh triggers route renegotiation for all transient local topics. During this renegotiation, the /tf route appears to get dropped and never re-established, possibly due to a race condition in route management.

Workaround:
Starting the Zenoh bridge after all ROS 2 nodes are already running reduces the frequency of the issue but does not eliminate it. TF still drops after some time even without new participants joining.

Questions:

Is there a way to mark specific routes (like /tf) as persistent/non-droppable during renegotiation?
Is there a configuration option to prevent route drops during DDS participant churn?
Is this a known issue with peer mode vs router/client mode?

To reproduce

Start zenoh-bridge-ros2dds on robot (peer/listener mode)
Start zenoh-bridge-ros2dds on laptop (connect to robot)
Verify /sr1/tf is echoing correctly on laptop
Start 13+ ROS 2 nodes simultaneously on laptop (e.g. RMF core launch file)
/sr1/tf immediately stops echoing and never recovers
