Skip to content

Service call reports z_reply_is_ok returned False Reason: Timeout #783

@mini-1235

Description

@mini-1235

Generated by Generative AI

No

Operating System:

Linux XPS-15-9520 6.8.0-78-generic #78-Ubuntu SMP PREEMPT_DYNAMIC Tue Aug 12 11:34:18 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

ROS version or commit hash:

Rolling binary http://packages.ros.org/ros2/ubuntu noble/main amd64 ros-rolling-rmw-zenoh-cpp amd64 0.9.0-1noble.20250822.215306

RMW implementation (if applicable):

zenoh

RMW Configuration (if applicable):

No response

Client library (if applicable):

No response

'ros2 doctor --report' output

ros2 doctor --report
<COPY OUTPUT HERE>

Steps to reproduce issue

  1. run nav2 with
ros2 launch nav2_bringup tb3_simulation_launch.py headless:=0
  1. wait until initial transform timeout (around 1 minute) and see the error from zenoh

Expected behavior

Receive a response of false from the change_state service server, and nav2 abort bringup

[ERROR 1757400210.592444579] [global_costmap.global_costmap]: Failed to activate global_costmap because transform from base_link to map did not become available before timeout (on_activate() at /root/nav2_ws/src/navigation2/nav2_costmap_2d/src/costmap_2d_ros.cpp:304)
[component_container_isolated-7] [ERROR 1757400210.593177984] [lifecycle_manager_navigation]: Failed to change state for node: planner_server (changeStateForNode() at /root/nav2_ws/src/navigation2/nav2_lifecycle_manager/src/lifecycle_manager.cpp:268)
[component_container_isolated-7] [ERROR 1757400210.593291041] [lifecycle_manager_navigation]: Failed to bring up all requested nodes. Aborting bringup. (startup() at /root/nav2_ws/src/navigation2/nav2_lifecycle_manager/src/lifecycle_manager.cpp:335)

Actual behavior

The change_state service does not return false, instead I see a couple of errors from the zenoh side

[component_container_isolated-8] [ERROR 1757390757.072415008] [rmw_zenoh_cpp]: z_reply_is_ok returned False Reason: Timeout (add_new_query() at ./src/detail/rmw_service_data.cpp:252)
[component_container_isolated-8] 2025-09-09T04:05:58.074188Z  WARN ThreadId(12) zenoh::api::session: Received ReplyData for unknown Query: 77
[component_container_isolated-8] 2025-09-09T04:05:58.074224Z  WARN ThreadId(12) zenoh::api::session: Received ResponseFinal for unknown Request: 77
[component_container_isolated-8] [ERROR 1757390758.431290576] [rmw_zenoh_cpp]: Query queue depth of 10 reached, discarding oldest Query for service for 0/lifecycle_manager_navigation/is_active/std_srvs::srv::dds_::Trigger_/RIHS01_eeff2cd6fa5ad9d27cdf4dec64818317839b62f212a91e6b5304b634b2062c5f (add_new_query() at ./src/detail/rmw_service_data.cpp:252)

I believe it hangs over there and never receive the result from the service server side

Additional information

I was trying to setup rmw_zenoh for Nav2 in this PR ros-navigation/navigation2#5516, one of the system tests is failing here https://app.circleci.com/pipelines/github/ros-navigation/navigation2/16036/workflows/7e2c6499-4b74-4e47-9c88-cf5953de2fd1/jobs/46834. Note that this test passes on fastrtps / cyclonedds, and only fails on zenoh.

I will illustrate the working flow here:

  1. After we launch the tb3 simulation, we are changing the states for the lifecycle nodes, the planner server, when calling on_activate, https://github.com/ros-navigation/navigation2/blob/ab4aaf863aea6a7d1e67e0c2ad536ec1f0c8ccf7/nav2_planner/src/planner_server.cpp#L180 will activate global costmap. However, for global costmap to be activated we typically need a transform from map -> odom -> robot_base_frame, which is unavailable unless we provide an initial pose.
  2. Suppose we never provide the initial pose, then costmap ros should fail to activate when it reaches initial transform timeout, https://github.com/ros-navigation/navigation2/blob/ab4aaf863aea6a7d1e67e0c2ad536ec1f0c8ccf7/nav2_costmap_2d/src/costmap_2d_ros.cpp#L303-L310, planner will also return failure here https://github.com/ros-navigation/navigation2/blob/ab4aaf863aea6a7d1e67e0c2ad536ec1f0c8ccf7/nav2_planner/src/planner_server.cpp#L181-L183 and fail to change state
  3. The lifecycle manager, who was trying to change planner server's state earlier https://github.com/ros-navigation/navigation2/blob/ab4aaf863aea6a7d1e67e0c2ad536ec1f0c8ccf7/nav2_lifecycle_manager/src/lifecycle_manager.cpp#L264-L269, should say "failed to change node for planner server" after
!node_map_[node_name]->change_state(transition, std::chrono::milliseconds(-1),
      service_timeout_

receiving the response(false) from service server. However, it seems that we are unable to get any response from the service server when running with rmw zenoh

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions