-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Client/Service Deadlock #484
Comments
A diagram for further analysis: sequenceDiagram
participant Service as Thread 1 (Service)
participant mutex_s
participant mutex_c
participant Client as Thread 2 (Client)
Client->>mutex_c: send_request (lock)
Client->>mutex_s: add_new_query (lock)
mutex_s->>Client: add_new_query (release)
mutex_c->>Client: send_request (release)
Service->>mutex_s: take_request (lock)
mutex_s->>Service: take_request (release)
Client->>mutex_c: send_request (lock)
Service->>mutex_s: send_response (lock)
Client->>mutex_s: add_new_query (waiting)
Service->>mutex_c: query.reply (waiting)
|
To clarify the deadlock issue and the relationship between session-local mechanism in zenoh, I try to plot the whole mechanism as below. sequenceDiagram
participant mutex_s as mutex_s
participant T2 as Service API
actor P2 as Background Thread
participant Z as Zenoh <br> Session
participant mutex_c as mutex_c
participant T1 as Client API
actor P1 as Main Thread
P1 ->> T1: send_request
T1 ->> mutex_c: acquire
mutex_c ->> T1: lock
activate mutex_c
T1 ->> Z:
Z ->> Z: z_get
Z ->> T2: add_new_query
T2 ->> mutex_s: acquire
mutex_s ->> T2: lock
activate mutex_s
T2 ->> P2: notify
T2 ->> mutex_s: release
deactivate mutex_s
T2 ->> Z:
Z ->> T1:
T1 ->> mutex_c: release
deactivate mutex_c
P2 ->> T2: take_request
P1 ->> T1: send_request
T2 ->> mutex_s: acquire
mutex_s ->> T2: lock
activate mutex_s
T2 ->> P2: queue request
T2 ->> mutex_s: release
deactivate mutex_s
T1 ->> mutex_c: acquire
mutex_c ->> T1: lock
activate mutex_c
T1 ->> Z:
P2 ->> T2: send_response
Z ->> Z: z_get
T2 ->> mutex_s: acquire
mutex_s ->> T2: lock
activate mutex_s
Z ->> T2: add_new_query
T2 -x mutex_s: acquire
T2 ->> Z: query.reply
Z ->> T1: add_new_reply
T1 -x mutex_c: acquire
deactivate mutex_c
deactivate mutex_s
|
A proposed fix be like. diff --git a/rmw_zenoh_cpp/src/detail/rmw_client_data.cpp b/rmw_zenoh_cpp/src/detail/rmw_client_data.cpp
index 6446c71..889065d 100644
--- a/rmw_zenoh_cpp/src/detail/rmw_client_data.cpp
+++ b/rmw_zenoh_cpp/src/detail/rmw_client_data.cpp
@@ -394,6 +394,7 @@ rmw_ret_t ClientData::send_request(
std::weak_ptr<rmw_zenoh_cpp::ClientData> client_data = shared_from_this();
zenoh::ZResult result;
std::string parameters;
+ mutex_.unlock();
context_impl->session()->get(
keyexpr_.value(),
parameters, |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
After investigating the test failure nav2_behavior_tree/test/plugins/action/test_bt_action_node in navigation2, we found that a deadlock could occur in the following way.
The add_new_query operation must occur before or after send_response.
The text was updated successfully, but these errors were encountered: