Describe the bug
Description
In rare cases, an outgoing Publish call never completes and the application appears to be waiting “forever” for the publish to be acknowledged/confirmed. This is most visible when publishing via a long-lived running endpoint instance (static IEndpointInstance / IMessageSession) rather than via the IMessageHandlerContext passed into a handler.
It appears that the transport can enter a state in which it waits on a publisher confirm (or a confirmation-tracking task) that never completes, and there is no transport-level timeout to fail fast and recycle the channel.
Suspected root cause
- RabbitMQ transport uses a confirmation-enabled channel with confirmation tracking enabled.
- An outgoing publish operation awaits a task that is expected to complete when the broker confirms the publish.
- In rare cases (e.g. packet loss, broker edge case), that confirmation completion may never happen even though the TCP connection remains open.
- Because there is no bounded timeout, the Task returned from publish can remain pending indefinitely.
- Because the channel is shared/long-lived, a single stuck confirmation can cause long-lived endpoint API calls to hang “forever.”
Even if RabbitMQ itself is reliable, over time “rare” edge cases eventually occur (especially when dispatching many messages).
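To illustrate the missing bounded wait described above, here is a minimal sketch, not the transport's actual code. `WithTimeout` is a hypothetical helper, and the `TaskCompletionSource` only stands in for a confirmation-tracking task that the broker never completes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ConfirmTimeoutSketch
{
    // Hypothetical helper (not part of the transport): awaits `operation`,
    // but throws TimeoutException if it is still pending after `timeout`.
    public static async Task WithTimeout(Task operation, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        var delay = Task.Delay(timeout, cts.Token);

        var finished = await Task.WhenAny(operation, delay).ConfigureAwait(false);
        if (finished != operation)
        {
            throw new TimeoutException($"No publisher confirm within {timeout}.");
        }

        cts.Cancel();                          // stop the timer, it is no longer needed
        await operation.ConfigureAwait(false); // propagate any exception from the operation
    }

    public static async Task Main()
    {
        // Stand-in for a confirmation-tracking task that never completes.
        var lostConfirm = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        try
        {
            await WithTimeout(lostConfirm.Task, TimeSpan.FromMilliseconds(200));
        }
        catch (TimeoutException ex)
        {
            // The caller regains control and recoverability can take over.
            Console.WriteLine($"Failed fast: {ex.Message}");
        }
    }
}
```

Note that a wrapper like this only unblocks the caller. The underlying task stays pending, so a real fix inside the transport would presumably also need to fault the tracked confirmation and recycle the channel.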
Impact / severity
- Causes application threads to deadlock/starve waiting on Publish/Send.
- Service stays online but blocked from doing any work until restarted.
- This is particularly problematic for our high-throughput systems where “rare indefinite hang” is unacceptable; we need a way to fail fast and let recoverability kick in.
Expected behavior
Throw an exception if a publisher confirm does not arrive within a set period.
Actual behavior
The endpoint appears unresponsive.
Versions
All versions up to and including 8.x. Versions starting from 9.0 are not affected because the transport uses the SDK's async APIs.
Steps to reproduce
Use the RabbitMQ transport with a somewhat flaky connection to the broker, so that some of the publisher confirms go missing.
Relevant log output
Additional Information
Workarounds
Possible solutions
Additional information