Hi,
It seems that the io_context destructor hangs in its internal shutdown() function, in the call to zmq_ctx_term(), if there are still pending azmq::socket operations/completion handlers and the azmq::socket object still exists as well. This can happen if the program extends the lifetime of the azmq::socket object into the completion handler by using shared_ptr/shared_from_this(), etc., and then exits io_context.run() by calling io_context.stop() (for example in reaction to receiving SIGINT/SIGTERM).
Is this normal/expected? One "obvious" solution is to call socket.cancel() to abort and destroy all pending completion handlers, and then destroy all the azmq::socket objects, instead of using io_context.stop(). This comment in a cppzmq issue suggests that this is even necessary: zeromq/cppzmq#139 (comment)
However, other boost::asio objects do not seem to have such requirements (though I don't know whether that's intentional or just coincidence). Should there perhaps be some sort of auto-close mechanism to avoid blocking the io_context destructor?
Small example:
// Build: g++ -Wall -g azmq_shutdown_hang.cpp -lzmq -lboost_filesystem -o azmq_shutdown_hang
#include <azmq/socket.hpp>
#include <zmq.hpp>
#include <array>
#include <memory>
#include <stdio.h>
int main()
{
    boost::asio::io_context ioctx;
    auto socket = std::make_shared<azmq::socket>(ioctx, ZMQ_PULL);
    socket->set_option(azmq::socket::linger(0));
    socket->connect("tcp://127.0.0.1:0");

    std::array<uint8_t, 1> buffer;

    // Capturing the shared_ptr<socket> in the completion handler lambda extends the socket's lifetime beyond that of the io_context.
    // Usually the socket would be destroyed first (if it or the shared_ptr is declared after the io_context), but in this case it is not.
    // The io_context destructor should destroy the pending operation and its completion handler (without calling it),
    // which would also finally destroy the socket, but apparently the io_context hangs instead.
    socket->async_receive(boost::asio::buffer(buffer),
        [socket](boost::system::error_code const& ec, size_t)
        {
            printf("async_receive completion handler, ec = %s\n", ec.message().c_str());
        }
    );

    // Calling cancel() removes the pending async operation, so the socket is destroyed before the io_context again, and then it does not hang.
    //socket->cancel();

    printf("destroying io_context, does it hang?...\n");
    return 0;
}
Backtrace of the hang:
#0 0x00007ffff7b3fd7f in __GI___poll (fds=0x7fffffffd7a0, nfds=1, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007ffff7f34dde in zmq::signaler_t::wait(int) const () from /usr/local/lib/libzmq.so.5
#2 0x00007ffff7f11d72 in zmq::mailbox_t::recv(zmq::command_t*, int) () from /usr/local/lib/libzmq.so.5
#3 0x00007ffff7f0321f in zmq::ctx_t::terminate() () from /usr/local/lib/libzmq.so.5
#4 0x00007ffff7f5575e in zmq_ctx_term () from /usr/local/lib/libzmq.so.5
#5 0x00005555555829a6 in std::_Sp_counted_deleter<void*, int (*)(void*), std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x5555555d0ae0) at /usr/include/c++/11/bits/shared_ptr_base.h:442
#6 0x000055555556e7d7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x5555555d0ae0)
at /usr/include/c++/11/bits/shared_ptr_base.h:168
#7 0x000055555556bdbd in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffda28,
__in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:705
#8 0x000055555556566c in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffda20,
__in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#9 0x000055555556cb72 in std::__shared_ptr<void, (__gnu_cxx::_Lock_policy)2>::reset (this=0x5555555d0338)
at /usr/include/c++/11/bits/shared_ptr_base.h:1272
#10 0x0000555555567bbe in azmq::detail::socket_service::shutdown_service (this=0x5555555d0310)
at /usr/local/include/azmq/detail/socket_service.hpp:206
#11 0x0000555555565171 in boost::asio::io_context::service::shutdown (this=0x5555555d0310)
at /usr/local/include/boost/asio/impl/io_context.ipp:148
#12 0x0000555555560637 in boost::asio::detail::service_registry::shutdown_services (this=0x5555555d0180)
at /usr/local/include/boost/asio/detail/impl/service_registry.ipp:44
#13 0x0000555555560b9b in boost::asio::execution_context::shutdown (this=0x7fffffffdb00)
at /usr/local/include/boost/asio/impl/execution_context.ipp:41
#14 0x00005555555650a4 in boost::asio::io_context::~io_context (this=0x7fffffffdb00, __in_chrg=<optimized out>)
at /usr/local/include/boost/asio/impl/io_context.ipp:58
#15 0x000055555555c1be in main () at azmq_shutdown_hang.cpp:35
Tested with boost 1.82.0, libzmq master, azmq master.