Commit b283712

Docker: Add IO::Async::Loop::Epoll for IPerl
This changes the default event loop used inside of IPerl from `IO::Async::Loop::Poll` to `IO::Async::Loop::Epoll`. This is needed to address a SIGSEGV that would consistently occur for GPU containers.

This SIGSEGV would only occur when running the code using IPerl inside of a Docker container. To clarify,

- running the code as a regular script worked inside of Docker,
- as did running the code as both a regular script and inside of IPerl on the host.

It's not clear why the SIGSEGV would occur only when running inside of a Docker container, but by looking at the core dump and using a debugging build of TensorFlow, it seems to always happen when `libtensorflow` would call out to a subprocess such as when it would try to use `ptxas` from the CUDA toolkit (across multiple versions of the CUDA toolkit).

The core dumps would usually contain a stack backtrace of

- `tsl::SubProcess::WaitInternal`;
- `tsl::SubProcess::Communicate`;
- `stream_executor::GetPtxasVersionString`.

When running under a debugger, stepping line-by-line over the calls to the subprocess would sometimes make the SIGSEGV go away, but this was not consistent.

My theory is that the calls to the subprocess also use the `poll(2)` syscall and this is causing some kind of interaction between the two event loops. Switching to `epoll(7)` means this interaction goes away.
1 parent 7427ced commit b283712

1 file changed (+1, -1 lines changed)

docker/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ RUN bash -c 'cd /tf \
 '
 
 # Install Jupyter kernel
-RUN plx cpm install -L /perl5 Devel::IPerl
+RUN plx cpm install -L /perl5 Devel::IPerl IO::Async::Loop::Epoll
 
 # Install libtensorflow (with Alien deps to /perl5 and actual Alien dist to
 # /perl5-libtf)
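
One way to sanity-check that the container actually picks up the new loop is to ask `IO::Async::Loop->new` which class it resolves to. The following is only a sketch and not part of this commit: the helper name `report_loop_class` is hypothetical, and the `PERL5LIB` path is an assumption derived from the `-L /perl5` install prefix used above.

```shell
# Hypothetical helper (not part of this commit): print the class that
# IO::Async::Loop->new resolves to in the current environment.
# Assumption: modules installed with `cpm install -L /perl5` live under
# /perl5/lib/perl5, so that directory is prepended to PERL5LIB.
report_loop_class() {
    PERL5LIB="/perl5/lib/perl5${PERL5LIB:+:$PERL5LIB}" \
        perl -MIO::Async::Loop -e 'print ref(IO::Async::Loop->new), "\n"' 2>/dev/null \
        || echo "IO::Async not available"
}

report_loop_class
```

Inside the image built from this Dockerfile, the expectation is that this reports `IO::Async::Loop::Epoll` rather than `IO::Async::Loop::Poll`, since IO::Async appears to prefer an OS-specific loop when one is installed. If it does not, setting the `IO_ASYNC_LOOP` environment variable to `Epoll` is a documented way to request a specific loop class explicitly.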
