Commit 766f8c3

Replace a timeout task with timedwait()
According to a stacktrace from a hung DistributedNext CI job, this task was causing the process to hang before exiting:

```julia
InterruptException()
_jl_mutex_unlock at C:/workdir/src\threading.c:1012
jl_mutex_unlock at C:/workdir/src\julia_locks.h:80 [inlined]
ijl_task_get_next at C:/workdir/src\scheduler.c:458
poptask at .\task.jl:1163
wait at .\task.jl:1172
task_done_hook at .\task.jl:839
jfptr_task_done_hook_98752.1 at C:\hostedtoolcache\windows\julia\nightly\x64\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:2233 [inlined]
jl_finish_task at C:/workdir/src\task.c:338
start_task at C:/workdir/src\task.c:1274
From worker 82: fatal: error thrown and no exception handler available.Unhandled Task ERROR: InterruptException:
Stacktrace:
 [1] poptask(W::Base.IntrusiveLinkedListSynchronized{Task})
   @ Base .\task.jl:1163
 [2] wait()
   @ Base .\task.jl:1172
 [3] wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
   @ Base .\condition.jl:141
 [4] wait
   @ .\condition.jl:136 [inlined]
 [5] put_buffered(c::Channel{Any}, v::Int64)
   @ Base .\channels.jl:420
 [6] put!(c::Channel{Any}, v::Int64)
   @ Base .\channels.jl:398
 [7] put!(rv::DistributedNext.RemoteValue, args::Int64)
   @ DistributedNext D:\a\DistributedNext.jl\DistributedNext.jl\src\remotecall.jl:703
 [8] (::DistributedNext.var"#create_worker##11#create_worker##12"{DistributedNext.RemoteValue, Float64})()
   @ DistributedNext D:\a\DistributedNext.jl\DistributedNext.jl\src\cluster.jl:721
```

Replaced it with a call to `timedwait()`, which has the advantage of being a lot simpler than an extra task.
1 parent 3a43532 commit 766f8c3

1 file changed: +2 -4 lines changed

src/cluster.jl

Lines changed: 2 additions & 4 deletions
@@ -683,11 +683,9 @@ function create_worker(manager, wconfig)
     send_msg_now(w, MsgHeader(RRID(0,0), ntfy_oid), join_message)
 
     @async manage(w.manager, w.id, w.config, :register)
+
     # wait for rr_ntfy_join with timeout
-    timedout = false
-    @async (sleep($timeout); timedout = true; put!(rr_ntfy_join, 1))
-    wait(rr_ntfy_join)
-    if timedout
+    if timedwait(() -> isready(rr_ntfy_join), timeout) === :timed_out
         error("worker did not connect within $timeout seconds")
     end
     lock(client_refs) do
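
For readers unfamiliar with `timedwait()`, the pattern in this change can be tried in isolation. The following is a minimal sketch, not the code in `src/cluster.jl`: a plain `Channel` stands in for `rr_ntfy_join`, and the helper name `wait_for_join` and the `pollint` value are assumptions made for illustration.

```julia
# Hypothetical helper: `ch` is a plain Channel standing in for `rr_ntfy_join`.
function wait_for_join(ch::Channel, timeout::Float64)
    # timedwait polls the predicate until it returns true or `timeout` seconds
    # elapse; no extra task is left behind that could later block in put!.
    status = timedwait(() -> isready(ch), timeout; pollint=0.05)
    if status === :timed_out
        error("worker did not connect within $timeout seconds")
    end
    return status
end

# Usage: a notification that arrives before the deadline.
joined = Channel{Any}(1)
@async (sleep(0.2); put!(joined, 1))
wait_for_join(joined, 5.0)
```

Unlike the removed version, nothing here writes to the channel on timeout, so there is no leftover task that can end up waiting inside `put!` when the process tries to exit.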
