Commit f9d3d89

Fix a race condition in the tests
Quick explanation of what the `GC tests for RemoteChannels` test does:

1. Create `RemoteChannel`s `rr` and `fstore` on worker 1 and worker 2 respectively from the master process. At this point only the master process knows about `rr` and `fstore`.
2. The master process calls `put!(fstore, rr)`, i.e. we remotecall worker 2 and put `rr` (which is owned by worker 1 but is currently known only to the master) into `fstore`.
3. Remotecall into worker 1 and check that it knows about `rr`.

Step 3 should succeed even though we never explicitly communicated with worker 1 beforehand, because `serialize(::ClusterSerializer, ::RemoteChannel)` sends a message to the owner of the `RemoteChannel` to inform it of the channel's existence (see `send_add_client()`). This happens asynchronously in step 2, and on rare occasions worker 1 would not process that message before step 3, causing the test to fail.

Now we give the check 10s to succeed.
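The race described above can be reproduced without any workers. The following is a minimal sketch, not the actual test: `refs`, `register_later` and `known` are hypothetical stand-ins for `Distributed.PGRP.refs`, the asynchronous add-client message, and the `haskey` check on worker 1. Only `timedwait` is the real Base function used by the fix.

```julia
# Hypothetical stand-in for the reference table kept on worker 1.
refs = Dict{Int,Bool}()

# Hypothetical stand-in for the asynchronous add-client message:
# the entry only appears after `delay` seconds, on another task.
function register_later(id; delay=0.5)
    @async begin
        sleep(delay)
        refs[id] = true
    end
end

# Hypothetical stand-in for the remote `haskey` check.
known(id) = haskey(refs, id)

register_later(42)

# Checking right away loses the race: the message has not been processed yet.
immediate = known(42)

# Polling with a deadline tolerates the delay. The callback must return a Bool.
polled = timedwait(() -> known(42), 10)

println("immediate check: ", immediate)
println("polled check:    ", polled)
```

In this sketch the immediate check corresponds to the old assertion and the polled check to the new one; the 10 second deadline only bounds the worst case, since `timedwait` returns as soon as the callback yields `true`.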
1 parent 3a43532 commit f9d3d89

1 file changed: +3 -1 lines changed

test/distributed_exec.jl

@@ -295,7 +295,9 @@ let wid1 = workers()[1],
 
     put!(fstore, rr)
     if include_thread_unsafe_tests()
-        @test remotecall_fetch(k -> haskey(Distributed.PGRP.refs, k), wid1, rrid) == true
+        # timedwait() is necessary because wid1 is asynchronously informed of
+        # the existence of rr/rrid through the call to `put!(fstore, rr)`.
+        @test timedwait(() -> remotecall_fetch(k -> haskey(Distributed.PGRP.refs, k), wid1, rrid), 10) === :ok
     end
     finalize(rr) # finalize locally
     yield() # flush gc msgs
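
Note that the assertion changes from `== true` to `=== :ok` because `timedwait` returns a `Symbol` rather than the callback's `Bool`. A small sketch of the two outcomes, using a hypothetical `ready` flag in place of the remote check and a shortened `pollint` so it runs quickly:

```julia
# Hypothetical flag that another task sets after a short delay.
ready = Ref(false)
@async begin
    sleep(0.3)
    ready[] = true
end

# The condition becomes true before the 5 second deadline.
ok_result = timedwait(() -> ready[], 5; pollint=0.05)

# The condition never becomes true, so the deadline expires.
never_result = timedwait(() -> false, 0.3; pollint=0.05)

println(ok_result)
println(never_result)
```

With the new assertion, a worker that never learns about `rr` therefore still fails the test, only after the 10 second deadline instead of immediately.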
