Skip to content

Commit 7141706

Browse files
committed
Use the bind port as a hint for multiple workers on the same host
Otherwise all the workers would try to bind to the same port, which would cause an error.
1 parent 0ce06be commit 7141706

File tree

3 files changed

+31
-1
lines changed

3 files changed

+31
-1
lines changed

docs/src/_changelog.md

+4
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,10 @@ This documents notable changes in DistributedNext.jl. The format is based on
1515
implementation would take a worker out of the pool and immediately put it back
1616
in without waiting for the returned [`Future`](@ref). Now it will wait for the
1717
`Future` before putting the worker back in the pool ([#20]).
18+
- Fixed cases like `addprocs([("machine 10.1.1.1:9000", 2)])` where the bind
19+
port is specified. Previously this would cause errors when the workers all
20+
tried to bind to the same port, now all additional workers will treat the bind
21+
port as a port hint ([#19]).
1822

1923
### Added
2024
- A watcher mechanism has been added to detect when both the Distributed stdlib

src/cluster.jl

+22-1
Original file line numberDiff line numberDiff line change
@@ -733,8 +733,29 @@ function launch_additional(np::Integer, cmd::Cmd)
733733
io_objs = Vector{Any}(undef, np)
734734
addresses = Vector{Any}(undef, np)
735735

736+
worker_cmd = Cmd(cmd)
737+
bind_idx = findfirst(==("--bind-to"), cmd)
738+
if !isnothing(bind_idx)
739+
# The actual bind spec will be the next argument
740+
bind_idx += 1
741+
742+
bind_addr = worker_cmd[bind_idx]
743+
parts = split(bind_addr, ':')
744+
if length(parts) == 2
745+
port_str = parts[2]
746+
747+
# If the port is not specified as a port hint then we convert it
748+
# to a hint, otherwise the workers will try to bind to the same
749+
# port and error.
750+
if !startswith(port_str, '[')
751+
new_bind_addr = "$(parts[1]):[$(port_str)]"
752+
worker_cmd.exec[bind_idx] = new_bind_addr
753+
end
754+
end
755+
end
756+
736757
for i in 1:np
737-
io = open(detach(cmd), "r+")
758+
io = open(detach(worker_cmd), "r+")
738759
write_cookie(io)
739760
io_objs[i] = io.out
740761
end

test/sshmanager.jl

+5
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,11 @@ end
6767
@test 8000 >= worker.config.port < 9000
6868
test_n_remove_pids(new_pids)
6969

70+
print("\nssh addprocs with multiple workers and port specified\n")
71+
new_pids = addprocs_with_testenv([("localhost 127.0.0.1:8000", 2)]; sshflags=sshflags)
72+
@test length(new_pids) == 2
73+
test_n_remove_pids(new_pids)
74+
7075
print("\nssh addprocs with tunnel\n")
7176
new_pids = addprocs_with_testenv([("localhost", num_workers)]; tunnel=true, sshflags=sshflags)
7277
@test length(new_pids) == num_workers

0 commit comments

Comments
 (0)