opened on Mar 18, 2024
I am intermittently encountering: `attempt to send to unknown socket` when `put!`'ing into a `RemoteChannel` after `rmprocs`.
Here is the reproducer: https://gist.github.com/JBlaschke/8965e70acf52700605ac9db4af7eaf62 -- it works as follows:
- Start 2 processes: `addprocs(2)`
- Set up two `RemoteChannel`s: `ch_in` and `ch_out` for inputs and outputs.
- Start worker processes that `take!` from `ch_in` and `put!` a result in `ch_out`.
- `put!` a bunch of data into `ch_in`
- `rmprocs(3)`
- `put!` a bunch of data into `ch_in`
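
For readers who cannot open the gist, the steps above can be sketched roughly as follows. This is not the gist's code: the `work_loop` function, the buffer size of 32, the `Int` element type, and the concrete input values are all assumptions made for illustration. The second take loop deliberately reads one result fewer than was put, on the assumption (suggested by the log in this report, not confirmed) that a value handed to the removed worker may never come back.

```julia
using Distributed

addprocs(2)                    # in a fresh session the workers get pids 2 and 3
@everywhere using Distributed  # make myid() etc. visible in Main on every process
println(workers())

# Both channels live on the master process (pid 1). Buffer size is an assumption.
ch_in  = RemoteChannel(() -> Channel{Int}(32))
ch_out = RemoteChannel(() -> Channel{Int}(32))

# Hypothetical worker loop: take an input, announce the pid, echo the input back.
@everywhere function work_loop(ch_in, ch_out)
    while true
        x = take!(ch_in)
        println("hi there, I'm running on pid=", myid())
        put!(ch_out, x)
    end
end

# Start one loop per worker without waiting for a result.
for p in workers()
    remote_do(work_loop, p, ch_in, ch_out)
end

# First batch: both workers are alive.
for x in 2:5
    put!(ch_in, x)
end
for _ in 2:5
    println("Taken: ", take!(ch_out))
end

rmprocs(3)                     # blocks until worker 3 has been removed
println(workers())

# Second batch: only worker 2 is left. Take one fewer than was put so the
# script does not block forever if a value is lost (assumption, see above).
for x in 6:9
    put!(ch_in, x)
end
for _ in 1:3
    println("Taken: ", take!(ch_out))
end
```

Because the failure is intermittent, a sketch like this may or may not trigger the error on a given machine; the gist linked above remains the authoritative reproducer.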
You should see the following output:
[2, 3]
From worker 2: hi there, I'm running on pid=2
From worker 3: hi there, I'm running on pid=3
Taken: 3
Taken: 2
From worker 2: hi there, I'm running on pid=2
From worker 3: hi there, I'm running on pid=3
Taken: 4
Taken: 5
[2]
┌ Error: Fatal error on process 1
│ exception =
│ attempt to send to unknown socket
│ Stacktrace:
│ [1] error(s::String)
│ @ Base ./error.jl:35
│ [2] send_msg_unknown(s::Sockets.TCPSocket, header::Distributed.MsgHeader, msg::Distributed.ResultMsg)
│ @ Distributed ~/local/juliaup/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/messages.jl:99
│ [3] send_msg_now(s::Sockets.TCPSocket, header::Distributed.MsgHeader, msg::Distributed.ResultMsg)
│ @ Distributed ~/local/juliaup/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/messages.jl:115
│ [4] deliver_result(sock::Sockets.TCPSocket, msg::Symbol, oid::Distributed.RRID, value::Int64)
│ @ Distributed ~/local/juliaup/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:102
│ [5] (::Distributed.var"#109#111"{Distributed.CallMsg{:call_fetch}, Distributed.MsgHeader, Sockets.TCPSocket})()
│ @ Distributed ~/local/juliaup/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:295
└ @ Distributed ~/local/juliaup/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:106
From worker 2: hi there, I'm running on pid=2
Taken: 7
From worker 2: hi there, I'm running on pid=2
Taken: 8
From worker 2: hi there, I'm running on pid=2
Taken: 9
Note that the error does not recur after occurring once.
This bug is intermittent, and therefore some of the timings are undoubtedly tuned to my system. However, I found that waiting between `rmprocs(3)` and the next `put!` does not change the behaviour.
Background
- versioninfo:
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
- Julia is installed using Juliaup