Description
opened on Feb 19, 2023
We are observing intermittent segfaults when running the tests of the Sunny.jl package. They sometimes show up during GitHub CI testing of our simplified crash branch:
pkg> add Sunny#crash
pkg> test Sunny
It only crashes sometimes, however. On my Mac, for example, crashes are rare, but when they happen, it's in roughly the same code location. An example of the segfault output is shown from this CI action: https://github.com/SunnySuite/Sunny.jl/actions/runs/4214225988/jobs/7314550112
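Since the crash is intermittent, repeating the test run in a loop may help estimate how often it occurs on a given machine. The following is only a minimal sketch: `count_failures` is a hypothetical helper (not part of Sunny or Pkg), and the commented usage assumes that `Sunny#crash` has already been added to the default environment.

```julia
# Hypothetical helper (not part of Sunny): run `cmd` up to `max_runs` times and
# count how many runs fail. `success` returns false when the process exits with
# a non-zero status or is terminated by a signal such as SIGSEGV.
function count_failures(cmd::Base.AbstractCmd, max_runs::Integer)
    failures = 0
    for i in 1:max_runs
        # Discard the child's output; only the exit status matters here.
        ok = success(pipeline(cmd; stdout=devnull, stderr=devnull))
        ok || (failures += 1)
        println("run $i: ", ok ? "passed" : "FAILED")
    end
    return failures
end

# Example usage (assumption: Sunny#crash is installed in the default environment):
# test_cmd = `$(Base.julia_cmd()) -e 'using Pkg; Pkg.test("Sunny")'`
# n = count_failures(test_cmd, 20)
# println("$n of 20 runs failed")
```

Each run starts a fresh Julia process, so a segfault in one run does not take down the loop itself.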
The segfault seems to always occur inside FFTW, but perhaps there is memory corruption happening prior to FFTW.
The branch Sunny#crash contains no @inbounds annotations or other "memory unsafe" operations, as far as we can tell (presumably the FFT package is intended to be memory safe?). Sunny does depend on external C libraries, which could of course corrupt memory.
I tried to bisect to a commit where the crash first appeared, and it seems to be one of these two:
SunnySuite/Sunny.jl@fb0a631 <- where crashes become very noticeable
SunnySuite/Sunny.jl@9f97b54 <- parent commit, seems suspicious to me
We recorded a log of the crash using --bug-report=rr and uploaded it here:
https://julialang-dumps.s3.amazonaws.com/reports/2023-02-18T02-49-23-ddahlbom.tar.zst
Two example segfault outputs are below.
signal (11): Segmentation fault
in expression starting at /home/runner/work/Sunny.jl/Sunny.jl/test/test_energy_consistency.jl:77
unknown function (ip: 0x11b22230)
energy at /home/runner/work/Sunny.jl/Sunny.jl/src/System/Interactions.jl:194
test_delta at /home/runner/work/Sunny.jl/Sunny.jl/test/test_energy_consistency.jl:67
unknown function (ip: 0x7f9c2fdad002)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
jl_apply at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/julia.h:1843 [inlined]
do_call at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:215
...
and
signal (11): Segmentation fault
in expression starting at /home/runner/work/Sunny.jl/Sunny.jl/test/test_energy_consistency.jl:77
unknown function (ip: 0x31)
unsafe_execute! at /home/runner/.julia/packages/FFTW/sfy1o/src/fft.jl:500 [inlined]
mul! at /home/runner/.julia/packages/FFTW/sfy1o/src/fft.jl:859 [inlined]
energy at /home/runner/work/Sunny.jl/Sunny.jl/src/System/Ewald.jl:125
Allocations: 294550517 (Pool: 294319097; Big: 231420); GC: 174
ERROR: LoadError: Package Sunny errored during testing (received signal: 11)
Stacktrace:
[1] pkgerror(msg::String)
@ Pkg.Types /opt/hostedtoolcache/julia/1.8.5/x64/share/julia/stdlib/v1.8/Pkg/src/Types.jl:67
[2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
...
- The output of versioninfo()
We have observed the problem on multiple machines, all using Julia 1.8.5. It primarily appears on GitHub Actions CI using x64, but I have also seen it on my M1 Mac, which reports:
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin21.5.0)
CPU: 8 × Apple M1 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 1 on 6 virtual cores
- How you installed Julia
GitHub Actions Julia installer with [Julia 1.8 - ubuntu-latest - x64], or juliaup for Mac.
Thank you.