Intermittent segfault in apparently memory safe code, perhaps FFTW related #48722

Closed

Description

We are observing intermittent segfaults when running the tests of the Sunny.jl package. They sometimes show up during GitHub CI testing of our simplified crash branch:

pkg> add Sunny#crash
pkg> test Sunny

The crash is intermittent, however. On my Mac, for example, crashes are rare, but when they happen, they occur in roughly the same code location. An example of the segfault output is shown in this CI action: https://github.com/SunnySuite/Sunny.jl/actions/runs/4214225988/jobs/7314550112
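
Because the crash is intermittent, one way to hit it locally is to rerun the test suite in a loop until a run fails. The following is a minimal sketch, not part of Sunny: the helper name `run_until_fail` is hypothetical, and the commented usage line assumes `julia` is on the PATH and that Sunny#crash has already been added to the active environment.

```shell
# Hypothetical helper: rerun a command until it fails or the attempt budget runs out.
# Returns 1 as soon as a run fails (crash reproduced), 0 if every run passed.
run_until_fail() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # "$@" is the command to repeat, e.g. the Julia test invocation below.
        if ! "$@"; then
            echo "run $attempt failed"
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo "all $max_attempts runs passed"
    return 0
}

# Example usage (assumption: Sunny#crash is installed in the active environment):
# run_until_fail 20 julia -e 'using Pkg; Pkg.test("Sunny")'
```

The attempt budget is arbitrary; on machines where the crash is rare, a larger value may be needed.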

The segfault seems to always occur inside FFTW, but perhaps there is memory corruption happening prior to FFTW.

The branch Sunny#crash contains no @inbounds annotations or other "memory unsafe" operations, as far as we can tell (presumably the FFT package is intended to be memory safe?). Sunny does depend on external C libraries, which could of course corrupt memory.

I tried to bisect to a commit where the crash first appeared, and it seems to be one of these two:
SunnySuite/Sunny.jl@fb0a631 <- where crashes become very noticeable
SunnySuite/Sunny.jl@9f97b54 <- parent commit, seems suspicious to me

We recorded a trace of the crash using --bug-report=rr and uploaded it here:
https://julialang-dumps.s3.amazonaws.com/reports/2023-02-18T02-49-23-ddahlbom.tar.zst

Two example segfault outputs are below.

signal (11): Segmentation fault
in expression starting at /home/runner/work/Sunny.jl/Sunny.jl/test/test_energy_consistency.jl:77
unknown function (ip: 0x11b22230)
energy at /home/runner/work/Sunny.jl/Sunny.jl/src/System/Interactions.jl:194
test_delta at /home/runner/work/Sunny.jl/Sunny.jl/test/test_energy_consistency.jl:67
unknown function (ip: 0x7f9c2fdad002)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2377 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/gf.c:2559
jl_apply at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/julia.h:1843 [inlined]
do_call at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:126
eval_value at /cache/build/default-amdci4-2/julialang/julia-release-1-dot-8/src/interpreter.c:215
...

and

signal (11): Segmentation fault
in expression starting at /home/runner/work/Sunny.jl/Sunny.jl/test/test_energy_consistency.jl:77
unknown function (ip: 0x31)
unsafe_execute! at /home/runner/.julia/packages/FFTW/sfy1o/src/fft.jl:500 [inlined]
mul! at /home/runner/.julia/packages/FFTW/sfy1o/src/fft.jl:859 [inlined]
energy at /home/runner/work/Sunny.jl/Sunny.jl/src/System/Ewald.jl:125
Allocations: 294550517 (Pool: 294319097; Big: 231420); GC: 174
ERROR: LoadError: Package Sunny errored during testing (received signal: 11)
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /opt/hostedtoolcache/julia/1.8.5/x64/share/julia/stdlib/v1.8/Pkg/src/Types.jl:67
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
...
  1. The output of versioninfo()

We have observed the problem on multiple machines, all using Julia 1.8.5. It primarily appears on GitHub Actions CI using x64, but I have also seen it on my M1 Mac, whose configuration is:

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 6 virtual cores
  2. How you installed Julia

GitHub Actions Julia installer with [Julia 1.8 - ubuntu-latest - x64], or juliaup on the Mac.

Thank you.
