[1.10] precompile silently dies due to SystemError when opening .ji files #3984

Closed
@topolarity

Unfortunately, I have no MWE yet, but I'm opening this issue early to share what I have. The basic problem: I have a large project where, in one particular workflow, precompilation is silently cancelled about 50% of the time.

When this happens, Pkg leaves behind an incomplete progress bar + spinners:

  Progress [===================>                     ]  134/293
  ◒ MLStyle
  ◑ Parsers

Otherwise, no error is printed to the terminal. The only other sign that something went wrong is that you'll often fall straight into (undesirable) serial precompilation afterwards.

Stack trace from a (slightly modified) build of 1.10:

    systemerror(p::String, errno::Int32; extrainfo::Nothing) at error.jl:176,
    kwcall(::@NamedTuple{extrainfo::Nothing}, ::typeof(systemerror), p::String, errno::Int32) at error.jl:176,
    kwcall(::@NamedTuple{extrainfo::Nothing}, ::typeof(systemerror), p::String) at error.jl:176,
    #systemerror#88 at error.jl:175 [inlined],
    systemerror at error.jl:175 [inlined],
    open(fname::String; lock::Bool, read::Bool, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing) at iostream.jl:293,
    open at iostream.jl:275 [inlined],
    open(fname::String, mode::String; lock::Bool) at iostream.jl:356,
    open at iostream.jl:355 [inlined],
    stale_cachefile(modkey::Base.PkgId, build_id::UInt128, modpath::String, cachefile::String; ignore_loaded::Bool) at loading.jl:3008,
    stale_cachefile at loading.jl:3007 [inlined],
    #stale_cachefile#984 at loading.jl:3005 [inlined],
    stale_cachefile at loading.jl:3004 [inlined],
    isprecompiled(pkg::Base.PkgId; ignore_loaded::Bool, stale_cache::Dict{Tuple{Base.PkgId, UInt128, String, String}, Bool}, cachepaths::Vector{String}, sourcepath::String) at loading.jl:1397,
    isprecompiled at loading.jl:1389 [inlined],
    (::Pkg.API.var"#247#285"{Bool, Bool, Pkg.Types.Context, Vector{Task}, IOStream, Dict{Base.PkgId, String}, Dict{Base.PkgId, String}, Base.Event, Base.Event, ReentrantLock, Vector{Base.PkgId}, Vector{Base.PkgId}, Dict{Base.PkgId, String}, Vector{Base.PkgId}, Vector{Base.PkgId}, Dict{Base.PkgId, Bool}, Dict{Base.PkgId, Base.Event}, Dict{Base.PkgId, Bool}, Vector{Pkg.Types.PackageSpec}, Dict{Base.PkgId, String}, Dict{Tuple{Base.PkgId, UInt128, String, String}, Bool}, Vector{Base.PkgId}, Pkg.API.var"#color_string#258"{Bool}, Bool, Bool, Base.TTY, Base.Semaphore, Bool, String, Vector{String}, Vector{Base.PkgId}, Base.PkgId})() at API.jl:1503

This shows that the SystemError is an ENOENT raised by this open: https://github.com/JuliaLang/julia/blob/4954197196d657d14edd3e9c61ac101866e6fa25/base/loading.jl#L3008
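For reference, here is a minimal sketch of what an ENOENT from `open` looks like on the Julia side. It is not the code path in `loading.jl`, only an illustration; the file name `does_not_exist.ji` is made up.

```julia
# Try to open a path that is guaranteed not to exist and capture the exception.
# "does_not_exist.ji" is a hypothetical name, not a real cache file.
missing_path = joinpath(mktempdir(), "does_not_exist.ji")

err = try
    open(missing_path, "r")
    nothing
catch e
    e
end

# `open` reports the failure as a SystemError carrying the C errno value.
is_enoent = err isa SystemError && err.errnum == Base.Libc.ENOENT
println("ENOENT from open: ", is_enoent)
```

The `errnum` field is what distinguishes a missing file from other failures such as a permissions error.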

I think this suggests several problems:

  1. Base loading should probably not have an unguarded open like this (pretty much ever, I think - open can essentially always fail...)
  2. Pkg.precompile cannot tell a user interrupt apart from a failed assertion / internal error, so it silently discards this internal error on the assumption that it has been "interrupted"
  3. (The actual bug) There seems to be a race condition and/or caching misbehavior that causes the file not to be where it is expected
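To make points 1 and 2 concrete, here is a rough sketch of the two guards I have in mind. Both helpers (`try_read_cache_header` and `run_reporting_errors`) are hypothetical names invented for illustration. They are not existing Base or Pkg functions and do not reflect how either is implemented today.

```julia
# Hypothetical helper for point 1: return `nothing` instead of throwing when
# the cache file cannot be opened (for example, because it was deleted between
# being listed and being opened).
function try_read_cache_header(cachefile::AbstractString, nbytes::Integer = 8)
    try
        return open(io -> read(io, nbytes), cachefile, "r")
    catch e
        e isa SystemError || rethrow()
        return nothing  # caller can treat an unopenable cache file as stale
    end
end

# Hypothetical helper for point 2: only a real InterruptException counts as
# "interrupted"; anything else is logged with its backtrace rather than dropped.
function run_reporting_errors(f)
    try
        return f()
    catch e
        if e isa InterruptException
            return :interrupted
        end
        @error "internal error during precompilation" exception = (e, catch_backtrace())
        return :failed
    end
end
```

With guards along these lines, a vanished `.ji` file would lead to a stale-cache result instead of an exception, and any remaining internal error would at least show up in the terminal.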

I'm also worried this issue is quite common, but just hard to notice so that we don't get bug reports...

All 5 people on my team who work on this project (or other similarly large ones) have hit this issue. I hit it myself for 2+ months before I was told that the spinners aren't supposed to just die like that 😅
