Towards a deterministic julia build #34753

Closed
@nico202

TLDR: I got a deterministic julia build

Hello everybody!

This is a followup from issue #25900.

Background: reproducible builds are important both for trusting that binary artifacts match a given source code and from a scientific point of view. Two distributions that pay special attention to reproducible builds are Guix System and NixOS.

Currently, julia is not reproducible on either system.

After many failed attempts, I successfully produced a deterministic result with relatively few patches. Those patches are available on my fork, based on the v1.4.0-rc1 release.
They cannot be applied as-is, mainly because I'm not sure every corner case will work and because the last few patches disable some precompilation (identifying why the current precompile_script breaks determinism is a work in progress). Still, I see this as a success, so I'll describe the patches so that we can discuss better solutions.

A few notes on the build environment: I'm building it with guix. Guix uses a clean chroot environment with, among other things, an empty and isolated /tmp directory, the variable SOURCE_DATE_EPOCH set to 1, and ASLR disabled.

  • SOURCE_DATE_EPOCH: described here, with even more details here. The idea behind it is quite simple: allow the current time to be set from an environment variable, shared between all tools, so that we can predict the output. Many tools and compilers already support it. My implementation in the following patches was just a quick hack to get it working; where and how to use it is open to discussion.
  • Address Space Layout Randomization: must be disabled (echo 0 | sudo tee /proc/sys/kernel/randomize_va_space)
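
To make the SOURCE_DATE_EPOCH idea concrete, here is a minimal sketch of reading the variable with a fallback to the wall clock. The function name `build_time` is hypothetical and is not taken from the patches; it only illustrates the convention under the assumption that the variable holds an integer number of seconds.

```julia
# Hypothetical helper (not part of Base or of the patches discussed here):
# return the build timestamp in seconds, preferring SOURCE_DATE_EPOCH when set.
function build_time(env::AbstractDict = ENV)
    raw = get(env, "SOURCE_DATE_EPOCH", "")
    parsed = tryparse(Int64, strip(raw))
    # Fall back to the wall clock when the variable is unset or malformed.
    return parsed === nothing ? time() : Float64(parsed)
end

# With the variable set, the result no longer depends on when the build runs.
fixed = build_time(Dict("SOURCE_DATE_EPOCH" => "1"))
println(fixed)
```

Passing the environment as an argument keeps the sketch easy to test; real code would most likely read `ENV` directly.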

Description of the commits (file name links to the commit):

  1. base/loading.jl: Do not store mtime() in the precompiled file. Solved by reading the SOURCE_DATE_EPOCH env variable, if set, and using it instead of the file's mtime. Tom McLaughlin's solution is just to skip the check, but that's not enough (the files still differ).
  2. src/support/timefuncs.c: Again, support SOURCE_DATE_EPOCH.
  3. src/module.c: do not store hrtime() in precompiled modules. There's a backup counter "in case hrtime is not incrementing". Can hrtime just be dropped? Otherwise, if modules are compiled in different sessions (hence same mcounter -> same build_id), can a hash of the content or something else deterministic be used instead? Here I just identified this as a point that needs to be addressed, so hopefully somebody with more knowledge can propose a real solution.
  4. contrib/generate_precompile.jl: here, mktemp() and mktempdir() are called. The problem is that the current directory gets stored in the precompile cache (because of calls like `push!(DEPOT_PATH, prec_path)`). Maybe we can check for an env variable and decide what to do based on that (use a random name or a static name)?
  5. base/Base.jl: srand. I don't think I need to add anything :D Maybe initialize it with the current time() (so that SOURCE_DATE_EPOCH is used and a deterministic result is obtained)?
  6. Base.jl, sysimg.jl: time_ns() gets included somehow.
  7. src/codegen.cpp: I don't get how storing the time needed to load the file can be useful, but I'm probably missing something.
  8. contrib/generate_precompile.jl: the other three patches prepare a bare-minimum precompile cache. The cache is 101 MB, whereas the full precompile cache is 148 MB (but not deterministic yet). Only ~1000 precompile statements are used (compared to ~4000 in the shipped version), but I got to this point to prove that it can be done. Now more testing is needed in order to add more statements.
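
Items 1, 3 and 4 above can be sketched in plain Julia as follows. This is only an illustration of the ideas, not the code from the patches: the names `content_build_id`, `recorded_mtime` and `scratch_dir`, as well as the directory name used, are hypothetical. Whether identical build ids for identical content are acceptable to Julia's internals is exactly the open question raised in item 3.

```julia
using SHA  # standard library

# Hypothetical: derive a 64-bit build id from the module's content instead of
# from hrtime(), so identical content yields an identical id across sessions.
function content_build_id(content::Vector{UInt8})
    digest = sha256(content)
    # Fold the first 8 digest bytes into a UInt64 (big-endian, platform independent).
    return foldl((acc, b) -> (acc << 8) | UInt64(b), digest[1:8]; init = UInt64(0))
end

# Hypothetical: timestamp to record for a dependency in a cache header.
# When SOURCE_DATE_EPOCH is set, it replaces the file's real mtime.
function recorded_mtime(path::AbstractString, env::AbstractDict = ENV)
    epoch = tryparse(Float64, get(env, "SOURCE_DATE_EPOCH", ""))
    return epoch === nothing ? mtime(path) : epoch
end

# Hypothetical: use a fixed scratch directory instead of mktempdir() when the
# build asks for determinism, so the stored path is the same on every run.
function scratch_dir(env::AbstractDict = ENV)
    haskey(env, "SOURCE_DATE_EPOCH") || return mktempdir()
    return mkpath(joinpath(tempdir(), "julia-precompile-deterministic"))
end

id1 = content_build_id(Vector{UInt8}("module Foo end"))
id2 = content_build_id(Vector{UInt8}("module Foo end"))
println(id1 == id2)
```

A fixed directory name is only safe when the temporary directory is private to the build, as it is in the isolated /tmp that guix provides; on a shared machine it could collide with other builds.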

To recap:

  1. The current julia release does not build deterministically
  2. With my set of (drastic) patches, I get reproducible builds (also for external modules like Compat.jl and HTTP.jl)
  3. More work, and some collaboration with somebody who has a better understanding of julia internals, is needed
  4. I think the result will justify the effort
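
For anyone who wants to check a claim like point 2 on their own machine, one possible approach is to build twice into separate directories and compare file hashes. The sketch below assumes that approach; the helper names `tree_hashes` and `differing_files` are hypothetical, and it does not handle broken symlinks.

```julia
using SHA  # standard library

# Hypothetical helper: map each file's path (relative to `root`) to its SHA-256.
function tree_hashes(root::AbstractString)
    hashes = Dict{String,String}()
    for (dir, _, files) in walkdir(root)
        for f in files
            path = joinpath(dir, f)
            hashes[relpath(path, root)] = bytes2hex(open(sha256, path))
        end
    end
    return hashes
end

# Relative paths whose content differs, or that exist in only one of the trees.
function differing_files(a::AbstractString, b::AbstractString)
    ha, hb = tree_hashes(a), tree_hashes(b)
    diffs = [p for p in keys(merge(ha, hb)) if get(ha, p, "") != get(hb, p, "")]
    return sort!(diffs)
end
```

An empty result from `differing_files(build1, build2)` means the two output trees are bit-for-bit identical; any listed path is a candidate for a remaining source of nondeterminism.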

What do you think?

Thanks, Nicolò

Labels: building (Build system, or building Julia or its dependencies)
