Description
TLDR: I got a deterministic julia build
Hello everybody!
This is a followup from issue #25900.
Background: reproducible builds are important both for trusting that binary artifacts match a given source code, and from a scientific point of view. Two distributions that pay special attention to reproducible builds are Guix System and NixOS.
Currently, Julia is not reproducible on either system.
After many failed attempts, I successfully produced a deterministic result with relatively few patches. Those patches are available on my fork, based on the v1.4.0-rc1 release.
They cannot be applied as-is: I'm not sure every corner case will work, and the last few patches disable some precompilation (I'm still working on identifying why the current precompile_script breaks determinism). Still, I see this as a success, so I'll describe the patches so that we can discuss better solutions.
A few notes on the build environment: I'm building it with Guix. Guix uses a clean chroot environment with, among other things, an empty and isolated /tmp directory, the variable SOURCE_DATE_EPOCH set to 1, and ASLR disabled.
- SOURCE_DATE_EPOCH: described here, with even more details here. The idea behind it is quite simple: allow the current time to be set from an environment variable, shared between all tools, so that we can predict the output. Many tools and compilers already support it. My implementation in the following patches was just a quick hack to get it working; where and how to use it is open to discussion.
- Address Space Layout Randomization: must be disabled (echo 0 | sudo tee /proc/sys/kernel/randomize_va_space)
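To illustrate the SOURCE_DATE_EPOCH idea, here is a minimal sketch in Julia. The helper name `build_time` is hypothetical (it is not part of Base and not what my patches do literally); it only shows the general pattern of preferring the environment variable over the wall clock.

```julia
# Hypothetical helper, not part of Base: the timestamp (in seconds since
# the Unix epoch) that a build tool should embed in its output.
# `env` defaults to the process environment; passing a Dict is handy for testing.
function build_time(env = ENV)
    # An unset variable becomes "", which tryparse rejects just like garbage.
    epoch = tryparse(Int64, get(env, "SOURCE_DATE_EPOCH", ""))
    # Fall back to the real clock when the variable is unset or malformed.
    return epoch === nothing ? time() : Float64(epoch)
end

# Example: force a fixed timestamp for the duration of one call.
fixed = withenv("SOURCE_DATE_EPOCH" => "1") do
    build_time()
end
```

With the variable set, every tool that follows this pattern embeds the same timestamp, so repeated builds can agree on it.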
Description of the commits (file name links to the commit):
- base/loading.jl: Do not store mtime() in precompiled file. Solved by reading SOURCE_DATE_EPOCH env variable, if set, and using it instead of file's mtime. Tom McLaughlin's solution is just to skip the check, but that's not enough (the files are different).
- src/support/timefuncs.c: Again, support SOURCE_DATE_EPOCH.
- src/module.c: do not store hrtime() in precompiled modules. There's a backup counter "in case hrtime is not incrementing". Can hrtime just be dropped? If not, because modules compiled in different sessions would then collide (same mcounter -> same build_id), can a hash of the content, or something else deterministic, be used instead? Here I have only identified this as a point that needs to be addressed, so hopefully somebody with more knowledge can propose a real solution.
- contrib/generate_precompile.jl: here, mktemp() and mktempdir() are called. The problem is that the current directory gets stored in the precompile cache (because of calls like `push!(DEPOT_PATH, prec_path)`). Maybe we can check for an env variable and, based on that, decide whether to use a random name or a static one?
- base/Base.jl: srand. I don't think I need to add anything :D maybe initialize it with the current time() (so that SOURCE_DATE_EPOCH is used and a deterministic result is obtained)?
- Base.jl, sysimg.jl: time_ns() gets included somehow.
- src/codegen.cpp: I don't get how storing the time needed to load the file can be useful, but probably I'm missing something
- contrib/generate_precompile.jl: the other three patches prepare a bare minimum precompile cache. The cache is 101 MB, whereas the full precompile cache is 148 MB (but not deterministic yet). Only ~1000 precompile statements are used (compared to ~4000 in the shipped version), but I got to this point to prove that it can be done. Now more testing is needed in order to add more statements.
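To make two of the items above more concrete (the mtime in base/loading.jl and the build_id question in src/module.c), here is a rough sketch in Julia. Both function names are hypothetical, and the SHA-256 based id is only an assumption about what "a hash of the content" could look like; it is not how Julia currently computes build_id, and the real change would have to live in C.

```julia
using SHA  # standard library

# Hypothetical helper: the modification time to record for a source file
# in a precompile cache header. If SOURCE_DATE_EPOCH is set and numeric,
# it replaces the file's real mtime.
function recorded_mtime(path::AbstractString, env = ENV)
    epoch = tryparse(Float64, get(env, "SOURCE_DATE_EPOCH", ""))
    return epoch === nothing ? mtime(path) : epoch
end

# Hypothetical helper: a 64-bit build id derived only from the content,
# so that the same input gives the same id in every session.
function content_build_id(content::Vector{UInt8})
    digest = bytes2hex(sha256(content))          # 64 hex characters
    return parse(UInt64, digest[1:16]; base = 16)  # keep the first 8 bytes
end

# Example: the id depends only on the bytes that are hashed.
id = content_build_id(Vector{UInt8}("module Foo end"))
```

Whether 64 bits of a content hash are a good enough replacement for hrtime() plus the counter is exactly the kind of question I'd like feedback on.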
To recap:
- The current Julia release does not build deterministically
- With my set of (drastic) patches, I get reproducible builds (also for external modules like Compat.jl and HTTP.jl)
- More work and some collaboration with somebody who has a better understanding of Julia internals is needed
- I think the result will justify the effort
What do you think?
Thanks, Nicolò