bootstrap is not reproducible wrt -j value #9507
Applying this change:

```diff
-ocaml boot/bootstrap.ml --verbose ${jobs}
+ocaml boot/bootstrap.ml
```

did not improve the outcome, but I found that the following helps to avoid the race:

```diff
-ocaml boot/bootstrap.ml --verbose ${jobs}
+ocaml boot/bootstrap.ml -j 1
```
Thanks for the report. At first I thought it was something like #9152, but it looks like this particular issue is about dune's bootstrap process itself. I'll have a look.
I had a look at this. The core of the issue is this check: https://github.com/ocaml/dune/blob/3.12.1/boot/duneboot.ml#L1194

For …
That seemed weird since I don't think we touched that recently, so I did some testing with the following script:

```bash
#!/usr/bin/env bash
# Bootstrap once with -j 1 and once with -j 8, keeping both binaries.
rm -rf _boot
ocaml boot/bootstrap.ml --verbose -j 1 2>/dev/null
cp _boot/dune.exe dune1.exe

rm -rf _boot
ocaml boot/bootstrap.ml --verbose -j 8 2>/dev/null
cp _boot/dune.exe dune8.exe

# cmp -s compares byte by byte; stat -c%s (GNU coreutils) prints the size in bytes.
if cmp -s dune1.exe dune8.exe; then
  echo "identical"
elif (( $(stat -c%s dune1.exe) == $(stat -c%s dune8.exe) )); then
  echo "same size"
else
  echo "completely different"
fi
```

For 3.6.0, 3.11.1 and 3.12.1 this prints "completely different", so I think this has been the case since the new bootstrap process was introduced in #2854. I also verified that removing the optimization makes it print "identical". As for how to fix this: this is easy to add (remove the …
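The same comparison can be sketched in OCaml, which avoids the GNU-specific `stat -c%s` used in the script (BSD and macOS `stat` use different flags). This is a minimal illustration, assuming both binaries fit comfortably in memory; `classify`, `read_file` and the `compare.ml` file name are hypothetical, not part of dune.

```ocaml
(* Hypothetical helpers mirroring the three outcomes of the shell script. *)
type verdict = Identical | Same_size | Completely_different

(* Compare two binaries given as byte strings. *)
let classify (a : string) (b : string) : verdict =
  if String.equal a b then Identical
  else if String.length a = String.length b then Same_size
  else Completely_different

let string_of_verdict = function
  | Identical -> "identical"
  | Same_size -> "same size"
  | Completely_different -> "completely different"

(* Read a whole file in binary mode, closing the channel even on error. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Usage: ocaml compare.ml dune1.exe dune8.exe *)
let () =
  match Sys.argv with
  | [| _; f1; f2 |] ->
    print_endline (string_of_verdict (classify (read_file f1) (read_file f2)))
  | _ -> ()
```

Comparing whole contents first and sizes second matches the order of the `cmp` / `stat` checks in the script.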
Having written that, I think that the easiest way for now is to introduce a …
Getting rid of the optimization sounds like the simplest thing tbh. It's problematic in other contexts, such as on systems with limited memory. A flag like …
If building in a single command is important to someone, we could always move it to a separate …
You are right. I re-ran the test with 3.11.1 and it produced the same variations today. I checked the different commands (`-j 1`/`-j 4`) used to build `dune.exe`, and besides …
I checked the difference between the binaries using https://github.com/noseglasses/elf_diff. In the separately-built version, the compiler managed to do some optimizations and some functions in …
Fixes ocaml#9507

In `duneboot.ml`, there are 2 strategies to build `dune.exe`:

- separate compilation (build all modules and link them together in several commands)
- single command compilation (pass all modules to the compiler driver)

Separate compilation can be executed in parallel when `-j` is set accordingly, which speeds up the build. But when only one job can be executed at once (`-j 1`), it is faster to use a single command. So, we would switch to single command build when `-j 1` was passed.

However, this is surprising because it produces a different binary depending on the number of parallel jobs (see ocaml#9507). So this disables this optimization by default; it can be enabled by hand by passing `--use-single-command`.

(NB: this does not apply to Windows, which always uses single command compilation)

Signed-off-by: Etienne Millon <me@emillon.org>
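A rough sketch of the two strategies, expressed as the compiler-driver command lines each one would run. It assumes a flat list of `.ml` files already sorted in dependency order, with no `.mli` files and no extra flags; the `commands` function and `strategy` type are hypothetical and not what `duneboot.ml` actually contains.

```ocaml
(* Hypothetical model of the two build strategies; not duneboot.ml code. *)
type strategy = Separate | Single_command

(* Return the argv of each compiler invocation, in execution order.
   Assumes [modules] is already sorted in dependency order. *)
let commands ~(strategy : strategy) ~(modules : string list) ~(output : string)
    : string list list =
  match strategy with
  | Separate ->
    (* One compile step per module; steps whose dependencies are already
       compiled can be scheduled in parallel, bounded by -j. *)
    let compile m = [ "ocamlopt"; "-c"; m ^ ".ml" ] in
    (* Final link step over the produced .cmx files. *)
    let link =
      "ocamlopt" :: "-o" :: output :: List.map (fun m -> m ^ ".cmx") modules
    in
    List.map compile modules @ [ link ]
  | Single_command ->
    (* A single driver invocation that compiles and links everything. *)
    [ "ocamlopt" :: "-o" :: output :: List.map (fun m -> m ^ ".ml") modules ]

let () =
  commands ~strategy:Separate ~modules:[ "a"; "b" ] ~output:"dune.exe"
  |> List.iter (fun argv -> print_endline (String.concat " " argv))
```

With `-j 1` both strategies run sequentially, but only the second hands every module to one `ocamlopt` process, which is the path the thread identifies as making the binary depend on `-j`.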
Could you remind me why? It seems odd, and probably means that we can't build dune on a Windows box with low memory.
I actually don't know. I opened #9613 and it doesn't seem to be necessary.
Fixes ocaml#9507

The bootstrap process can use 2 strategies:

- parallel: run compile commands in parallel and link the rest
- single-command: run `ocamlopt` with a long list of arguments

The single-command strategy is used on win32 or if `-j 1` is set (implicitly or explicitly).

One problem is that parallel and single-command create different binaries (ocaml#9507).

The assumption was that single-command would be faster than `-j 1` on Linux and than any parallel build on Windows. However, a quick benchmark showed that the assumption holds on Linux but not on Windows. The tiny savings on Linux when only a single core is available are not enough to justify the extra code path and the reproducibility gotcha.

Signed-off-by: Etienne Millon <me@emillon.org>
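The selection rule described in this commit message, before and after the change, could be modelled as follows. This is a hypothetical sketch: the `choose_before`/`choose_after` names and the `win32`/`jobs` parameters are illustrative, not taken from `duneboot.ml`.

```ocaml
(* Hypothetical model of the bootstrap strategy choice. *)
type strategy = Parallel | Single_command

(* Before: single-command on win32, or whenever only one job is allowed
   (-j 1 set implicitly or explicitly). *)
let choose_before ~(win32 : bool) ~(jobs : int) : strategy =
  if win32 || jobs = 1 then Single_command else Parallel

(* After: the single-command path is gone, so the strategy no longer
   depends on the platform or on the -j value. *)
let choose_after ~win32:(_ : bool) ~jobs:(_ : int) : strategy = Parallel
```

Under this model, `-j` only affects how the parallel strategy schedules its compile steps, not which commands run, which is consistent with the script in the thread printing "identical" once the optimization is removed.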
CHANGES:

### Added

- Introduce a `(dynamic_include ..)` stanza. This is like `(include foo)` but allows `foo` to be the target of a rule. Currently, there are some limitations on the stanzas that can be generated. For example, public executables and libraries are currently forbidden. (ocaml/dune#9913, @rgrinberg)
- Introduce `$ dune promotion list` to print the list of available promotions. (ocaml/dune#9705, @moyodiallo)
- If Sherlodoc is installed, add a search bar in generated HTML docs. (ocaml/dune#9772, @EmileTrotignon)
- Add `only_sources` field to `copy_files` stanza. (ocaml/dune#9827, fixes ocaml/dune#9709, @jchavarri)
- The `(foreign_library)` stanza now supports the `(enabled_if)` field. (ocaml/dune#9914, @nojb)

### Fixed

- Fix `$ dune install -p` incorrectly recognizing packages that are supposed to be filtered. (ocaml/dune#9879, fixes ocaml/dune#4814, @rgrinberg)
- subst: correctly handle opam files in `opam/` subdirectory. (ocaml/dune#9895, fixes ocaml/dune#9862, @emillon)
- Odoc private rules are not set up if a library is not available due to `enabled_if`. (ocaml/dune#9897, @rgrinberg and @jchavarri)

### Changed

- When dune language 3.14 is enabled, resolve the binary in `(run %{bin:..} ..)` from where the binary is built. (ocaml/dune#9708, @rgrinberg)
- boot: remove single-command bootstrap. This was an alternative bootstrap strategy that was used in certain conditions. Removal makes the bootstrap a bit slower on Linux when only a single core is available, but bootstrap is now reproducible in all cases. (ocaml/dune#9735, fixes ocaml/dune#9507, @emillon)
While working on reproducible builds for openSUSE, I found that our `ocaml-dune` 3.12.1 package varies depending on the amount of parallelism.

### Expected Behavior

Build output should be deterministic, as it still was in the previous version 3.11.1.

### Actual Behavior

Binaries produced by `-j 1` and `-j 4` vary even in length. A diff of `strings` output looks thus: …

One created a `_boot/compiled_ml_files` and the other a `_boot/mods_list`. Build logs also differ a lot.

### Reproduction

### Specifications

- `dune` (output of `dune --version`): 3.12.1
- `ocaml` (output of `ocamlc --version`): 4.14.1

### Additional information

- `dune` with the `--verbose` flag: https://gist.github.com/bmwiedemann/c05ab97a162d966ae40b7a39065efd6e