Description
We have this code in our https://github.com/zama-ai/tfhe-rs project, on commit f1c21888a762ddf9de017ae52dc120c141ec9c02, in tfhe/docs/how_to/compress.md from line 44 onward:
```rust
use tfhe::prelude::*;
use tfhe::{
    generate_keys, set_server_key, ClientKey, CompressedServerKey, ConfigBuilder, FheUint8,
};

fn main() {
    let config = ConfigBuilder::all_disabled()
        .enable_default_integers()
        .build();

    let cks = ClientKey::generate(config);
    let compressed_sks = CompressedServerKey::new(&cks);

    println!(
        "compressed size : {}",
        bincode::serialize(&compressed_sks).unwrap().len()
    );

    let sks = compressed_sks.decompress();

    println!(
        "decompressed size: {}",
        bincode::serialize(&sks).unwrap().len()
    );

    set_server_key(sks);

    let clear_a = 12u8;
    let a = FheUint8::try_encrypt(clear_a, &cks).unwrap();

    let c = a + 234u8;

    let decrypted: u8 = c.decrypt(&cks);
    assert_eq!(decrypted, clear_a.wrapping_add(234));
}
```
I expected to see this happen: running the doctest with the following command should work (note that we modify the release profile to enable `lto = "fat"`; see the profile sketch after the command):

```console
RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 test --profile release --doc --features=aarch64-unix,boolean,shortint,integer,internal-keycache -p tfhe -- test_user_docs::how_to_compress
```
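For completeness, the release profile tweak mentioned above is just the standard Cargo LTO setting; something along these lines in the manifest (the exact location of the `[profile.release]` section in our workspace may differ):

```toml
# Referenced above: switch the release profile to "fat" LTO.
[profile.release]
lto = "fat"
```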
Instead, this happened: the program crashes. Compiling the same code as a separate example, with the same cargo configuration, results in an executable that works. Turning LTO off also yields a doctest that runs properly, indicating LTO is at fault, or at least part of the problem, when combined with doctests.

It has been happening randomly for doctests on a lot of Rust versions, but we could not identify what the issue was. It looks like enabling LTO creates a miscompile where a value that is provably 0 (it is never modified by the code) is asserted to be != 0, which crashes the program; sometimes different things error out, and it looks like the program is reading from the wrong location on the stack. The value being asserted != 0 is checked in https://github.com/zama-ai/tfhe-rs/blob/f1c21888a762ddf9de017ae52dc120c141ec9c02/tfhe/src/core_crypto/algorithms/ggsw_encryption.rs#L551
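For context, the failing check has roughly the following shape (a simplified excerpt; the assertion text matches the panic message in the backtraces below, the rest of the linked function is omitted):

```rust
// Simplified shape of the check at ggsw_encryption.rs:551 (see the link above).
// The modulus it inspects is set up earlier and never modified afterwards, so
// this should always hold; under LTO in a doctest it sometimes fires anyway.
assert!(ciphertext_modulus.is_compatible_with_native_modulus());
```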
Unfortunately we are not able to minimize this issue at the moment, as it does not happen reliably across doctests.
Meta
`rustc --version --verbose`:

```console
rustc 1.75.0-nightly (49691b1f7 2023-10-16)
binary: rustc
commit-hash: 49691b1f70d71dd7b8349c332b7f277ee527bf08
commit-date: 2023-10-16
host: aarch64-apple-darwin
release: 1.75.0-nightly
LLVM version: 17.0.2
```
Unfortunately, nightly (which we used to recover the doctest binaries via `RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins"`) only exhibits the crash for the parallel version of an encryption algorithm used with rayon; on current stable we can also get the crash with a serial algorithm, but there we do not seem to be able to recover the doctest binary. The recovery invocation is sketched below.
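Concretely, the recovery invocation looks roughly like this (the `doctestbins` output directory name is arbitrary; otherwise it is the same test command as above):

```console
RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins" \
RUSTFLAGS="-C target-cpu=native" \
cargo +nightly-2023-10-17 test --profile release --doc \
    --features=aarch64-unix,boolean,shortint,integer,internal-keycache \
    -p tfhe -- test_user_docs::how_to_compress
```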
doctest_miscompile.zip
The archive contains the `objdump --disassemble` output for the code compiled as an example (which runs fine) and for the code compiled as a doctest, which exhibits the miscompilation; a sketch of the commands is given below. If needed I can provide the binaries, but I would understand if nobody wants to run a binary coming from a bug report.
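The two dumps in the archive were produced along these lines (paths are illustrative, not the actual file names):

```console
# Disassemble the example binary (runs fine) and the recovered doctest binary (crashes)
objdump --disassemble target/release/examples/how_to_compress > example.asm
objdump --disassemble doctestbins/<doctest-binary> > doctest.asm
```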
`objdump --version`:

```console
Apple LLVM version 14.0.3 (clang-1403.0.22.14.1)
  Optimized build.
  Default target: arm64-apple-darwin22.5.0
  Host CPU: apple-m1

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
```
Here is a snippet of the backtraces, with two threads erroring out on two different issues (while there is no problem with the same code compiled as an example).
Backtrace
```text
stack backtrace:
0: 0x102712f6c - thread '<<unnamed>std' panicked at tfhe/src/core_crypto/algorithms/ggsw_encryption.rs:551:::5sys_common:
::assertion failed: ciphertext_modulus.is_compatible_with_native_modulus()backtrace
::_print::DisplayBacktrace as core::fmt::Display>::fmt::h06ea57ce7b13512d
1: 0x10268b4f8 - core::fmt::write::h4d15d254ca20c331
2: 0x1026c6a68 - std::io::Write::write_fmt::hfdc8b2852a9a03fa
3: 0x102715ea0 - std::sys_common::backtrace::print::h139bbaa51f48014c
4: 0x102715a08 - std::panicking::default_hook::{{closure}}::hbbb7d85a61092397
5: 0x1027157cc - std::panicking::default_hook::hb0db088803baef11
6: 0x102717234 - std::panicking::rust_panic_with_hook::h78dc274574606137
7: 0x102716da8 - std::panicking::begin_panic_handler::{{closure}}::h2905be29dbe9281c
8: 0x102716c88 - std::sys_common::backtrace::__rust_end_short_backtrace::h2a15f4fd2d64df91
9: 0x102716c7c - _rust_begin_unwind
10: 0x1027fe624 - core::panicking::panic_fmt::hd8e61ff6f38230f9
11: 0x1027fe7b0 - core::panicking::panic::h4a945e52b5fb1050
12: 0x1027990bc - tfhe::core_crypto::algorithms::glwe_encryption::encrypt_seeded_glwe_ciphertext_assign_with_existing_generator::hb32b93df2aa13c6e
13: 0x1027d8d44 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h6b9d6bce496a26b2
14: 0x10277099c - rayon::iter::plumbing::Producer::fold_with::h3252c105ae5580f0
15: 0x10278c92c - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
16: 0x10271ff70 - rayon_core::join::join_context::{{closure}}::h7ecf44f403b2e94c
17: 0x102729d00 - rayon_core::registry::in_worker::hb2d005d9f62ec9b8
18: 0x10278c918 - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
19: 0x102792d0c - <<rayon::iter::map::Map<I,F> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB,F> as rayon::iter::plumbing::ProducerCallback<T>>::callback::h282ea6fb42ca6c2b
20: 0x10276aaa0 - <<rayon::iter::zip::Zip<A,B> as rayon::iter::IndexedParallelIterator>::with_producer::CallbackB<CB,A> as rayon::iter::plumbing::ProducerCallback<ITEM>>::callback::h6c6ab19b4791d17e
21: 0x1027dcc88 - <<rayon::iter::enumerate::Enumerate<I> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB> as rayon::iter::plumbing::ProducerCallback<I>>::callback::h62504345ff3d393a
22: 0x10278f38c - rayon::iter::plumbing::bridge::h142cac5b932df279
23: 0x1027de84c - rayon::iter::plumbing::Producer::fold_with::hda6c429fb67861a6
24: 0x10278b204 - rayon::iter::plumbing::bridge_producer_consumer::helper::ha97da0be53d3520b
25: 0x1027930fc - <<rayon::iter::map::Map<I,F> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB,F> as rayon::iter::plumbing::ProducerCallback<T>>::callback::h5caece096ea77aa2
26: 0x102768cdc - <<rayon::iter::zip::Zip<A,B> as rayon::iter::IndexedParallelIterator>::with_producer::CallbackA<CB,B> as rayon::iter::plumbing::ProducerCallback<ITEM>>::callback::h9c59859a5ada9da8
27: 0x102790548 - rayon::iter::plumbing::bridge::h691ef483cd06a966
28: 0x1027d896c - tfhe::core_crypto::algorithms::ggsw_encryption::par_encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator::h1092854bcdddc1c5
29: 0x1027d8540 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h58460779da245a1d
30: 0x102771604 - rayon::iter::plumbing::Producer::fold_with::h5c2dab692eefc651
31: 0x10278a424 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
32: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
33: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
34: 0x10271ec34 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
35: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
36: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
37: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
38: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
39: 0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
40: 0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
41: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
42: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
43: 0x10271eac8 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
44: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
45: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
46: 0x1027306d4 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
47: 0x102750400 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h5752c5eaefb098bd
48: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
49: 0x1026a9300 - rayon_core::registry::ThreadBuilder::run::h03f0186f2f91b865
50: 0x1026b1ee4 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf857650a9dcd5e44
51: 0x1026ac8c8 - core::ops::function::FnOnce::call_once{{vtable.shim}}::heab0ff5ef27f89d0
52: 0x1027183c4 - std::sys::unix::thread::Thread::new::thread_start::h2ab8753089ede7d0
53: 0x19832bfa8 - __pthread_joiner_wake
stack backtrace:
0: 0x102712f6c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h06ea57ce7b13512d
1: 0x10268b4f8 - core::fmt::write::h4d15d254ca20c331
2: 0x1026c6a68 - std::io::Write::write_fmt::hfdc8b2852a9a03fa
3: 0x102715ea0 - std::sys_common::backtrace::print::h139bbaa51f48014c
4: 0x102715a08 - std::panicking::default_hook::{{closure}}::hbbb7d85a61092397
5: 0x1027157cc - std::panicking::default_hook::hb0db088803baef11
6: 0x102717234 - std::panicking::rust_panic_with_hook::h78dc274574606137
7: 0x102716da8 - std::panicking::begin_panic_handler::{{closure}}::h2905be29dbe9281c
8: 0x102716c88 - std::sys_common::backtrace::__rust_end_short_backtrace::h2a15f4fd2d64df91
9: 0x102716c7c - _rust_begin_unwind
10: thread ' <unnamed> ' panicked at /rustc/49691b1f70d71dd7b8349c332b7f277ee527bf08/library/core/src/num/mod.rs : 1166 :0x51027fe624:
- attempt to calculate the remainder with a divisor of zerocore
::panicking::panic_fmt::hd8e61ff6f38230f9
11: 0x1027fe7b0 - core::panicking::panic::h4a945e52b5fb1050
12: 0x1027990bc - tfhe::core_crypto::algorithms::glwe_encryption::encrypt_seeded_glwe_ciphertext_assign_with_existing_generator::hb32b93df2aa13c6e
13: 0x1027d8d44 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h6b9d6bce496a26b2
14: 0x10277099c - rayon::iter::plumbing::Producer::fold_with::h3252c105ae5580f0
15: 0x10278c92c - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
16: 0x102756c50 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hb4b2cce923b187bc
17: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
18: 0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
19: 0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
20: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
21: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
22: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
23: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
24: 0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
25: 0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
26: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
27: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
28: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
29: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
30: 0x1026a9300 - rayon_core::registry::ThreadBuilder::run::h03f0186f2f91b865
31: 0x1026b1ee4 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf857650a9dcd5e44
32: 0x1026ac8c8 - core::ops::function::FnOnce::call_once{{vtable.shim}}::heab0ff5ef27f89d0
33: 0x1027183c4 - std::sys::unix::thread::Thread::new::thread_start::h2ab8753089ede7d0
34: 0x19832bfa8 - __pthread_joiner_wake
```
We have also seen some flaky doctests on x86_64 but could not narrow down the issue. We have turned off LTO for all of our doctests for now and will monitor how things evolve (one way to do that is sketched below). The reason we also suspect an issue on x86_64 is that the M1 builds have been running with LTO off for months and have never exhibited the flaky doctests we saw on x86_64; that said, the compiled code in that case is significantly different (intrinsics usage being one factor), so we can't yet be sure a similar issue is happening on x86_64.
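For reference, one way to keep `lto = "fat"` for normal release builds while turning it off for doctests is a dedicated Cargo profile used only for the `--doc` runs; this is just a sketch, not necessarily what our CI ends up doing:

```toml
# Hypothetical profile for doctest runs: same as release, but without LTO.
[profile.release_no_lto]
inherits = "release"
lto = "off"
```

The doctests would then be run with `cargo test --profile release_no_lto --doc ...` instead of `--profile release`.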
Cheers