
Crash, invalid free in Monterey #107929

Closed
@kali

Description

The following code, compiled with optimizations on macOS Monterey (x64), crashes with an invalid free error.

Save it as monterey-crasher.rs:

#![allow(dead_code)]
#[derive(Copy, Clone)]
enum BinOp {
    Min,
}
#[derive(Clone, Copy)]
enum OutputStoreSpec {
    View(usize),
    Strides([isize; 5])
}
#[derive(Clone)]
enum AttrOrInput {
    Attr(Box<()>),
    Input(usize),
}
#[derive(Clone)]
enum ProtoFusedSpec {
    BinScalar(AttrOrInput, BinOp),
    BinPerRow(AttrOrInput, BinOp),
    BinPerCol(AttrOrInput, BinOp),
    AddRowColProducts(AttrOrInput, AttrOrInput),
    AddUnicast(OutputStoreSpec, AttrOrInput),
    Store,
}
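fn main() {
    // Fill the heap with 50,000 pseudo-random-sized allocations, then free them all.
    let mut stuff = vec!(vec!(1));
    for i in 0..50000 {
        let len = (stuff[i].len() * 134775813) % 4096;
        stuff.push((1234123414u32..).take(len).collect());
    }
    std::mem::drop(stuff);
    // ndarray's contribution: clone a Vec<(Box<()>, Vec<ProtoFusedSpec>)> via .as_slice().to_owned().
    let _ = vec!((Box::new(()), vec![ProtoFusedSpec::Store])).as_slice().to_owned();
}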
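The loop at the start of `main` is the pseudo-random allocation that, per the notes, replaced most of the original code. As a side calculation, the hypothetical `warmup_lengths` helper below (not part of the reproducer) replays the same recurrence without allocating the vectors, to show which allocation sizes it produces. It assumes a 64-bit target, as in the report.

```rust
/// Hypothetical helper (not part of the reproducer): recomputes the lengths of
/// the Vec<u32>s that the warm-up loop in `main` allocates.
fn warmup_lengths(iterations: usize) -> Vec<u64> {
    // `stuff` starts as vec![vec![1]], so the first length is 1.
    let mut lens: Vec<u64> = vec![1];
    for i in 0..iterations {
        // Same recurrence as the reproducer: len = (previous len * 134775813) % 4096.
        lens.push((lens[i] * 134_775_813) % 4096);
    }
    lens
}

fn main() {
    let lens = warmup_lengths(50_000);

    // 134775813 % 4096 == 1029, an odd number, so each step multiplies an odd
    // length by an odd factor; reducing modulo 4096 (even) keeps the parity.
    // Every length therefore stays odd and non-zero: every iteration allocates.
    assert_eq!(134_775_813u64 % 4096, 1029);
    assert!(lens.iter().all(|&n| n % 2 == 1));

    let largest = *lens.iter().max().unwrap();
    println!("first lengths: {:?}", &lens[..4]);
    println!("largest: {} elements ({} bytes)", largest, largest * 4);
}
```

Since 1029 ≡ 5 (mod 8), the lengths cycle with period 1024 through every value ≡ 1 (mod 4) below 4096, so the 50,000 iterations revisit each of those sizes many times.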

The following shell script compiles the file and runs the executable up to 100 times. It will most likely crash within the first few runs.

#!/bin/sh

# Exit on the first failing command, i.e. the first crashing run.
set -e

rustc -C opt-level=3 monterey-crasher.rs -o monterey-crasher
for i in $(seq 1 100)
do
    echo "$i"
    ./monterey-crasher
done

Meta

Reproducible with every stable version since 1.65. As far as we can tell, it is a regression first introduced in commit 512bd84.

Output

[...]
1
monterey-crasher(80086,0x113e0d600) malloc: *** error for object 0x600001850d80: pointer being freed was not allocated
monterey-crasher(80086,0x113e0d600) malloc: *** set a breakpoint in malloc_error_break to debug

Crash stack trace in LLDB:

* thread #1, name = 'main', queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007ff80c60f00e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007ff80c6451ff libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007ff80c590d24 libsystem_c.dylib`abort + 123
    frame #3: 0x00007ff80c46e357 libsystem_malloc.dylib`malloc_vreport + 551
    frame #4: 0x00007ff80c47152b libsystem_malloc.dylib`malloc_report + 151
    frame #5: 0x00000001000024c6 monterey-crasher`monterey_crasher::main::h6d331bef051cd6b6 + 742
    frame #6: 0x0000000100002046 monterey-crasher`std::sys_common::backtrace::__rust_begin_short_backtrace::h1f33f2adc2752b09 + 6
    frame #7: 0x000000010000201c monterey-crasher`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h05acf7e35f5cbc7f + 12
    frame #8: 0x000000010001c2a4 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h2302f1d25ef2ca9b at function.rs:606:13 [opt]
    frame #9: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::do_call::h6695e32a593de2cc at panicking.rs:483:40 [opt]
    frame #10: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::hd4a93095627721a9 at panicking.rs:447:19 [opt]
    frame #11: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panic::catch_unwind::he41b3dba63feca94 at panic.rs:137:14 [opt]
    frame #12: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::hbf45583011495a61 at rt.rs:148:48 [opt]
    frame #13: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::do_call::ha3e6b3edab7da449 at panicking.rs:483:40 [opt]
    frame #14: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panicking::try::hd4e0f354bf7022b9 at panicking.rs:447:19 [opt]
    frame #15: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 [inlined] std::panic::catch_unwind::h1035b163871a4269 at panic.rs:137:14 [opt]
    frame #16: 0x000000010001c2a1 monterey-crasher`std::rt::lang_start_internal::hd56d2fa7efb2dd60 at rt.rs:148:20 [opt]
    frame #17: 0x000000010000260c monterey-crasher`main + 44
    frame #18: 0x000000010007952e dyld`start + 462

Notes

  • The bug was discovered in tract, in conjunction with a big pile of code from ndarray. Reducing it took a huge amount of effort, partly manual and partly semi-automatic, to obtain a small test case without unsafe code (the unsafe code turned out to be innocent). tract's contribution to the test case is the ProtoFusedSpec enumeration; ndarray's main contribution is the .as_slice().to_owned() bit. One of the breakthroughs was realizing that most of the remaining code was just putting non-zero bits in memory. At that point we could remove most of it and replace it with the pseudo-random allocation at the beginning of main.
  • We don't know if it reproduces on arm64 Monterey (not tried, could not find a machine). We could only reproduce it on x86-64 Monterey, not on arm64 Monterey and not on Ventura.
  • The rustc commit found by bisecting (512bd84) points to something related to enumerations and the niche discriminant optimisation. We checked the layout and dumped the ProtoFusedSpec structure without finding anything suspicious. @lqd also dumped rustc's internal representation of the enumeration layout (https://gist.github.com/lqd/bb93888ee24540072141afd6b93df6f3). The gist includes the variant of the test case that existed at the time of dumping; we have since reduced it further (including changes to the problematic enumeration itself).
  • Various sanitizing tools have failed us. Varnish and AddressSanitizer, as well as Xcode's Guard Malloc, were unable to come up with anything more interesting than the LLDB stack trace.
  • We also tried instrumenting the Rust global allocator to check whether the address was actually invalid, but the bug is very elusive: we could not reproduce it with the instrumentation in place.
  • What is specific to Monterey? Apparently, macOS Monterey has two variants of the system allocator, and the MallocNanoZone environment variable seems to control which one is used. Many applications have run into problems with the default choice and set MallocNanoZone=0 in their environment as a workaround. VS Code actually does this: the bug does not appear in its terminal (unless the program is run with env -i to discard the setting).
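The bisected commit touches the niche discriminant optimisation, in which rustc stores an enum's discriminant in bit patterns that a field can never hold (a "niche", such as the null value of a `Box`) instead of in a separate tag. As a sketch for inspecting this by hand, the program below copies the reproducer's types and prints their sizes and alignments. Only the `Option<Box<_>>` case is a language guarantee; the other numbers may differ between compiler versions, so comparing builds from before and after 512bd84 is one way to look for a layout change. On nightly, rustc's `-Z print-type-sizes` flag prints a more detailed per-variant breakdown.

```rust
use std::mem::{align_of, size_of};

// Types copied from the reproducer so their layout can be inspected.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum BinOp {
    Min,
}

#[allow(dead_code)]
#[derive(Clone, Copy)]
enum OutputStoreSpec {
    View(usize),
    Strides([isize; 5]),
}

#[allow(dead_code)]
#[derive(Clone)]
enum AttrOrInput {
    Attr(Box<()>),
    Input(usize),
}

#[allow(dead_code)]
#[derive(Clone)]
enum ProtoFusedSpec {
    BinScalar(AttrOrInput, BinOp),
    BinPerRow(AttrOrInput, BinOp),
    BinPerCol(AttrOrInput, BinOp),
    AddRowColProducts(AttrOrInput, AttrOrInput),
    AddUnicast(OutputStoreSpec, AttrOrInput),
    Store,
}

/// Prints the size and alignment of `T`. Apart from documented guarantees,
/// enum layout is unspecified and may change between compiler versions.
fn report<T>(name: &str) {
    println!("{name}: size {} align {}", size_of::<T>(), align_of::<T>());
}

fn main() {
    // The guaranteed niche: a Box is never null, so Option<Box<T>> reuses the
    // null value as the `None` discriminant and needs no extra tag.
    assert_eq!(size_of::<Option<Box<()>>>(), size_of::<Box<()>>());

    report::<BinOp>("BinOp");
    report::<OutputStoreSpec>("OutputStoreSpec");
    report::<AttrOrInput>("AttrOrInput");
    report::<ProtoFusedSpec>("ProtoFusedSpec");
    report::<(Box<()>, Vec<ProtoFusedSpec>)>("(Box<()>, Vec<ProtoFusedSpec>)");
}
```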
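The notes do not say how the global allocator was instrumented. A minimal sketch of one common approach, assuming a wrapper around `std::alloc::System` registered with `#[global_allocator]`: the hypothetical `CountingAlloc` below only counts calls. Checking whether a freed address is actually valid would additionally require recording live addresses without allocating inside the allocator itself.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical instrumentation: forwards every call to the system allocator
// and counts allocations and deallocations.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);
static DEALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DEALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
    // `realloc` and `alloc_zeroed` keep the trait's default implementations,
    // which call `alloc`/`dealloc` above, so they are counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns the number of (allocations, deallocations) observed while `f` ran.
/// The counters are process-wide, so other threads can inflate the numbers.
fn alloc_delta<F: FnOnce()>(f: F) -> (usize, usize) {
    let a0 = ALLOCS.load(Ordering::Relaxed);
    let d0 = DEALLOCS.load(Ordering::Relaxed);
    f();
    (
        ALLOCS.load(Ordering::Relaxed) - a0,
        DEALLOCS.load(Ordering::Relaxed) - d0,
    )
}

fn main() {
    let (allocs, deallocs) = alloc_delta(|| {
        let v: Vec<u32> = (0..1000).collect();
        // black_box discourages the optimizer from eliding the allocation.
        drop(black_box(v));
    });
    println!("allocations: {allocs}, deallocations: {deallocs}");
}
```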

Metadata

Labels: C-bug, I-crash, O-macos
