Skip to content

Prevent futex_wait from actually waiting if a concurrent waker was executed before us #2228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 18, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 50 additions & 11 deletions src/shims/unix/linux/sync.rs
Original file line number Diff line number Diff line change
Expand Up @@ -126,28 +126,63 @@ pub fn futex<'tcx>(
Align::from_bytes(4).unwrap(),
CheckInAllocMsg::MemoryAccessTest,
)?;
// There may be a concurrent thread changing the value of addr
// and then invoking the FUTEX_WAKE syscall. It is critical that the
// effects of this and the other thread are correctly observed,
// otherwise we will deadlock.
//
// There are two scenarios to consider:
// 1. If we (FUTEX_WAIT) execute first, we'll push ourselves into
// the waiters queue and go to sleep. They (addr write & FUTEX_WAKE)
// will see us in the queue and wake us up.
// 2. If they (addr write & FUTEX_WAKE) execute first, we must observe
// addr's new value. If we see an outdated value that happens to equal
// the expected val, then we'll put ourselves to sleep with no one to wake us
// up, so we end up with a deadlock. This is prevented by having a SeqCst
// fence inside FUTEX_WAKE syscall, and another SeqCst fence
// below, the atomic read on addr after the SeqCst fence is guaranteed
// not to see any value older than the addr write immediately before
// calling FUTEX_WAKE. We'll see futex_val != val and return without
// sleeping.
//
// Note that the fences do not create any happens-before relationship.
// The read sees the write immediately before the fence not because
// one happens after the other, but is instead due to a guarantee unique
// to SeqCst fences that restricts what an atomic read placed AFTER the
// fence can see. The read still has to be atomic, otherwise it's a data
// race. This guarantee cannot be achieved with acquire-release fences
// since they only talk about reads placed BEFORE a fence - and places
// no restrictions on what the read itself can see, only that there is
// a happens-before between the fences IF the read happens to see the
// right value. This is useless to us, since we need the read itself
// to see an up-to-date value.
//
// The above case distinction is valid since both FUTEX_WAIT and FUTEX_WAKE
// contain a SeqCst fence, therefore inducting a total order between the operations.
// It is also critical that the fence, the atomic load, and the comparison in FUTEX_WAIT
// altogether happen atomically. If the other thread's fence in FUTEX_WAKE
// gets interleaved after our fence, then we lose the guarantee on the
// atomic load being up-to-date; if the other thread's write on addr and FUTEX_WAKE
// call are interleaved after the load but before the comparison, then we get a TOCTOU
// race condition, and go to sleep thinking the other thread will wake us up,
// even though they have already finished.
//
// Thankfully, preemptions cannot happen inside a Miri shim, so we do not need to
// do anything special to guarantee fence-load-comparison atomicity.
this.atomic_fence(&[], AtomicFenceOp::SeqCst)?;
// Read an `i32` through the pointer, regardless of any wrapper types.
// It's not uncommon for `addr` to be passed as another type than `*mut i32`, such as `*const AtomicI32`.
// FIXME: this fails if `addr` is not a pointer type.
// The atomic ordering for futex(https://man7.org/linux/man-pages/man2/futex.2.html):
// "The load of the value of the futex word is an
// atomic memory access (i.e., using atomic machine instructions
// of the respective architecture). This load, the comparison
// with the expected value, and starting to sleep are performed
// atomically and totally ordered with respect to other futex
// operations on the same futex word."
// SeqCst is total order over all operations.
// FIXME: check if this should be changed when weak memory orders are added.
let futex_val = this
.read_scalar_at_offset_atomic(
&addr.into(),
0,
this.machine.layouts.i32,
AtomicReadOp::SeqCst,
AtomicReadOp::Relaxed,
)?
.to_i32()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand why SeqCst fence + relaxed access is more correct than SeqCst access. Both should get inserted into the total SeqCst order, no?

OTOH, I seem to remember my weak memory colleagues saying that when mixing multiple orderings on one location, SeqCst accesses become ill-behaved very quickly, and only SeqCst fences still make sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if this relies on the fact that the fence+load happen atomically and cannot be preempted, please add a comment explaining that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not care about the synchronisation effect of accessing addr has on other memory locations; this is done by the validate_lock_acquire call in this.futex_wait(). Futex API implementations are not allowed to write to addr since it's "owned" by the user program, so I doubt it's possible to create synchronisation using addr anyway.

The sole location we want the fence to affect is addr itself, therefore the ordering for the read can be anything (though it still has to be atomic because the fences don't actually create an HB edge, it's just a special case of SC fence guaranteeing what the next atomic read can see).

Of course, the API users can do extra accesses on addr and expect some ordering guarantees, but having an extra relaxed read here should not weaken it, otherwise it's a very surprising behaviour and a bug in the memory order.

It doesn't matter if there is a preemption between the fence and read. We must not preempt between the read and comparison.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Futex API implementations are not allowed to write to addr since it's "owned" by the user program, so I doubt it's possible to create synchronisation using addr anyway.

That is a good argument.

It doesn't matter if there is a preemption between the fence and read. We must not preempt between the read and comparison.

The comparison is completely thread-local, so I don't understand this comment. Or do you mean that it is important that the thread be put to sleep atomically with the read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do you mean that it is important that the thread be put to sleep atomically with the read?

Yes, otherwise there's a TOCTOU bug

if val == futex_val {
// The value still matches, so we block the trait make it wait for FUTEX_WAKE.
// The value still matches, so we block the thread make it wait for FUTEX_WAKE.
this.block_thread(thread);
this.futex_wait(addr_usize, thread, bitset);
// Succesfully waking up from FUTEX_WAIT always returns zero.
Expand Down Expand Up @@ -203,6 +238,10 @@ pub fn futex<'tcx>(
this.write_scalar(Scalar::from_machine_isize(-1, this), dest)?;
return Ok(());
}
// Together with the SeqCst fence in futex_wait, this makes sure that futex_wait
// will see the latest value on addr which could be changed by our caller
// before doing the syscall.
this.atomic_fence(&[], AtomicFenceOp::SeqCst)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that previously, futex_wake didn't actually do anything on addr, so it was not ordered wrt other operations on addr -- and this extra fence here inserts it into the SeqCst order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

futex wake syscall is not supposed to do anything on addr. The API user is supposed to change its value before invoking the wake syscall.

When releasing the lock, a thread has to first reset the lock state to not acquired and then execute a futex operation

https://man7.org/linux/man-pages/man2/futex.2.html

Copy link
Member

@RalfJung RalfJung Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so what is the correctness argument here for a racing wait and wake? We have two SC fences, so the thing weare allowed to do is case distinction on which fence comes first.

  • In case the wake fence goes first, the wait fence will establish a happens-before from that fence, so the read in wait will definitely pick up all writes before the wake. Therefore it will not sleep and we are good.
  • In case the wait fence goes first... then what? Since all of wait is atomic and cannot be preempted, we know the waiting thread will be in the futex wait set, so the waking thread will see and wake it. But this relies on all of wait (fence, read, add to wait set) being non-preemptible.

Copy link
Contributor Author

@cbeuw cbeuw Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes your reasoning are right for both cases. Except for that the fences don't actually establish happens-before, it's just a special property unique to SC fence (cannot be replicated with acquire/release fences):

T1:                                                 T2:
na = 10
M.store(42, Ordering.any)  // A           
fence(sc) // X
                                                     fence(sc) //Y
                                                     M.load(Ordering.any) // B
                                                     r = na // still a data race, does not happen after na = 10

The load will read 42 as long as T2's fence and load are interleaved after T1's fence, but it does not establish HB with anything, though this doesn't matter for our case.

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order_seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, that is wild. So it's a "definitely visible but not happens-before" edge? I have no idea how that makes any sense but it's good enough for our purposes here.

Can you please add a comment with this correctness argument to the code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, just to be sure I understand -- earlier you said "It doesn't matter if there is a preemption between the fence and read", but in my correctness argument we crucially use that fence+read are atomic. So it does matter, after all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the fence+read also have to be atomic. I've added the correctness argument as comments

let mut n = 0;
for _ in 0..val {
if let Some(thread) = this.futex_wake(addr_usize, bitset) {
Expand Down
44 changes: 44 additions & 0 deletions tests/pass/concurrency/linux-futex.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ extern crate libc;

use std::mem::MaybeUninit;
use std::ptr;
use std::sync::atomic::AtomicI32;
use std::sync::atomic::Ordering;
use std::thread;
use std::time::{Duration, Instant};

Expand Down Expand Up @@ -206,6 +208,47 @@ fn wait_wake_bitset() {
t.join().unwrap();
}

const FREE: i32 = 0;
const HELD: i32 = 1;
fn concurrent_wait_wake() {
static FUTEX: AtomicI32 = AtomicI32::new(0);
for _ in 0..20 {
// Suppose the main thread is holding a lock implemented using futex...
FUTEX.store(HELD, Ordering::Relaxed);

let t = thread::spawn(move || {
// If this syscall runs first, then we'll be woken up by
// the main thread's FUTEX_WAKE, and all is fine.
//
// If this sycall runs after the main thread's store
// and FUTEX_WAKE, the syscall must observe that
// the FUTEX is FREE != HELD and return without waiting
// or we'll deadlock.
unsafe {
libc::syscall(
libc::SYS_futex,
&FUTEX as *const AtomicI32,
libc::FUTEX_WAIT,
HELD,
ptr::null::<libc::timespec>(),
);
}
});

FUTEX.store(FREE, Ordering::Relaxed);
unsafe {
libc::syscall(
libc::SYS_futex,
&FUTEX as *const AtomicI32,
libc::FUTEX_WAKE,
1,
);
}

t.join().unwrap();
}
}

fn main() {
wake_nobody();
wake_dangling();
Expand All @@ -214,4 +257,5 @@ fn main() {
wait_absolute_timeout();
wait_wake();
wait_wake_bitset();
concurrent_wait_wake();
}