Prevent futex_wait from actually waiting if a concurrent waker was executed before us #2228

cbeuw · 2022-06-11T19:52:29Z

Two SC fences were placed in futex_wake (after the caller has changed addr), and in futex_wait (before we read addr). This guarantees that futex_wait sees the value written to addr before the last futex_wake call, should one exists, and avoid going into sleep with no one else to wake us up.

miri/src/concurrency/weak_memory.rs

Lines 324 to 326 in ada7b72

    
           // The current load, which may be sequenced-after an SC fence, can only read-from 
        
           // the last store sequenced-before an SC fence in another thread (or any stores 
        
           // later than that SC fence)

Earlier I proposed to use fetch_add(0) to read the latest value in MO, though this isn't the proper way to do it and breaks aliasing: syscall caller may pass in a *const from a & and Miri complains about write to a SharedReadOnly location, causing this test to fail.

miri/tests/pass/concurrency/linux-futex.rs

Lines 56 to 68 in ada7b72

    
           let futex: i32 = 123; 
        
           // Only wait if the futex value is 456. 
        
           unsafe { 
        
               assert_eq!(libc::syscall( 
        
                   libc::SYS_futex, 
        
                   &futex as *const i32, 
        
                   libc::FUTEX_WAIT, 
        
                   456, 
        
                   ptr::null::<libc::timespec>(), 
        
               ), -1); 
        
               assert_eq!(*libc::__errno_location(), libc::EAGAIN); 
        
           }

RalfJung · 2022-06-13T04:53:15Z

src/shims/unix/linux/sync.rs

+            // Together with the SeqCst fence in futex_wait, this makes sure that futex_wait
+            // will see the latest value on addr which could be changed by our caller
+            // before doing the syscall.
+            this.atomic_fence(&[], AtomicFenceOp::SeqCst)?;


So the issue is that previously, futex_wake didn't actually do anything on addr, so it was not ordered wrt other operations on addr -- and this extra fence here inserts it into the SeqCst order?

futex wake syscall is not supposed to do anything on addr. The API user is supposed to change its value before invoking the wake syscall.

When releasing the lock, a thread has to first reset the lock state to not acquired and then execute a futex operation

https://man7.org/linux/man-pages/man2/futex.2.html

Okay so what is the correctness argument here for a racing wait and wake? We have two SC fences, so the thing weare allowed to do is case distinction on which fence comes first.

In case the wake fence goes first, the wait fence will establish a happens-before from that fence, so the read in wait will definitely pick up all writes before the wake. Therefore it will not sleep and we are good.

In case the wait fence goes first... then what? Since all of wait is atomic and cannot be preempted, we know the waiting thread will be in the futex wait set, so the waking thread will see and wake it. But this relies on all of wait (fence, read, add to wait set) being non-preemptible.

Yes your reasoning are right for both cases. Except for that the fences don't actually establish happens-before, it's just a special property unique to SC fence (cannot be replicated with acquire/release fences):

T1: T2: na = 10 M.store(42, Ordering.any) // A fence(sc) // X fence(sc) //Y M.load(Ordering.any) // B r = na // still a data race, does not happen after na = 10

The load will read 42 as long as T2's fence and load are interleaved after T1's fence, but it does not establish HB with anything, though this doesn't matter for our case.

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order_seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order.

Oh wow, that is wild. So it's a "definitely visible but not happens-before" edge? I have no idea how that makes any sense but it's good enough for our purposes here.

Can you please add a comment with this correctness argument to the code?

Also, just to be sure I understand -- earlier you said "It doesn't matter if there is a preemption between the fence and read", but in my correctness argument we crucially use that fence+read are atomic. So it does matter, after all?

You are right, the fence+read also have to be atomic. I've added the correctness argument as comments

RalfJung · 2022-06-13T04:55:27Z

src/shims/unix/linux/sync.rs

            let futex_val = this
                .read_scalar_at_offset_atomic(
                    &addr.into(),
                    0,
                    this.machine.layouts.i32,
-                    AtomicReadOp::SeqCst,
+                    AtomicReadOp::Relaxed,
                )?
                .to_i32()?;


I don't think I understand why SeqCst fence + relaxed access is more correct than SeqCst access. Both should get inserted into the total SeqCst order, no?

OTOH, I seem to remember my weak memory colleagues saying that when mixing multiple orderings on one location, SeqCst accesses become ill-behaved very quickly, and only SeqCst fences still make sense.

Also, if this relies on the fact that the fence+load happen atomically and cannot be preempted, please add a comment explaining that.

We do not care about the synchronisation effect of accessing addr has on other memory locations; this is done by the validate_lock_acquire call in this.futex_wait(). Futex API implementations are not allowed to write to addr since it's "owned" by the user program, so I doubt it's possible to create synchronisation using addr anyway.

The sole location we want the fence to affect is addr itself, therefore the ordering for the read can be anything (though it still has to be atomic because the fences don't actually create an HB edge, it's just a special case of SC fence guaranteeing what the next atomic read can see).

Of course, the API users can do extra accesses on addr and expect some ordering guarantees, but having an extra relaxed read here should not weaken it, otherwise it's a very surprising behaviour and a bug in the memory order.

It doesn't matter if there is a preemption between the fence and read. We must not preempt between the read and comparison.

Futex API implementations are not allowed to write to addr since it's "owned" by the user program, so I doubt it's possible to create synchronisation using addr anyway.

That is a good argument.

It doesn't matter if there is a preemption between the fence and read. We must not preempt between the read and comparison.

The comparison is completely thread-local, so I don't understand this comment. Or do you mean that it is important that the thread be put to sleep atomically with the read?

Or do you mean that it is important that the thread be put to sleep atomically with the read?

Yes, otherwise there's a TOCTOU bug

RalfJung · 2022-06-13T04:56:55Z

tests/pass/concurrency/linux-futex.rs

+const HELD: i32 = 1;
+fn concurrent_wait_wake() {
+    static FUTEX: AtomicI32 = AtomicI32::new(0);
+    for _ in 0..100 {


100 times is a lot.

What about reducing this (to make the test not take quite as long), and bumping up the preemption rate for this test?

FWIW, the test currently already fails in the 3rd iteration.

Is this even about the preemption rate? If it is about picking 1 out of 2 entries in the store buffer, then 20 rounds make the probability of accidentally passing the test 2^-20, which seems easily good enough.

We do need to get the wake to run before the wait to trigger this bug, but yeah 20 iterations should be enough

We do need to get the wake to run before the wait

So preemption is actually harmful? Should we set the preemption rate to 0?

I think leaving it as-is is fine. Having preemption also demonstrates that the critical sections inside futex syscalls are happening atomically

src/shims/unix/linux/sync.rs

RalfJung · 2022-06-18T15:00:01Z

Thanks for fixing this nasty issue!
@bors r+

bors · 2022-06-18T15:00:03Z

📌 Commit 737a5b3 has been approved by RalfJung

bors · 2022-06-18T15:10:58Z

⌛ Testing commit 737a5b3 with merge c4dd3f4...

bors · 2022-06-18T15:33:11Z

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing c4dd3f4 to master...

Prevent futex_wait from reading outdated value

b6bcbf7

RalfJung reviewed Jun 13, 2022

View reviewed changes

cbeuw added 2 commits June 13, 2022 18:24

Reduce the number of iterations

56a4c13

Elaborate correctness comments

807a19a

RalfJung reviewed Jun 18, 2022

View reviewed changes

src/shims/unix/linux/sync.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jun 18, 2022

View reviewed changes

src/shims/unix/linux/sync.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jun 18, 2022

View reviewed changes

src/shims/unix/linux/sync.rs Outdated Show resolved Hide resolved

tweak correctness comment

737a5b3

bors merged commit c4dd3f4 into rust-lang:master Jun 18, 2022

cbeuw deleted the futex-fix branch June 18, 2022 15:36

cbeuw mentioned this pull request Jun 18, 2022

Windows thread support: Part 1 #2231

Merged

	// The current load, which may be sequenced-after an SC fence, can only read-from
	// the last store sequenced-before an SC fence in another thread (or any stores
	// later than that SC fence)

	let futex: i32 = 123;

	// Only wait if the futex value is 456.
	unsafe {
	assert_eq!(libc::syscall(
	libc::SYS_futex,
	&futex as *const i32,
	libc::FUTEX_WAIT,
	456,
	ptr::null::<libc::timespec>(),
	), -1);
	assert_eq!(*libc::__errno_location(), libc::EAGAIN);
	}

Prevent futex_wait from actually waiting if a concurrent waker was executed before us #2228

Prevent futex_wait from actually waiting if a concurrent waker was executed before us #2228

Uh oh!

Conversation

cbeuw commented Jun 11, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Jun 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cbeuw Jun 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Jun 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cbeuw Jun 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RalfJung commented Jun 18, 2022

Uh oh!

bors commented Jun 18, 2022

Uh oh!

bors commented Jun 18, 2022

Uh oh!

bors commented Jun 18, 2022

Uh oh!

Uh oh!

cbeuw commented Jun 11, 2022 •

edited

Loading

RalfJung Jun 13, 2022 •

edited

Loading

cbeuw Jun 13, 2022 •

edited

Loading

RalfJung Jun 13, 2022 •

edited

Loading

cbeuw Jun 13, 2022 •

edited

Loading