Skip to content

Prevent futex_wait from actually waiting if a concurrent waker was executed before us #2228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 18, 2022

Conversation

cbeuw
Copy link
Contributor

@cbeuw cbeuw commented Jun 11, 2022

Fixes #2223

Two SC fences were placed in futex_wake (after the caller has changed addr), and in futex_wait (before we read addr). This guarantees that futex_wait sees the value written to addr before the last futex_wake call, should one exists, and avoid going into sleep with no one else to wake us up.

// The current load, which may be sequenced-after an SC fence, can only read-from
// the last store sequenced-before an SC fence in another thread (or any stores
// later than that SC fence)

Earlier I proposed to use fetch_add(0) to read the latest value in MO, though this isn't the proper way to do it and breaks aliasing: syscall caller may pass in a *const from a & and Miri complains about write to a SharedReadOnly location, causing this test to fail.

let futex: i32 = 123;
// Only wait if the futex value is 456.
unsafe {
assert_eq!(libc::syscall(
libc::SYS_futex,
&futex as *const i32,
libc::FUTEX_WAIT,
456,
ptr::null::<libc::timespec>(),
), -1);
assert_eq!(*libc::__errno_location(), libc::EAGAIN);
}

// Together with the SeqCst fence in futex_wait, this makes sure that futex_wait
// will see the latest value on addr which could be changed by our caller
// before doing the syscall.
this.atomic_fence(&[], AtomicFenceOp::SeqCst)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue is that previously, futex_wake didn't actually do anything on addr, so it was not ordered wrt other operations on addr -- and this extra fence here inserts it into the SeqCst order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

futex wake syscall is not supposed to do anything on addr. The API user is supposed to change its value before invoking the wake syscall.

When releasing the lock, a thread has to first reset the lock state to not acquired and then execute a futex operation

https://man7.org/linux/man-pages/man2/futex.2.html

Copy link
Member

@RalfJung RalfJung Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so what is the correctness argument here for a racing wait and wake? We have two SC fences, so the thing weare allowed to do is case distinction on which fence comes first.

  • In case the wake fence goes first, the wait fence will establish a happens-before from that fence, so the read in wait will definitely pick up all writes before the wake. Therefore it will not sleep and we are good.
  • In case the wait fence goes first... then what? Since all of wait is atomic and cannot be preempted, we know the waiting thread will be in the futex wait set, so the waking thread will see and wake it. But this relies on all of wait (fence, read, add to wait set) being non-preemptible.

Copy link
Contributor Author

@cbeuw cbeuw Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes your reasoning are right for both cases. Except for that the fences don't actually establish happens-before, it's just a special property unique to SC fence (cannot be replicated with acquire/release fences):

T1:                                                 T2:
na = 10
M.store(42, Ordering.any)  // A           
fence(sc) // X
                                                     fence(sc) //Y
                                                     M.load(Ordering.any) // B
                                                     r = na // still a data race, does not happen after na = 10

The load will read 42 as long as T2's fence and load are interleaved after T1's fence, but it does not establish HB with anything, though this doesn't matter for our case.

For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order_seq_cst fences X and Y such that A is sequenced before X, Y is sequenced before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, that is wild. So it's a "definitely visible but not happens-before" edge? I have no idea how that makes any sense but it's good enough for our purposes here.

Can you please add a comment with this correctness argument to the code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, just to be sure I understand -- earlier you said "It doesn't matter if there is a preemption between the fence and read", but in my correctness argument we crucially use that fence+read are atomic. So it does matter, after all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the fence+read also have to be atomic. I've added the correctness argument as comments

let futex_val = this
.read_scalar_at_offset_atomic(
&addr.into(),
0,
this.machine.layouts.i32,
AtomicReadOp::SeqCst,
AtomicReadOp::Relaxed,
)?
.to_i32()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand why SeqCst fence + relaxed access is more correct than SeqCst access. Both should get inserted into the total SeqCst order, no?

OTOH, I seem to remember my weak memory colleagues saying that when mixing multiple orderings on one location, SeqCst accesses become ill-behaved very quickly, and only SeqCst fences still make sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if this relies on the fact that the fence+load happen atomically and cannot be preempted, please add a comment explaining that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not care about the synchronisation effect of accessing addr has on other memory locations; this is done by the validate_lock_acquire call in this.futex_wait(). Futex API implementations are not allowed to write to addr since it's "owned" by the user program, so I doubt it's possible to create synchronisation using addr anyway.

The sole location we want the fence to affect is addr itself, therefore the ordering for the read can be anything (though it still has to be atomic because the fences don't actually create an HB edge, it's just a special case of SC fence guaranteeing what the next atomic read can see).

Of course, the API users can do extra accesses on addr and expect some ordering guarantees, but having an extra relaxed read here should not weaken it, otherwise it's a very surprising behaviour and a bug in the memory order.

It doesn't matter if there is a preemption between the fence and read. We must not preempt between the read and comparison.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Futex API implementations are not allowed to write to addr since it's "owned" by the user program, so I doubt it's possible to create synchronisation using addr anyway.

That is a good argument.

It doesn't matter if there is a preemption between the fence and read. We must not preempt between the read and comparison.

The comparison is completely thread-local, so I don't understand this comment. Or do you mean that it is important that the thread be put to sleep atomically with the read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do you mean that it is important that the thread be put to sleep atomically with the read?

Yes, otherwise there's a TOCTOU bug

const HELD: i32 = 1;
fn concurrent_wait_wake() {
static FUTEX: AtomicI32 = AtomicI32::new(0);
for _ in 0..100 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 times is a lot.

What about reducing this (to make the test not take quite as long), and bumping up the preemption rate for this test?

Copy link
Member

@RalfJung RalfJung Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the test currently already fails in the 3rd iteration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this even about the preemption rate? If it is about picking 1 out of 2 entries in the store buffer, then 20 rounds make the probability of accidentally passing the test 2^-20, which seems easily good enough.

Copy link
Contributor Author

@cbeuw cbeuw Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need to get the wake to run before the wait to trigger this bug, but yeah 20 iterations should be enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need to get the wake to run before the wait

So preemption is actually harmful? Should we set the preemption rate to 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think leaving it as-is is fine. Having preemption also demonstrates that the critical sections inside futex syscalls are happening atomically

@RalfJung
Copy link
Member

Thanks for fixing this nasty issue!
@bors r+

@bors
Copy link
Contributor

bors commented Jun 18, 2022

📌 Commit 737a5b3 has been approved by RalfJung

@bors
Copy link
Contributor

bors commented Jun 18, 2022

⌛ Testing commit 737a5b3 with merge c4dd3f4...

@bors
Copy link
Contributor

bors commented Jun 18, 2022

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing c4dd3f4 to master...

@bors bors merged commit c4dd3f4 into rust-lang:master Jun 18, 2022
@cbeuw cbeuw deleted the futex-fix branch June 18, 2022 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Review Linux futex implementation for weak memory
3 participants