Skip to content

Possible deadlock when a waker is replaced #5429

Closed
@satakuma

Description

@satakuma

Version
1.25.0

Description

There are some types in tokio::sync which keep a shared state behind a mutex and expose methods to modify it through a shared reference. For example: Notify::notify_waiters(&self), Semaphore::add_permits(&self, n: usize).
It is possible to trigger a deadlock with custom wakers backed by ArcWake, which call those methods in their destructors. This can happen because tokio drops a waker while holding the lock to the shared state, and the waker tries to re-enter the lock in its Drop impl. Examples of wakers dropped with the lock held: Notify::notified, Semaphore::acquire.

To trigger a deadlock, you can poll a future twice with two different wakers. At the second poll, the old waker will be replaced and instantly dropped. Examples of deadlocking code:

Notify::notified

Adapted from this test

fn notify_in_drop() {
    use futures::task::ArcWake;
    use std::future::Future;
    use std::sync::Arc;

    let notify = Arc::new(Notify::new());

    struct NotifyOnDrop(Arc<Notify>);

    impl ArcWake for NotifyOnDrop {
        fn wake_by_ref(_arc_self: &Arc<Self>) {}
    }

    impl Drop for NotifyOnDrop {
        fn drop(&mut self) {
            self.0.notify_waiters();
        }
    }

    let mut fut = Box::pin(async {
        notify.notified().await;
    });

    // Second iteration deadlocks.
    for _ in 0..2 {
        let waker = futures::task::waker(Arc::new(NotifyOnDrop(notify.clone())));
        let mut cx = std::task::Context::from_waker(&waker);
        assert!(fut.as_mut().poll(&mut cx).is_pending());
    }
}
Semaphore::acquire
fn release_permits_at_drop() {
    use futures::task::ArcWake;
    use std::future::Future;
    use std::sync::Arc;

    let sem = Arc::new(Semaphore::new(1));

    struct ReleaseOnDrop(Option<OwnedSemaphorePermit>);

    impl ArcWake for ReleaseOnDrop {
        fn wake_by_ref(_arc_self: &Arc<Self>) {}
    }

    let mut fut = Box::pin(async {
        let _permit = sem.acquire().await.unwrap();
    });

    // Second iteration deadlocks.
    for _ in 0..2 {
        let waker = futures::task::waker(Arc::new(ReleaseOnDrop(
            sem.clone().try_acquire_owned().ok(),
        )));
        let mut cx = std::task::Context::from_waker(&waker);
        assert!(fut.as_mut().poll(&mut cx).is_pending());
    }
}

This issue has been partially addressed in the past (#401, PR discussion), but only about calling waker.wake() while holding a lock, not dropping a waker in general.

Simple solution
When an old waker has to be replaced, it can be stored on the side and dropped outside the critical section.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-tokioArea: The main tokio crateC-bugCategory: This is a bug.E-help-wantedCall for participation: Help is requested to fix this issue.E-mediumCall for participation: Experience needed to fix: Medium / intermediateM-syncModule: tokio/sync

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions