
Heavy contention on blocking #2528

Open · jonhoo opened this issue May 12, 2020 · 9 comments
Labels
A-tokio (Area: The main tokio crate) · C-enhancement (Category: A PR with an enhancement or bugfix) · M-blocking (Module: tokio/task/blocking) · T-performance (Topic: performance and benchmarks)

Comments

@jonhoo
Contributor

jonhoo commented May 12, 2020

Code that invokes spawn_blocking, or block_in_place (which internally calls spawn_blocking), ends up taking a shared lock here:

```rust
let mut shared = self.inner.shared.lock().unwrap();
```

This causes a lot of unnecessary contention between unrelated tasks if you frequently need to run blocking code. The problem is exacerbated as load increases, because more calls to spawn_blocking cause each individual call to become more expensive, and each call holds up an executor thread.
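For illustration, the pattern looks roughly like the following std-only model (a sketch of the dispatch path being described, not tokio's actual code; the `BlockingPool` type and its methods here are hypothetical): every spawn_blocking call, from every thread, funnels through one Mutex-guarded queue.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// A queued blocking job.
type Job = Box<dyn FnOnce() + Send>;

// Simplified model of the shared state that every spawn_blocking call locks.
#[derive(Clone)]
struct BlockingPool {
    shared: Arc<Mutex<VecDeque<Job>>>,
}

impl BlockingPool {
    fn new() -> Self {
        BlockingPool { shared: Arc::new(Mutex::new(VecDeque::new())) }
    }

    // Every caller, no matter how unrelated, serializes on this one lock.
    fn spawn_blocking(&self, job: Job) {
        let mut shared = self.shared.lock().unwrap();
        shared.push_back(job);
        // The real pool also notifies an idle worker (or spawns a thread)
        // while still holding this same lock.
    }

    fn queued(&self) -> usize {
        self.shared.lock().unwrap().len()
    }
}

fn main() {
    let pool = BlockingPool::new();
    // Eight unrelated threads all contend on the single shared mutex.
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let pool = pool.clone();
            thread::spawn(move || {
                for _ in 0..1000 {
                    pool.spawn_blocking(Box::new(|| {}));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(pool.queued(), 8000);
}
```

Under load, every one of those 8000 submissions waits its turn on the same lock, which is the contention this issue describes.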

@Darksonn added the A-tokio, C-enhancement, M-blocking, and T-performance labels on May 12, 2020
@dekellum
Contributor

dekellum commented Jan 3, 2021

It looks to me like the lock is held for the minimal, correct span (e.g. not including actually running the blocking closure). In my own testing of an alternative thread pool (in the blocking-permit crate), using a (parking_lot) mutex and condvar with minimized notifications appears to give the best possible performance, at least on Linux.
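The "minimized notifications" idea above can be sketched like this (std primitives here rather than parking_lot, which has the same API shape; the `Pool`/`Inner` names are illustrative): an idle counter tracked under the lock lets producers skip the wakeup syscall entirely when no worker is parked.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

struct Inner {
    queue: VecDeque<u32>,
    idle: usize,     // number of workers currently parked on the condvar
    shutdown: bool,
}

struct Pool {
    inner: Mutex<Inner>,
    condvar: Condvar,
}

impl Pool {
    fn new() -> Arc<Self> {
        Arc::new(Pool {
            inner: Mutex::new(Inner { queue: VecDeque::new(), idle: 0, shutdown: false }),
            condvar: Condvar::new(),
        })
    }

    fn push(&self, job: u32) {
        let mut inner = self.inner.lock().unwrap();
        inner.queue.push_back(job);
        // Minimized notifications: only wake someone if a worker is parked.
        if inner.idle > 0 {
            self.condvar.notify_one();
        }
    }

    // Worker side: drain the queue, park when empty, exit on shutdown.
    fn pop(&self) -> Option<u32> {
        let mut inner = self.inner.lock().unwrap();
        loop {
            if let Some(job) = inner.queue.pop_front() {
                return Some(job);
            }
            if inner.shutdown {
                return None;
            }
            inner.idle += 1;
            inner = self.condvar.wait(inner).unwrap();
            inner.idle -= 1;
        }
    }

    fn shutdown(&self) {
        let mut inner = self.inner.lock().unwrap();
        inner.shutdown = true;
        self.condvar.notify_all();
    }
}

fn main() {
    let pool = Pool::new();
    let worker = {
        let pool = Arc::clone(&pool);
        std::thread::spawn(move || {
            let mut sum = 0;
            while let Some(job) = pool.pop() {
                sum += job;
            }
            sum
        })
    };
    for job in 1..=4 {
        pool.push(job);
    }
    pool.shutdown();
    assert_eq!(worker.join().unwrap(), 10);
}
```

Note that a worker re-checks the queue before parking, so a push that lands while `idle == 0` is never lost; the notification is only skipped when nobody is waiting for it.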

What if anything is suggested as an alternative?

@SUPERCILEX
Contributor

The solution here would be to use a lockless queue, e.g. crossbeam_deque. @Darksonn On that note, why was crossbeam purged from tokio? Every tokio runtime's injector uses locks, which seems like a clear downgrade from a lockless queue.

@Noah-Kennedy
Contributor

I've seen this cause perf issues before on code that does a ton of file I/O.

@SUPERCILEX
Contributor

I've spent a bit of time thinking about what the optimal solution could look like:

  • Use an unbounded FIFO queue for Injector tasks (most likely SegQueue).
  • Use a bounded (by thread_cap) LIFO queue for idle threads (the point is to try to reuse the same threads). As far as I can tell, this doesn't exist (crossbeam_deque has an unbounded LIFO queue, but it uses a single buffer that needs to be reallocated on overflow, which seems like a huge waste). This means we'd have to write our own (probably a vec guarded by atomics).
  • Use a slab guarded by a mutex to keep track of alive threads. Not a fan of the mutex, so maybe use a shared slab that can double as the idle thread queue? Need to figure out how that would work.

The lifecycle would be:

  1. Tasks are pushed to the SegQueue. Active threads poll the queue, twice if thread_cap has not been reached yet. If the second poll returns a value, start a new thread and give it that second task (putting it in the slab).
  2. If the queue is empty, the thread puts itself in the idle-thread queue and parks itself with a timeout.
  3. New tasks can take a thread out of the idle queue and notify it (assuming it's not dead; otherwise remove it from the slab and loop).
  4. On notify or timeout, a thread checks to see if there's work and lets itself die if not, marking itself as dead.
  5. On shutdown, notify all the idle threads and then join everything in the slab.
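The park-with-timeout part of that lifecycle can be sketched with std's `Condvar::wait_timeout` (the proposal calls for atomics and lockless structures; a plain Mutex stands in here for brevity, and the `Pool`/`Shared`/`worker` names are illustrative):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

struct Shared {
    pending: usize,       // queued work items (stand-in for the task queue)
    alive_workers: usize, // stand-in for the slab of alive threads
}

struct Pool {
    shared: Mutex<Shared>,
    condvar: Condvar,
}

const KEEP_ALIVE: Duration = Duration::from_millis(50);

// A worker drains pending work, parks with a timeout when idle, and on
// waking with nothing to do marks itself dead and exits.
fn worker(pool: &Pool) -> usize {
    let mut completed = 0;
    let mut shared = pool.shared.lock().unwrap();
    loop {
        if shared.pending > 0 {
            shared.pending -= 1;
            completed += 1; // "run" the task (elided)
            continue;
        }
        // No work: park with a timeout, as in the proposal.
        let (guard, timeout) = pool.condvar.wait_timeout(shared, KEEP_ALIVE).unwrap();
        shared = guard;
        if shared.pending == 0 && timeout.timed_out() {
            // Timed out with nothing to do: let this thread die.
            shared.alive_workers -= 1;
            return completed;
        }
        // Otherwise: notified (or a spurious wakeup), loop and re-check.
    }
}

fn main() {
    let pool = Arc::new(Pool {
        shared: Mutex::new(Shared { pending: 3, alive_workers: 1 }),
        condvar: Condvar::new(),
    });
    let handle = {
        let pool = Arc::clone(&pool);
        std::thread::spawn(move || worker(&pool))
    };
    assert_eq!(handle.join().unwrap(), 3);
    assert_eq!(pool.shared.lock().unwrap().alive_workers, 0);
}
```

Re-checking `pending` after every wakeup handles both spurious wakeups and the race where work arrives just as the timeout fires.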

@Darksonn
Contributor

Darksonn commented Mar 3, 2022

We generally want to avoid adding new dependencies if we can at all avoid it.

@hawkw
Member

hawkw commented Mar 4, 2022

One nice thing about the injector queue is that --- unlike run queues --- tasks cannot be dropped from outside of the queue while in the injector queue. This means that we may be able to use an approach like Vyukov's intrusive singly-linked MPSC queue for injector queues --- this is something I've wanted to look into for a while, actually.
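For reference, the core of Vyukov's MPSC queue is small: producers enqueue with a single atomic swap, and the single consumer walks the `next` pointers. The sketch below is a minimal non-intrusive (heap-node) version to show the idea; the intrusive variant mentioned above would embed the `next` pointer in the task itself to avoid allocation, and this is in no way tokio's implementation.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    next: AtomicPtr<Node<T>>,
    value: Option<T>, // None only for the stub node
}

pub struct MpscQueue<T> {
    head: AtomicPtr<Node<T>>, // producers swap here
    tail: *mut Node<T>,       // touched only by the single consumer
}

unsafe impl<T: Send> Send for MpscQueue<T> {}
unsafe impl<T: Send> Sync for MpscQueue<T> {}

impl<T> MpscQueue<T> {
    pub fn new() -> Self {
        // Stub node so push never has to handle an empty queue specially.
        let stub = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: None,
        }));
        MpscQueue { head: AtomicPtr::new(stub), tail: stub }
    }

    // Safe from any thread: one swap plus one store, no locks.
    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: Some(value),
        }));
        let prev = self.head.swap(node, Ordering::AcqRel);
        unsafe { (*prev).next.store(node, Ordering::Release) };
    }

    // Single consumer only, hence &mut self.
    pub fn pop(&mut self) -> Option<T> {
        unsafe {
            let tail = self.tail;
            let next = (*tail).next.load(Ordering::Acquire);
            if next.is_null() {
                return None; // empty, or a push is mid-flight
            }
            self.tail = next;
            drop(Box::from_raw(tail)); // free the old stub/tail node
            (*next).value.take()
        }
    }
}

fn main() {
    let mut q = MpscQueue::new();
    q.push(1);
    q.push(2);
    assert_eq!(q.pop(), Some(1));
    assert_eq!(q.pop(), Some(2));
    assert_eq!(q.pop(), None);
}
```

(A real version also needs a `Drop` impl to free any remaining nodes; it is omitted here for brevity.)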

@SUPERCILEX
Contributor

Don't we need MPMC though?

@tfreiberg-fastly

I've experienced this in benchmarks with high RPS doing very small reads from files on a ramdisk. Replacing spawn_blocking with block_in_place actually improved things in these cases (obviously, with larger reads block_in_place was unusable; this only confirmed that queueing is in fact an issue at 20k+ RPS).
So I'm very interested in this issue, and when it becomes a blocker, I'll try to contribute.

@Noah-Kennedy
Contributor

> I've experienced this in benchmarks with high RPS doing very small reads from files on a ramdisk. Replacing spawn_blocking with block_in_place actually improved things in these cases (obviously, with larger reads block_in_place was unusable; this only confirmed that queueing is in fact an issue at 20k+ RPS). So I'm very interested in this issue, and when it becomes a blocker, I'll try to contribute.

block_in_place actually does essentially a spawn_blocking call internally to obtain a thread for a new worker, so this is interesting.

7 participants