
Heavy contention on blocking #2528

Open · jonhoo opened this issue May 12, 2020 · 9 comments
Labels
A-tokio (Area: The main tokio crate) · C-enhancement (Category: A PR with an enhancement or bugfix) · M-blocking (Module: tokio/task/blocking) · T-performance (Topic: performance and benchmarks)

Comments

@jonhoo
Contributor

jonhoo commented May 12, 2020

Code that invokes spawn_blocking, or block_in_place (which internally calls spawn_blocking), ends up taking a shared lock here:

```rust
let mut shared = self.inner.shared.lock().unwrap();
```

This causes a lot of unnecessary contention between unrelated tasks if you frequently need to run blocking code. The problem is exacerbated as load increases, because more calls to spawn_blocking cause each individual call to become more expensive, and each call holds up an executor thread.
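For illustration, the pattern looks roughly like the following std-only model (a sketch of the dispatch path being described, not tokio's actual code; the `BlockingPool` type and its methods here are hypothetical): every spawn_blocking call, from every thread, funnels through one Mutex-guarded queue.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

// A queued blocking job.
type Job = Box<dyn FnOnce() + Send>;

// Simplified model of the shared state that every spawn_blocking call locks.
#[derive(Clone)]
struct BlockingPool {
    shared: Arc<Mutex<VecDeque<Job>>>,
}

impl BlockingPool {
    fn new() -> Self {
        BlockingPool { shared: Arc::new(Mutex::new(VecDeque::new())) }
    }

    // Every caller, no matter how unrelated, serializes on this one lock.
    fn spawn_blocking(&self, job: Job) {
        let mut shared = self.shared.lock().unwrap();
        shared.push_back(job);
        // The real pool also notifies an idle worker (or spawns a thread)
        // while still holding this same lock.
    }

    fn queued(&self) -> usize {
        self.shared.lock().unwrap().len()
    }
}

fn main() {
    let pool = BlockingPool::new();
    // Eight unrelated threads all contend on the single shared mutex.
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let pool = pool.clone();
            thread::spawn(move || {
                for _ in 0..1000 {
                    pool.spawn_blocking(Box::new(|| {}));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(pool.queued(), 8000);
}
```

Under load, every one of those 8000 submissions waits its turn on the same lock, which is the contention this issue describes.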

@Darksonn added the A-tokio, C-enhancement, M-blocking, and T-performance labels on May 12, 2020
@dekellum
Contributor

dekellum commented Jan 3, 2021

It looks to me like the lock is held for the minimal, correct span (e.g. not including actually running the blocking closure). In my own testing of an alternative thread pool (in the blocking-permit crate), using a (parking_lot) mutex and condvar with minimized notifications appears to give the best possible performance, at least on Linux.
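The "minimized notifications" idea above can be sketched like this (std primitives here rather than parking_lot, which has the same API shape; the `Pool`/`Inner` names are illustrative): an idle counter tracked under the lock lets producers skip the wakeup syscall entirely when no worker is parked.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

struct Inner {
    queue: VecDeque<u32>,
    idle: usize,     // number of workers currently parked on the condvar
    shutdown: bool,
}

struct Pool {
    inner: Mutex<Inner>,
    condvar: Condvar,
}

impl Pool {
    fn new() -> Arc<Self> {
        Arc::new(Pool {
            inner: Mutex::new(Inner { queue: VecDeque::new(), idle: 0, shutdown: false }),
            condvar: Condvar::new(),
        })
    }

    fn push(&self, job: u32) {
        let mut inner = self.inner.lock().unwrap();
        inner.queue.push_back(job);
        // Minimized notifications: only wake someone if a worker is parked.
        if inner.idle > 0 {
            self.condvar.notify_one();
        }
    }

    // Worker side: drain the queue, park when empty, exit on shutdown.
    fn pop(&self) -> Option<u32> {
        let mut inner = self.inner.lock().unwrap();
        loop {
            if let Some(job) = inner.queue.pop_front() {
                return Some(job);
            }
            if inner.shutdown {
                return None;
            }
            inner.idle += 1;
            inner = self.condvar.wait(inner).unwrap();
            inner.idle -= 1;
        }
    }

    fn shutdown(&self) {
        let mut inner = self.inner.lock().unwrap();
        inner.shutdown = true;
        self.condvar.notify_all();
    }
}

fn main() {
    let pool = Pool::new();
    let worker = {
        let pool = Arc::clone(&pool);
        std::thread::spawn(move || {
            let mut sum = 0;
            while let Some(job) = pool.pop() {
                sum += job;
            }
            sum
        })
    };
    for job in 1..=4 {
        pool.push(job);
    }
    pool.shutdown();
    assert_eq!(worker.join().unwrap(), 10);
}
```

Note that a worker re-checks the queue before parking, so a push that lands while `idle == 0` is never lost; the notification is only skipped when nobody is waiting for it.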

What if anything is suggested as an alternative?

@SUPERCILEX
Contributor

The solution here would be to use a lockless queue, e.g. crossbeam_deque. @Darksonn On that note, why was crossbeam purged from tokio? Every tokio runtime's injector uses locks, which seems like a clear downgrade from a lockless queue.

@Noah-Kennedy
Contributor

I've seen this cause perf issues before on code that does a ton of file I/O.

@SUPERCILEX
Contributor

I've spent a bit of time thinking about what the optimal solution could look like:

  • Use an unbounded FIFO queue for Injector tasks (most likely SegQueue).
  • Use a bounded (by thread_cap) LIFO queue for idle threads (the point is to try to reuse the same threads). As far as I can tell, this doesn't exist (crossbeam_deque has an unbounded LIFO queue, but it uses a single buffer that needs to be reallocated on overflow, which seems like a huge waste). This means we'd have to write our own (probably a vec guarded by atomics).
  • Use a slab guarded by a mutex to keep track of alive threads. Not a fan of the mutex, so maybe use a shared slab that can double as the idle thread queue? Need to figure out how that would work.

The lifecycle would be:

  1. Tasks are pushed to the SegQueue. Active threads poll the queue, twice if thread_cap has not been reached yet. If the second poll returns a value, start a new thread and give it that second task (putting it in the slab).
  2. If the queue is empty, the thread puts itself in the idle-thread queue and parks itself with a timeout.
  3. New tasks can take a thread out of the idle queue and notify it (assuming it's not dead; otherwise remove it from the slab and loop).
  4. On notify or timeout, a thread checks to see if there's work and lets itself die if not, marking itself as dead.
  5. On shutdown, notify all the idle threads and then join everything in the slab.
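The park-with-timeout part of that lifecycle can be sketched with std's `Condvar::wait_timeout` (the proposal calls for atomics and lockless structures; a plain Mutex stands in here for brevity, and the `Pool`/`Shared`/`worker` names are illustrative):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

struct Shared {
    pending: usize,       // queued work items (stand-in for the task queue)
    alive_workers: usize, // stand-in for the slab of alive threads
}

struct Pool {
    shared: Mutex<Shared>,
    condvar: Condvar,
}

const KEEP_ALIVE: Duration = Duration::from_millis(50);

// A worker drains pending work, parks with a timeout when idle, and on
// waking with nothing to do marks itself dead and exits.
fn worker(pool: &Pool) -> usize {
    let mut completed = 0;
    let mut shared = pool.shared.lock().unwrap();
    loop {
        if shared.pending > 0 {
            shared.pending -= 1;
            completed += 1; // "run" the task (elided)
            continue;
        }
        // No work: park with a timeout, as in the proposal.
        let (guard, timeout) = pool.condvar.wait_timeout(shared, KEEP_ALIVE).unwrap();
        shared = guard;
        if shared.pending == 0 && timeout.timed_out() {
            // Timed out with nothing to do: let this thread die.
            shared.alive_workers -= 1;
            return completed;
        }
        // Otherwise: notified (or a spurious wakeup), loop and re-check.
    }
}

fn main() {
    let pool = Arc::new(Pool {
        shared: Mutex::new(Shared { pending: 3, alive_workers: 1 }),
        condvar: Condvar::new(),
    });
    let handle = {
        let pool = Arc::clone(&pool);
        std::thread::spawn(move || worker(&pool))
    };
    assert_eq!(handle.join().unwrap(), 3);
    assert_eq!(pool.shared.lock().unwrap().alive_workers, 0);
}
```

Re-checking `pending` after every wakeup handles both spurious wakeups and the race where work arrives just as the timeout fires.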

@Darksonn
Contributor

Darksonn commented Mar 3, 2022

We generally want to avoid adding new dependencies if we can at all avoid it.

@hawkw
Member

hawkw commented Mar 4, 2022

One nice thing about the injector queue is that --- unlike run queues --- tasks cannot be dropped from outside of the queue while in the injector queue. This means that we may be able to use an approach like Vyukov's intrusive singly-linked MPSC queue for injector queues --- this is something I've wanted to look into for a while, actually.
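For reference, the core of Vyukov's MPSC queue is small: producers enqueue with a single atomic swap, and the single consumer walks the `next` pointers. The sketch below is a minimal non-intrusive (heap-node) version to show the idea; the intrusive variant mentioned above would embed the `next` pointer in the task itself to avoid allocation, and this is in no way tokio's implementation.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    next: AtomicPtr<Node<T>>,
    value: Option<T>, // None only for the stub node
}

pub struct MpscQueue<T> {
    head: AtomicPtr<Node<T>>, // producers swap here
    tail: *mut Node<T>,       // touched only by the single consumer
}

unsafe impl<T: Send> Send for MpscQueue<T> {}
unsafe impl<T: Send> Sync for MpscQueue<T> {}

impl<T> MpscQueue<T> {
    pub fn new() -> Self {
        // Stub node so push never has to handle an empty queue specially.
        let stub = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: None,
        }));
        MpscQueue { head: AtomicPtr::new(stub), tail: stub }
    }

    // Safe from any thread: one swap plus one store, no locks.
    pub fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node {
            next: AtomicPtr::new(ptr::null_mut()),
            value: Some(value),
        }));
        let prev = self.head.swap(node, Ordering::AcqRel);
        unsafe { (*prev).next.store(node, Ordering::Release) };
    }

    // Single consumer only, hence &mut self.
    pub fn pop(&mut self) -> Option<T> {
        unsafe {
            let tail = self.tail;
            let next = (*tail).next.load(Ordering::Acquire);
            if next.is_null() {
                return None; // empty, or a push is mid-flight
            }
            self.tail = next;
            drop(Box::from_raw(tail)); // free the old stub/tail node
            (*next).value.take()
        }
    }
}

fn main() {
    let mut q = MpscQueue::new();
    q.push(1);
    q.push(2);
    assert_eq!(q.pop(), Some(1));
    assert_eq!(q.pop(), Some(2));
    assert_eq!(q.pop(), None);
}
```

(A real version also needs a `Drop` impl to free any remaining nodes; it is omitted here for brevity.)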

@SUPERCILEX
Contributor

Don't we need MPMC though?

@tfreiberg-fastly

I've experienced this in benchmarks with high RPS doing very small reads from files on a ramdisk. Replacing spawn_blocking with block_in_place actually improved things in these cases (obviously, with larger reads block_in_place was unusable; this only confirmed that queueing is in fact an issue at 20k+ RPS).
So I'm very interested in this issue, and when it becomes a blocker, I'll try to contribute.

@Noah-Kennedy
Contributor

> I've experienced this in benchmarks with high RPS doing very small reads from files on a ramdisk. Replacing spawn_blocking with block_in_place actually improved things in these cases (obviously, with larger reads block_in_place was unusable; this only confirmed that queueing is in fact an issue at 20k+ RPS). So I'm very interested in this issue, and when it becomes a blocker, I'll try to contribute.

block_in_place actually does essentially a spawn_blocking call internally to obtain a thread for a new worker, so this is interesting.

7 participants