Using rayon under a Mutex can lead to deadlocks #592
Solutions that come to my mind:
Mostly, my expectation is that the answer here would be "don't do that", but it is sort of disappointing, since we would like you to be able to add parallelism freely. That doesn't realistically work around blocking things though, and mutexes fit that model of course.
I think that using a (Of course, you may not have control of all the relevant code.)
Why do you call this global? It should install in whatever threadpool you call it on. As long as that's distinct from the threadpool that grabbed the mutex, and there's nothing in it that calls back to the mutex's threadpool, I think it would work.
I think this would only help if they had separate stacks, otherwise your
Sorry, I misread it as
Oh, actually, I think
I've run into this problem again (or rather, I never fixed it and was just lucky not to hit the deadlock in the meantime). I tried:

```rust
with_mutex(|| {
    rayon::ThreadPoolBuilder::new()
        .build()
        .unwrap()
        .install(|| {
            parallel_processing()
        })
})
```

but as you've said, it didn't work around the issue. All threads from the global pool are waiting for the mutex (which is entirely expected in my case), but then all the threads from the second pool end up blocked as well.
Is there any hope of this being fixed in Rayon? I don't know if I should wait for a fix, or remove Rayon from the mutex-using part of the code.
I don't have a sense of what the fix should be, never mind giving any ETA on it.
The fixes that I suppose could solve it:
That use would still be a deadlock-loop if the lock holder -- or anything the lock holder is waiting for -- is blocked by you on the call stack. They'll never complete until you return to them.
I think this is feasible.
If you're really only reading the mutexed data in the thread pool, you could also consider an RwLock.
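A minimal sketch of that suggestion, assuming the shared data really is only read inside the pool (the function and values here are illustrative):

```rust
use std::sync::{Arc, RwLock};
use rayon::prelude::*;

// With an RwLock, many rayon threads can hold read guards at once, so
// read-only parallel work is not serialized behind a single Mutex holder.
// (Recursive read locks can still block if a writer is waiting, so this
// only helps when the parallel section takes no write locks.)
fn sum_in_parallel(shared: &Arc<RwLock<Vec<i32>>>) -> i32 {
    let data = shared.read().unwrap();
    data.par_iter().copied().sum()
}

fn main() {
    let shared = Arc::new(RwLock::new((1..=100).collect::<Vec<i32>>()));
    println!("{}", sum_in_parallel(&shared));
}
```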
If this isn't going to be fixed, I would suggest documenting it as a known issue and adding a warning to the documentation. I just spent nearly two hours debugging a deadlock before finding this issue 😢
I was able to get something like this working by adding a feature for a full-blocking thread pool:

```rust
use std::sync::{Arc, Mutex};
use rayon::ThreadPoolBuilder;
use rayon::iter::IntoParallelRefIterator;
use rayon::iter::ParallelIterator;

fn mutex_and_par(mutex: Arc<Mutex<Vec<i32>>>, blocking_pool: &rayon::ThreadPool) {
    // Lock the mutex and collect items using the full-blocking thread pool
    let vec = mutex.lock().unwrap();
    let result: Vec<i32> = blocking_pool.install(|| vec.par_iter().cloned().collect());
    println!("{:?}", result);
}

#[test]
fn test_issue592() {
    let collection = vec![1, 2, 3, 4, 5];
    let mutex = Arc::new(Mutex::new(collection));
    // full_blocking() is the feature added for this experiment; it is not part of released rayon.
    let blocking_pool = ThreadPoolBuilder::new()
        .full_blocking()
        .num_threads(4)
        .build()
        .unwrap();
    let dummy_collection: Vec<i32> = (1..=100).collect();
    dummy_collection.par_iter().for_each(|_| {
        mutex_and_par(mutex.clone(), &blocking_pool);
    });
}
```

See for more details:
https://users.rust-lang.org/t/using-rayon-in-code-locked-by-a-mutex/20119
@kornelski presented the following pseudo-code representing a possible deadlock:
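A minimal sketch of that kind of code, reusing the names from the walkthrough below (kornelski's original pseudo-code may have differed), is:

```rust
use std::sync::Mutex;
use rayon::prelude::*;

// Each outer job takes the lock and then runs *more* rayon work while holding it.
fn mutex_and_par(mutex: &Mutex<Vec<i32>>) {
    let vec = mutex.lock().unwrap();
    // Nested rayon call while the mutex is still held.
    let _doubled: Vec<i32> = vec.par_iter().map(|x| x * 2).collect();
}

fn main() {
    let mutex = Mutex::new((1..=5).collect::<Vec<i32>>());
    let outer: Vec<i32> = (1..=100).collect();
    // Outer parallel loop: every iteration locks the mutex and spawns nested jobs.
    // Running this can genuinely deadlock, which is the failure mode analyzed below.
    outer.par_iter().for_each(|_| mutex_and_par(&mutex));
}
```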
I think the way this fails is something like:

1. `for_each(mutex_and_par)` calls `mutex_and_par`, which takes the lock
2. `.par_iter()` splits the locked data into a few jobs
3. `collect()` waits on one of those jobs
4. while waiting, that thread steals another `for_each(mutex_and_par)` call, which tries to take the lock again

If serialized, this code would never be a problem, but a work-stealing threadpool can cause it to implicitly recurse the lock. I'm not sure what we can do about this, or if there's any better advice we can give than just "don't call rayon under a mutex"...
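One way to follow that advice, sketched here under the assumption that the protected data is cheap to clone: hold the lock only long enough to copy the data out, drop the guard, and then do the rayon work on the copy so no lock is held inside the pool.

```rust
use std::sync::Mutex;
use rayon::prelude::*;

// Copy the data out under the lock, release the lock, then parallelize
// over the private copy; rayon never runs while the mutex is held.
fn process_without_holding_lock(mutex: &Mutex<Vec<i32>>) -> i32 {
    let snapshot: Vec<i32> = {
        let guard = mutex.lock().unwrap();
        guard.clone()
    }; // guard dropped here, before any rayon call
    snapshot.par_iter().map(|x| x * x).sum()
}

fn main() {
    let mutex = Mutex::new((1..=10).collect::<Vec<i32>>());
    println!("{}", process_without_holding_lock(&mutex));
}
```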