-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
A thread-safe rand wrapper that relies on a Threaded Array #27950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
stdlib/Random/src/RNGs.jl
Outdated
@@ -582,3 +582,13 @@ function _randjump(mt::MersenneTwister, jumppoly::DSFMT.GF2X, len::Integer) | |||
end | |||
return mts | |||
end | |||
|
|||
const TRNG = Array{MersenneTwister} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want Vector
or Array{MersenneTwister, 1}
here.
stdlib/Random/src/RNGs.jl
Outdated
|
||
TRNG(r::MersenneTwister=GLOBAL_RNG) = TRNG(randjump(r, big(10)^20, Threads.nthreads())) | ||
|
||
function trand(r::Array{MersenneTwister}=ThreadRNG[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
stdlib/Random/src/RNGs.jl
Outdated
const TRNG = Array{MersenneTwister} | ||
const ThreadRNG = Ref{TRNG}() | ||
|
||
TRNG(r::MersenneTwister=GLOBAL_RNG) = TRNG(randjump(r, big(10)^20, Threads.nthreads())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It confused me a bit that you wrote TRNG(...)
since randjump
returns Vector{MersenneTwister}
. It might be good to remove the typealias (e.g. const TRNG = ...
)
After the changes the benchmarks are restored :
The test added is just to check up #10441. |
stdlib/Random/src/Random.jl
Outdated
@@ -248,6 +249,7 @@ function __init__() | |||
Base.showerror_nostdio(ex, | |||
"WARNING: Error during initialization of module Random") | |||
end | |||
ThreadRNG[] = randjump(GLOBAL_RNG, big(10)^20, Threads.nthreads()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this must be done on multiple threads instead of only one one thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean implementing something like resize_nthreads!()
but that assigns to each buffer a different generator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I mean the engine has to be allocated on different threads. See the recent discourse thread about this. (On my phone, hard to paste link)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I had this discussion with @stevengj about placing TLS generators.
The problem is that
- Using
task_local_storage
is impractical as they often get dereferenced. - Writing a interface to
pthread_setspecifc()
can be useless, as retrieving variables there can be slower than accessing an array. Also it would bring the problem of managing the keys at runtime.
Where also it would not probably benefit us for performance, as we have an overhead ~1ns with respect to rand
, and not in entropy generation as the independence between streams is guaranteed by randjump
.
The only remaining problem is false sharing which I will test today with perf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it has NOTHING to do with task_local_storage or pthread tls API. You just need to "allocated on different threads", i.e. let the thread allocate, that's it.
Where also it would not probably benefit us for performance
Oh, it'll absolutely kill you. Again, see the benchmarks in the discourse thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehm the discussion above was started by myself, and I did those benchmark (sorry I'm Gaetano :D ). If you refer to another one, please link it to me.
The problem is that the definition of trand()
(that present in the benchmarks linked above) is done at runtime, and the ThreadRNG const array was also instantiated at runtime.
That's why there I had the opportunity to let a thread allocate the array of MersenneTwister
.
Instead here, at stdlib, I have to pass through a reference to instantiate a global object at init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehm the discussion above was started by myself
I never mention "discussion above", I said "discourse thread" which I didn't link since I was on my phone as explained above.
I don't think that thread is that hard to find. It's currently the second result when searching for "rand" and "thread" (sorted with date). https://discourse.julialang.org/search?q=rand%20thread%20order%3Alatest, the direct link is https://discourse.julialang.org/t/poor-performance-on-cluster-multithreading
ThreadRNG const array was also instantiated at runtime.
And speaking of that, I don't really understand why it has to be a ref of vector. Vector is mutable already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I thought about a previous thread.
OK I'm going to write an helper function to do that. Thanks for your support!
stdlib/Random/src/RNGs.jl
Outdated
@@ -582,3 +582,19 @@ function _randjump(mt::MersenneTwister, jumppoly::DSFMT.GF2X, len::Integer) | |||
end | |||
return mts | |||
end | |||
|
|||
const ThreadRNG = Vector{MersenneTwister}(undef, Threads.nthreads()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resize in __init__
Hi @yuyichao ! Thanks again for you comments! I made the resize call at |
…t into Random. Implemented set_nthreads!() to allocate ThreadRNG at init.
Is this closed by #32407? |
This is an attempt to implement an additional utility in
Random
that generate random number safely in multi thread code. ( #10441 )I choose to keep it as an external utility to avoid interfering in
rand()
performance for mono-thread programs. The independence of generators is guaranteed byrandjump
.Something strange that I noticed is that testing this solution on commit
fe4337613f0da442cf01c553c0a0a8b9dc8b66a2
(~25 days old master) gives on thetrand()
benchmarksInstead on the more recent commit
80db579cf938a75cbb1edf0dbe90fc27d85210ac
(~2 days master)I do not know what can justify such a degradation in performance.
EDIT : performance restored after 2nd commit.