Skip to content

A thread-safe rand wrapper that relies on a Threaded Array #27950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

fisiognomico
Copy link
Contributor

@fisiognomico fisiognomico commented Jul 5, 2018

This is an attempt to implement an additional utility in Random that generate random number safely in multi thread code. ( #10441 )
I choose to keep it as an external utility to avoid interfering in rand() performance for mono-thread programs. The independence of generators is guaranteed by randjump.
Something strange that I noticed is that testing this solution on commit fe4337613f0da442cf01c553c0a0a8b9dc8b66a2 (~25 days old master) gives on the trand() benchmarks

julia>@benchmark trand()
    memory estimate:  0 bytes
    allocs estimate:  0
    --------------
    minimum time:     5.395 ns (0.00% GC)
    median time:      6.527 ns (0.00% GC)
    mean time:        6.661 ns (0.00% GC)
    maximum time:     29.691 ns (0.00% GC)
    --------------
    samples:          10000
    evals/sample:     1000

Instead on the more recent commit 80db579cf938a75cbb1edf0dbe90fc27d85210ac (~2 days master)

julia>@benchmark trand()
    memory estimate:  16 bytes
    allocs estimate:  1
    --------------
    minimum time:     22.683 ns (0.00% GC)
    median time:      23.654 ns (0.00% GC)
    mean time:        31.057 ns (17.71% GC)
    maximum time:     38.199 μs (99.92% GC)
    --------------
    samples:          10000
    evals/sample:     996

I do not know what can justify such a degradation in performance.

EDIT : performance restored after 2nd commit.

@@ -582,3 +582,13 @@ function _randjump(mt::MersenneTwister, jumppoly::DSFMT.GF2X, len::Integer)
end
return mts
end

const TRNG = Array{MersenneTwister}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want Vector or Array{MersenneTwister, 1} here.


TRNG(r::MersenneTwister=GLOBAL_RNG) = TRNG(randjump(r, big(10)^20, Threads.nthreads()))

function trand(r::Array{MersenneTwister}=ThreadRNG[])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

const TRNG = Array{MersenneTwister}
const ThreadRNG = Ref{TRNG}()

TRNG(r::MersenneTwister=GLOBAL_RNG) = TRNG(randjump(r, big(10)^20, Threads.nthreads()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It confused me a bit that you wrote TRNG(...) since randjump returns Vector{MersenneTwister}. It might be good to remove the typealias (e.g. const TRNG = ...)

@fisiognomico
Copy link
Contributor Author

fisiognomico commented Jul 8, 2018

After the changes the benchmarks are restored :

julia> versioninfo()
Julia Version 0.7.0-beta.118
Commit 8a6d62d6b4* (2018-07-05 21:52 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)

julia> @benchmark trand()
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.534 ns (0.00% GC)
  median time:      6.363 ns (0.00% GC)
  mean time:        6.453 ns (0.00% GC)
  maximum time:     35.026 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

The test added is just to check up #10441.

@@ -248,6 +249,7 @@ function __init__()
Base.showerror_nostdio(ex,
"WARNING: Error during initialization of module Random")
end
ThreadRNG[] = randjump(GLOBAL_RNG, big(10)^20, Threads.nthreads())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this must be done on multiple threads instead of only one one thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean implementing something like resize_nthreads!() but that assigns to each buffer a different generator?

Copy link
Contributor

@yuyichao yuyichao Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I mean the engine has to be allocated on different threads. See the recent discourse thread about this. (On my phone, hard to paste link)

Copy link
Contributor Author

@fisiognomico fisiognomico Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I had this discussion with @stevengj about placing TLS generators.
The problem is that

  1. Using task_local_storage is impractical as they often get dereferenced.
  2. Writing a interface to pthread_setspecifc() can be useless, as retrieving variables there can be slower than accessing an array. Also it would bring the problem of managing the keys at runtime.

Where also it would not probably benefit us for performance, as we have an overhead ~1ns with respect to rand, and not in entropy generation as the independence between streams is guaranteed by randjump.
The only remaining problem is false sharing which I will test today with perf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it has NOTHING to do with task_local_storage or pthread tls API. You just need to "allocated on different threads", i.e. let the thread allocate, that's it.

Where also it would not probably benefit us for performance

Oh, it'll absolutely kill you. Again, see the benchmarks in the discourse thread.

Copy link
Contributor Author

@fisiognomico fisiognomico Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehm the discussion above was started by myself, and I did those benchmark (sorry I'm Gaetano :D ). If you refer to another one, please link it to me.

The problem is that the definition of trand() (that present in the benchmarks linked above) is done at runtime, and the ThreadRNG const array was also instantiated at runtime.
That's why there I had the opportunity to let a thread allocate the array of MersenneTwister.
Instead here, at stdlib, I have to pass through a reference to instantiate a global object at init.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehm the discussion above was started by myself

I never mention "discussion above", I said "discourse thread" which I didn't link since I was on my phone as explained above.
I don't think that thread is that hard to find. It's currently the second result when searching for "rand" and "thread" (sorted with date). https://discourse.julialang.org/search?q=rand%20thread%20order%3Alatest, the direct link is https://discourse.julialang.org/t/poor-performance-on-cluster-multithreading

ThreadRNG const array was also instantiated at runtime.

And speaking of that, I don't really understand why it has to be a ref of vector. Vector is mutable already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I thought about a previous thread.
OK I'm going to write an helper function to do that. Thanks for your support!

@kshyatt kshyatt added the multithreading Base.Threads and related functionality label Jul 10, 2018
@@ -582,3 +582,19 @@ function _randjump(mt::MersenneTwister, jumppoly::DSFMT.GF2X, len::Integer)
end
return mts
end

const ThreadRNG = Vector{MersenneTwister}(undef, Threads.nthreads())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resize in __init__

@fisiognomico
Copy link
Contributor Author

Hi @yuyichao ! Thanks again for you comments! I made the resize call at set_nthreads!() to mimic the behavior of resize_nthreads!(). Just to appoint some changes :
After rebase randjump(MT, steps, len) has been dismissed, so I had to change a bit the logic of generators instantiation. We still get independent streams (at least it seems from the naive test added), and the performance seems at the usual level.
The rebase did not went very well, so I have to squash the commits in about 2.
Anyway I had to pull out of Future the randjump(::MT, steps::Int), let me know if is not too early!

…t into Random.

Implemented set_nthreads!() to allocate ThreadRNG at init.
@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label Aug 27, 2018
@stevengj
Copy link
Member

Is this closed by #32407?

@vtjnash vtjnash closed this Mar 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants