A fast, lightweight random number generator (RNG) for normally distributed floats, based on the Box-Muller transform, with SSE- and AVX2-accelerated versions.
At the lowest level, a simple linear congruential generator (LCG) produces uniformly distributed floats directly.
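The two stages above can be sketched as follows. This is a minimal illustration, not this library's source. It assumes the well-known 32-bit LCG constants from *Numerical Recipes* (a = 1664525, c = 1013904223) and a hypothetical mapping of the top 24 state bits to a float in (0, 1]. The names `Lcg` and `BoxMuller` are also hypothetical. Box-Muller turns two uniforms u1, u2 into two independent N(0, 1) samples, sqrt(-2 ln u1)·cos(2π u2) and sqrt(-2 ln u1)·sin(2π u2), which are then scaled to N(μ, σ²).

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

// Hypothetical 32-bit LCG (Numerical Recipes constants, an assumption).
// Full period 2^32: c is odd and (a - 1) is divisible by 4.
struct Lcg {
    std::uint32_t state;
    explicit Lcg(std::uint32_t seed) : state(seed) {}

    // Advance the state (mod 2^32 via unsigned wraparound) and map the top
    // 24 bits to a float in (0, 1]. Excluding 0 keeps log(u) finite below.
    float NextUniform() {
        state = 1664525u * state + 1013904223u;
        return static_cast<float>((state >> 8) + 1u) * (1.0f / 16777216.0f);
    }
};

// Box-Muller transform: two uniforms -> two independent normal samples,
// scaled to mean mu and variance sigma_square (variance, not std. dev.).
std::pair<float, float> BoxMuller(Lcg& rng, float mu, float sigma_square) {
    const float kTwoPi = 6.2831853f;
    float u1 = rng.NextUniform();
    float u2 = rng.NextUniform();
    float r = std::sqrt(-2.0f * std::log(u1));  // radius
    float theta = kTwoPi * u2;                  // angle
    float sigma = std::sqrt(sigma_square);
    return {mu + sigma * r * std::cos(theta), mu + sigma * r * std::sin(theta)};
}
```

Each call consumes two uniforms and yields two normals. The work is the same independent arithmetic for every pair, which is one plausible reason the transform vectorizes well: a 128-bit SSE register holds 4 floats and a 256-bit AVX2 register holds 8.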
- The RNG passed the Kolmogorov-Smirnov (K-S) test at α = 0.01. During the test, the K-S statistic $D = \max_i \lvert F_{\mathrm{obs}}(x_i) - F_{\mathrm{exp}}(x_i) \rvert$ was almost the same as the one obtained for NumPy's generator.
- Performance: the plain `Floats` is a bit slower than NumPy (about 1.45x slower). With `FloatsSSE`, generation becomes about 3x faster than NumPy, and with `FloatsAVX2` it can be 7x faster. However, NumPy generates `np.float64` (double precision) while this library generates 32-bit `float`s, so the comparison is not apples-to-apples. *** So this RNG should ONLY be used when you just want speed and don't need doubles. ***
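The one-sample K-S statistic $D = \max_i \lvert F_{\mathrm{obs}}(x_i) - F_{\mathrm{exp}}(x_i) \rvert$ can be computed against the N(μ, σ²) CDF as in the sketch below. The helper names are hypothetical and the README does not show its test code. The constant 1.628/√n is the standard large-sample critical value for α = 0.01, not a value taken from this project. Because the empirical CDF is a step function, the maximum is checked just before and just after each sorted sample.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// CDF of N(mu, sigma_square), where sigma_square is the variance:
// Phi(z) = 0.5 * erfc(-z / sqrt(2)), with z = (x - mu) / sigma.
double NormalCdf(double x, double mu, double sigma_square) {
    return 0.5 * std::erfc(-(x - mu) / std::sqrt(2.0 * sigma_square));
}

// One-sample K-S statistic D = max_i |Fobs(x_i) - Fexp(x_i)|.
double KsStatistic(std::vector<double> samples,
                   const std::function<double(double)>& cdf) {
    std::sort(samples.begin(), samples.end());
    const double n = static_cast<double>(samples.size());
    double d = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        double f = cdf(samples[i]);
        d = std::max(d, (i + 1) / n - f);  // empirical CDF just after x_i
        d = std::max(d, f - i / n);        // empirical CDF just before x_i
    }
    return d;
}

// Asymptotic critical value at alpha = 0.01; reject H0 if D exceeds it.
double KsCriticalValue001(std::size_t n) {
    return 1.628 / std::sqrt(static_cast<double>(n));
}
```

For the μ = -3, σ² = 9 test case, the expected CDF would be `[](double x) { return NormalCdf(x, -3.0, 9.0); }`, with the generated floats copied into the vector.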
void CreateGenerator(float mu, float sigma_square);
float NextFloat();
float* Floats(unsigned int count);
float* FloatsSSE(unsigned int count);
float* FloatsAVX2(unsigned int count);
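These functions report a generator that was never created through return values: `NaN` from `NextFloat` and `nullptr` from the batch functions. The sketch below shows hypothetical checked wrappers, `DrawOne` and `DrawBatch`, that turn those sentinels into exceptions. They take function pointers with the declared signatures, so they can be exercised with stubs. This README does not say who owns or frees the returned array, so the sketch copies the data and deliberately does not free it.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Function-pointer types matching the declarations above.
using NextFloatFn = float (*)();
using FloatsFn = float* (*)(unsigned int);

// Hypothetical helper: NextFloat returns NaN if CreateGenerator was never called.
float DrawOne(NextFloatFn next) {
    float x = next();
    if (std::isnan(x)) {
        throw std::runtime_error("generator not created: call CreateGenerator first");
    }
    return x;
}

// Hypothetical helper for Floats / FloatsSSE / FloatsAVX2, which return
// nullptr if CreateGenerator was never called.
std::vector<float> DrawBatch(FloatsFn floats, unsigned int count) {
    float* p = floats(count);
    if (p == nullptr) {
        throw std::runtime_error("generator not created: call CreateGenerator first");
    }
    // Ownership of p is undocumented; this sketch copies and does not free it.
    return std::vector<float>(p, p + count);
}

// Usage with the real library (not runnable without the DLL):
//   CreateGenerator(-3.0f, 9.0f);
//   std::vector<float> xs = DrawBatch(&Floats, 100000);
```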
After loading the DLL, call `CreateGenerator(float mu, float sigma_square)` to create a generator. Here `mu` is the mean μ and `sigma_square` is the variance σ² (not the standard deviation σ).
Then call the following functions as needed:
- `NextFloat()`: Returns the next normally distributed float. Returns `NaN` if you have never called `CreateGenerator`.
- `Floats(unsigned int count)`: Generates `count` normally distributed floats in an array and returns a pointer to its first element. Returns `nullptr` if you have never called `CreateGenerator`. The same applies to the following two functions.
- `FloatsSSE(unsigned int count)`: SSE-accelerated version of `Floats` (about 4x faster). *** NOTE: Your CPU MUST support the SSE, SSE2, and SSE4.1 instruction sets, or your program will crash. ***
- `FloatsAVX2(unsigned int count)`: AVX/AVX2-accelerated version of `Floats` (about 8x faster). *** NOTE: Your CPU MUST support the AVX and AVX2 instruction sets, or your program will crash. ***
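Calling `FloatsSSE` or `FloatsAVX2` on a CPU without those instruction sets crashes. A caller may therefore want to check the CPU at runtime and fall back to `Floats`. The sketch below assumes GCC or Clang on x86/x86-64, where `__builtin_cpu_supports` is available. Other compilers, such as MSVC with `__cpuid`, would need a different check, and here they simply fall back to the scalar path. `SimdPath`, `PickSimdPath`, and `ChooseFloats` are hypothetical names. The README does not say whether the library performs any such check itself.

```cpp
// Function-pointer type matching Floats / FloatsSSE / FloatsAVX2.
using FloatsFn = float* (*)(unsigned int);

enum class SimdPath { Scalar, Sse, Avx2 };

// Runtime CPU feature detection (GCC/Clang on x86 only, an assumption).
SimdPath PickSimdPath() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx") && __builtin_cpu_supports("avx2")) {
        return SimdPath::Avx2;
    }
    if (__builtin_cpu_supports("sse") && __builtin_cpu_supports("sse2") &&
        __builtin_cpu_supports("sse4.1")) {
        return SimdPath::Sse;
    }
#endif
    return SimdPath::Scalar;  // unknown toolchain/architecture: safest choice
}

// Pick the fastest batch function this CPU can run safely.
FloatsFn ChooseFloats(FloatsFn scalar, FloatsFn sse, FloatsFn avx2) {
    switch (PickSimdPath()) {
        case SimdPath::Avx2: return avx2;
        case SimdPath::Sse:  return sse;
        default:             return scalar;
    }
}
```

With the real library, `ChooseFloats(&Floats, &FloatsSSE, &FloatsAVX2)` would be called once at startup. The returned pointer can then be reused, which keeps the feature check out of the hot path.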
- Fobs vs. Fexp in the K-S test (μ = -3, σ² = 9, count = 100000)
- Histogram of generated values vs. NumPy (μ = 10, σ² = 20, count = 100000)
- Performance benchmark (count = 10000000, Intel Core i7-8565U)