There is some discussion on how to vectorize xoshiro256++ at JuliaLang/julia#27614. The method relies on interleaving 4 xoshiro256++ generators. I implemented it and the results are impressive (see gen_bytes_fill):
test gen_bytes_chacha12 ... bench: 324,494 ns/iter (+/- 4,236) = 3155 MB/s
test gen_bytes_chacha20 ... bench: 490,442 ns/iter (+/- 16,214) = 2087 MB/s
test gen_bytes_chacha8 ... bench: 243,010 ns/iter (+/- 19,972) = 4213 MB/s
test gen_bytes_fill ... bench: 105,350 ns/iter (+/- 1,456) = 9719 MB/s
test gen_bytes_pcg64mcg ... bench: 321,665 ns/iter (+/- 7,854) = 3183 MB/s
test gen_bytes_splitmix64 ... bench: 233,973 ns/iter (+/- 1,859) = 4376 MB/s
test gen_bytes_xoshiro256plusplus ... bench: 343,911 ns/iter (+/- 6,580) = 2977 MB/s
The implementation is about 3.3 times faster than the non-vectorized xoshiro256++ generator and more than 2.2 times faster than splitmix64 or chacha8. It is also faster than dSFMT. However, the state grows to 128 bytes (four 32-byte xoshiro256++ states), which is almost as large as chacha's state (136 bytes).
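For readers unfamiliar with the interleaving idea, here is a minimal sketch of how it might look. It is not the benchmarked implementation: the type name `InterleavedXoshiro256PlusPlus` and the functions `from_lane_states` and `next_lanes` are hypothetical, and the sketch assumes each lane is seeded with a distinct, non-zero state. The state is stored word-major (`s[word][lane]`), so the per-lane loop touches contiguous memory and the compiler has a chance to auto-vectorize it; whether it actually does depends on the target and optimization flags.

```rust
/// Scalar xoshiro256++ step, kept as a reference for comparison.
fn xoshiro256pp_next(s: &mut [u64; 4]) -> u64 {
    let result = s[0].wrapping_add(s[3]).rotate_left(23).wrapping_add(s[0]);
    let t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = s[3].rotate_left(45);
    result
}

/// Number of interleaved generators (hypothetical constant name).
const LANES: usize = 4;

/// Four independent xoshiro256++ generators, stored word-major:
/// `s[word][lane]`. Total state: 4 words * 4 lanes * 8 bytes = 128 bytes.
struct InterleavedXoshiro256PlusPlus {
    s: [[u64; LANES]; 4],
}

impl InterleavedXoshiro256PlusPlus {
    /// Build the interleaved state from one 4-word state per lane.
    /// Assumption: the caller supplies distinct, non-zero lane states.
    fn from_lane_states(lanes: [[u64; 4]; LANES]) -> Self {
        let mut s = [[0u64; LANES]; 4];
        for (lane, state) in lanes.iter().enumerate() {
            for (word, &value) in state.iter().enumerate() {
                s[word][lane] = value;
            }
        }
        Self { s }
    }

    /// Advance all lanes by one step and return one output per lane.
    fn next_lanes(&mut self) -> [u64; LANES] {
        let mut out = [0u64; LANES];
        for i in 0..LANES {
            let [s0, s1, s2, s3] = [self.s[0][i], self.s[1][i], self.s[2][i], self.s[3][i]];
            out[i] = s0.wrapping_add(s3).rotate_left(23).wrapping_add(s0);
            // Same update order as the scalar step above.
            let t = s1 << 17;
            let s2 = s2 ^ s0;
            let s3 = s3 ^ s1;
            let s1 = s1 ^ s2;
            let s0 = s0 ^ s3;
            let s2 = s2 ^ t;
            let s3 = s3.rotate_left(45);
            self.s[0][i] = s0;
            self.s[1][i] = s1;
            self.s[2][i] = s2;
            self.s[3][i] = s3;
        }
        out
    }
}
```

Each call to `next_lanes` yields 32 bytes of output, so a byte-filling routine could copy the four words into the destination buffer chunk by chunk. Every lane produces exactly the stream of a scalar xoshiro256++ generator started from the same state, which makes the sketch easy to check against the scalar step.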