Vectorized xoshiro256++ #3

@vks

There is some discussion on how to vectorize xoshiro256++ at JuliaLang/julia#27614. The method relies on interleaving 4 xoshiro256++ generators. I implemented it and the results are impressive (see gen_bytes_fill):

test gen_bytes_chacha12           ... bench:     324,494 ns/iter (+/- 4,236) = 3155 MB/s
test gen_bytes_chacha20           ... bench:     490,442 ns/iter (+/- 16,214) = 2087 MB/s
test gen_bytes_chacha8            ... bench:     243,010 ns/iter (+/- 19,972) = 4213 MB/s
test gen_bytes_fill               ... bench:     105,350 ns/iter (+/- 1,456) = 9719 MB/s
test gen_bytes_pcg64mcg           ... bench:     321,665 ns/iter (+/- 7,854) = 3183 MB/s
test gen_bytes_splitmix64         ... bench:     233,973 ns/iter (+/- 1,859) = 4376 MB/s
test gen_bytes_xoshiro256plusplus ... bench:     343,911 ns/iter (+/- 6,580) = 2977 MB/s

The implementation is 3.3 times faster than the non-vectorized xoshiro256++ generator and more than 2.2 times faster than splitmix64 or chacha8. It is also faster than dSFMT. However, the state grows to 128 bytes (four generators of 32 bytes each), which is almost as large as chacha's state (136 bytes).
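
For readers who have not followed the Julia discussion, here is a minimal sketch of the interleaving idea, not the code that was benchmarked above. It assumes a structure-of-arrays layout in which word `i` of all four generators sits side by side, so that each xoshiro256++ step becomes a 4-lane loop the compiler may auto-vectorize. The type name `Xoshiro256PlusPlusX4`, the method names, and the SplitMix64-based seeding are hypothetical and chosen only for illustration; a real implementation would more likely derive the lanes with the generator's jump function to guarantee non-overlapping streams.

```rust
/// SplitMix64 step, used here only to expand a 64-bit seed (an assumption
/// of this sketch, not necessarily how the benchmarked code seeds itself).
pub fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical type: four interleaved xoshiro256++ generators.
#[derive(Clone, Debug)]
pub struct Xoshiro256PlusPlusX4 {
    // s[word][lane]: state word `word` of generator `lane`.
    // 4 words * 4 lanes * 8 bytes = 128 bytes of state.
    s: [[u64; 4]; 4],
}

impl Xoshiro256PlusPlusX4 {
    /// Fills lane 0 first, then lane 1, and so on, from SplitMix64 output.
    pub fn from_seed_u64(seed: u64) -> Self {
        let mut sm = seed;
        let mut s = [[0u64; 4]; 4];
        for lane in 0..4 {
            for word in 0..4 {
                s[word][lane] = splitmix64(&mut sm);
            }
        }
        Self { s }
    }

    /// Advances all four generators by one step and returns one output per lane.
    pub fn next_u64x4(&mut self) -> [u64; 4] {
        let mut out = [0u64; 4];
        // Output function of xoshiro256++: rotl(s0 + s3, 23) + s0.
        for lane in 0..4 {
            out[lane] = self.s[0][lane]
                .wrapping_add(self.s[3][lane])
                .rotate_left(23)
                .wrapping_add(self.s[0][lane]);
        }
        // State transition, applied independently to every lane.
        for lane in 0..4 {
            let t = self.s[1][lane] << 17;
            self.s[2][lane] ^= self.s[0][lane];
            self.s[3][lane] ^= self.s[1][lane];
            self.s[1][lane] ^= self.s[2][lane];
            self.s[0][lane] ^= self.s[3][lane];
            self.s[2][lane] ^= t;
            self.s[3][lane] = self.s[3][lane].rotate_left(45);
        }
        out
    }

    /// Fills `dest` 32 bytes at a time (little-endian), truncating the last block.
    pub fn fill_bytes(&mut self, dest: &mut [u8]) {
        for chunk in dest.chunks_mut(32) {
            let block = self.next_u64x4();
            let mut bytes = [0u8; 32];
            for (i, word) in block.iter().enumerate() {
                bytes[i * 8..i * 8 + 8].copy_from_slice(&word.to_le_bytes());
            }
            let n = chunk.len();
            chunk.copy_from_slice(&bytes[..n]);
        }
    }
}
```

Each lane produces exactly the stream a scalar xoshiro256++ generator would produce from the same four state words; only the order in which the outputs reach the buffer differs. Whether the loops actually compile to SIMD instructions depends on the target and optimization flags, so the speedup should be measured rather than assumed.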
