Replacing rand_chacha with chacha20 #934

@dhardy

@tarcieri has recently implemented ChaCha*Rng via the chacha20 crate. Docs are here. Breaking this out into a new issue, since it isn't really the topic of #872.

This raises the question: should we prefer this over the current implementation (@kazcw's)?

First things first, the rand_chacha crate has 44 reverse dependencies, is clearly a rand family crate and is recommended in our book. There is no plan to retire this crate.

Second, let's look at a few stats.

Unsafe

Here are the results of running cargo geiger on the current rand_chacha (after #931):

$ cargo geiger
<snip>
Metric output format: x/y
    x = unsafe code used by the build
    y = total unsafe code found in the crate

Symbols: 
    :) = No `unsafe` usage found, declares #![forbid(unsafe_code)]
    ?  = No `unsafe` usage found, missing #![forbid(unsafe_code)]
    !  = `unsafe` usage found

Functions  Expressions  Impls  Traits  Methods  Dependency

0/0        0/0          0/0    0/0     0/0      ?  rand_chacha 0.2.2
2/2        557/626      0/0    0/0     14/22    !  ├── ppv-lite86 0.2.6
0/0        22/22        0/0    0/0     0/0      !  └── rand_core 0.5.1

2/2        579/648      0/0    0/0     14/22  

And on chacha20:

$ cargo geiger --no-default-features --features rng
<snip>
Functions  Expressions  Impls  Traits  Methods  Dependency

7/14       199/427      0/0    0/0     1/2      !  chacha20 0.3.1
0/0        22/22        0/0    0/0     0/0      !  ├── rand_core 0.5.1
2/4        50/150       1/1    0/0     3/3      !  │   └── getrandom 0.1.14
0/0        0/0          0/0    0/0     0/0      ?  │       ├── cfg-if 0.1.10
2/2        73/95        0/0    0/0     5/11     !  │       └── libc 0.2.66
0/0        0/0          0/0    0/0     0/0      ?  └── stream-cipher 0.3.2
0/0        0/0          0/0    0/0     0/0      ?      ├── blobby 0.1.2
0/1        0/231        0/0    0/0     0/0      ?      │   └── byteorder 1.3.2
0/1        0/223        0/18   0/8     0/5      ?      └── generic-array 0.12.3
0/0        0/51         0/0    0/0     0/0      ?          └── typenum 1.11.2

11/22      344/1199     1/19   0/8     9/21   

Something's wrong here: stream-cipher is only a dev-dependency, and rand_core is depended on in exactly the same way as in rand_chacha. So only the first line is relevant.

Lines of code

Tokei output for rand_chacha:

$ tokei rand/rand_chacha/ cryptocorrosion/utils-simd/ppv-lite86/
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Markdown                2           70           70            0            0
 Rust                    9         4285         3864          164          257
 TOML                    2           48           42            0            6
-------------------------------------------------------------------------------
 Total                  13         4403         3976          164          263
-------------------------------------------------------------------------------

(83% of this is from ppv-lite86).

For chacha20:

$ tokei
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Markdown                2          170          170            0            0
 Rust                   12         1617         1163          203          251
 TOML                    1           47           40            0            7
-------------------------------------------------------------------------------
 Total                  15         1834         1373          203          258
-------------------------------------------------------------------------------

There are no additional dependencies we need to count, so that's it. A nice improvement.

Benchmarks

(64-bit Haswell)

Here's rand_chacha:

$ cargo bench --bench generators chacha
   Compiling ...

running 16 tests
test gen_bytes_chacha12      ... bench:     356,133 ns/iter (+/- 5,261) = 2875 MB/s
test gen_bytes_chacha20      ... bench:     539,102 ns/iter (+/- 8,237) = 1899 MB/s
test gen_bytes_chacha8       ... bench:     263,023 ns/iter (+/- 15,029) = 3893 MB/s
test gen_u32_chacha12        ... bench:       1,689 ns/iter (+/- 168) = 2368 MB/s
test gen_u32_chacha20        ... bench:       2,475 ns/iter (+/- 60) = 1616 MB/s
test gen_u32_chacha8         ... bench:       1,300 ns/iter (+/- 56) = 3076 MB/s
test gen_u64_chacha12        ... bench:       4,058 ns/iter (+/- 324) = 1971 MB/s
test gen_u64_chacha20        ... bench:       4,472 ns/iter (+/- 647) = 1788 MB/s
test gen_u64_chacha8         ... bench:       3,252 ns/iter (+/- 421) = 2460 MB/s
test init_chacha             ... bench:          30 ns/iter (+/- 7)

and here's chacha20:

running 16 tests
test gen_bytes_chacha12      ... bench:   1,184,161 ns/iter (+/- 75,680) = 864 MB/s
test gen_bytes_chacha20      ... bench:   1,894,244 ns/iter (+/- 16,309) = 540 MB/s
test gen_bytes_chacha8       ... bench:     824,644 ns/iter (+/- 52,944) = 1241 MB/s
test gen_u32_chacha12        ... bench:       4,607 ns/iter (+/- 97) = 868 MB/s
test gen_u32_chacha20        ... bench:       7,351 ns/iter (+/- 121) = 544 MB/s
test gen_u32_chacha8         ... bench:       3,257 ns/iter (+/- 407) = 1228 MB/s
test gen_u64_chacha12        ... bench:       9,396 ns/iter (+/- 1,104) = 851 MB/s
test gen_u64_chacha20        ... bench:      14,856 ns/iter (+/- 750) = 538 MB/s
test gen_u64_chacha8         ... bench:       6,431 ns/iter (+/- 625) = 1243 MB/s
test init_chacha             ... bench:          14 ns/iter (+/- 0)
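For anyone cross-checking the MB/s column: it follows from ns/iter and the number of bytes produced per iteration. A minimal sketch, assuming each iteration makes 1000 calls (1024-byte fills for gen_bytes, 4 bytes for gen_u32, 8 bytes for gen_u64). Those sizes and the constant names below are inferred from the reported figures, not stated in this issue.

```rust
/// Assumed number of generator calls per benchmark iteration (inferred).
const CALLS_PER_ITER: u64 = 1000;
/// Assumed buffer size for the gen_bytes benchmarks (inferred).
const BYTES_LEN: u64 = 1024;

/// Throughput in MB/s (1 MB = 10^6 bytes), truncated to an integer.
/// bytes / ns is GB/s, so multiplying by 1000 gives MB/s.
fn throughput_mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    bytes_per_iter * 1000 / ns_per_iter
}

/// Bytes produced per iteration by a gen_bytes benchmark.
fn gen_bytes_per_iter() -> u64 {
    CALLS_PER_ITER * BYTES_LEN
}

/// Bytes produced per iteration when generating integers of `width` bytes.
fn gen_int_per_iter(width: u64) -> u64 {
    CALLS_PER_ITER * width
}
```

Under these assumptions, `throughput_mb_per_s(gen_bytes_per_iter(), 356_133)` reproduces the 2875 MB/s reported for rand_chacha's gen_bytes_chacha12, and `throughput_mb_per_s(gen_bytes_per_iter(), 1_894_244)` reproduces the 540 MB/s reported for chacha20's gen_bytes_chacha20.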

Hmm, looks like chacha20 needs some help:

$ export RUSTFLAGS="-C target-cpu=native"
$ cargo bench --bench generators chacha
...
test gen_bytes_chacha12      ... bench:     500,699 ns/iter (+/- 28,157) = 2045 MB/s
test gen_bytes_chacha20      ... bench:     789,107 ns/iter (+/- 52,585) = 1297 MB/s
test gen_bytes_chacha8       ... bench:     357,024 ns/iter (+/- 23,501) = 2868 MB/s
test gen_u32_chacha12        ... bench:       2,139 ns/iter (+/- 56) = 1870 MB/s
test gen_u32_chacha20        ... bench:       3,260 ns/iter (+/- 116) = 1226 MB/s
test gen_u32_chacha8         ... bench:       1,571 ns/iter (+/- 32) = 2546 MB/s
test gen_u64_chacha12        ... bench:       4,472 ns/iter (+/- 55) = 1788 MB/s
test gen_u64_chacha20        ... bench:       6,714 ns/iter (+/- 421) = 1191 MB/s
test gen_u64_chacha8         ... bench:       3,338 ns/iter (+/- 46) = 2396 MB/s

Closer, but still behind rand_chacha (which gets negligible boost from target-cpu=native thanks to auto-detection).
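To illustrate the difference between the two approaches: runtime auto-detection picks a SIMD code path when the program starts, while target-cpu=native fixes the available features at compile time. The following is only a sketch of the general mechanism using the standard library's detection macro. The `Backend` enum and function names are hypothetical, and this is not a description of how ppv-lite86 or chacha20 actually dispatch.

```rust
/// Hypothetical set of code paths a generator might choose between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx2,
    Sse2,
    Portable,
}

/// Runtime selection: queries the CPU the binary is running on, so a
/// generic build can still use AVX2 when the hardware supports it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn select_backend() -> Backend {
    if std::is_x86_feature_detected!("avx2") {
        Backend::Avx2
    } else if std::is_x86_feature_detected!("sse2") {
        Backend::Sse2
    } else {
        Backend::Portable
    }
}

/// On other architectures this sketch always falls back to portable code.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn select_backend() -> Backend {
    Backend::Portable
}

/// Compile-time view: true only if AVX2 was enabled when building,
/// e.g. via RUSTFLAGS="-C target-cpu=native" on an AVX2 machine.
fn avx2_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}
```

A crate that only checks the compile-time flag needs target-cpu=native to reach its fast path, whereas one that selects at runtime gains little from the flag, which would be consistent with the numbers above.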

Running chacha20's built-in benchmarks, I get around 6-6.2 cycles/byte without target-cpu=native, and 2.5-2.7 with; this falls well short of the 1.4 cycles/byte @tarcieri reports, so something is off here (perhaps just CPU-specific optimisations).
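To relate the MB/s figures from cargo bench to cycles/byte, divide the clock frequency by the throughput. A rough sketch follows; the clock frequency is an input you have to supply, since this issue only says "64-bit Haswell", and frequency scaling or turbo makes any single value approximate.

```rust
/// Convert a throughput in MB/s (1 MB = 10^6 bytes) into cycles per byte.
/// `clock_hz` is an assumed CPU frequency, not something measured here.
fn cycles_per_byte(clock_hz: f64, mb_per_s: f64) -> f64 {
    clock_hz / (mb_per_s * 1e6)
}

/// Inverse conversion: throughput in MB/s implied by a cycles/byte figure.
fn mb_per_s(clock_hz: f64, cycles_per_byte: f64) -> f64 {
    clock_hz / cycles_per_byte / 1e6
}
```

For example, at a purely hypothetical 3.0 GHz, the 1297 MB/s measured for gen_bytes_chacha20 with target-cpu=native would be about 2.3 cycles/byte, in the same ballpark as the built-in benchmark figures.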


Of course, there's more to this than a few stats, and number-of-unsafe-usages is not a particularly useful comparison (since it says nothing about the size of the unsafe blocks). This is all I have time for right now. Thanks to all authors (also significantly @newpavlov).
