Zeroize performance on u8 arrays #743

Open
@blckngm

I inspected the generated assembly and benchmarked zeroize for [u8; 32] on x86_64, and found it quite inefficient: it stores one byte at a time:

https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=3f44f4b90e6af0eac0dcb1f649390329

On my Ryzen CPU, it takes ~7.8324 ns, or ~1 cycle per byte (cpb). The generated code is also quite large.
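For reference, the byte-at-a-time pattern looks roughly like the sketch below. It approximates a per-element volatile store loop followed by a compiler fence; it is not the crate's actual source, and the name `zeroize_bytewise` is made up for illustration.

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical helper: zero a 32-byte buffer with one volatile store per byte.
pub fn zeroize_bytewise(buf: &mut [u8; 32]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, exclusive reference to a single `u8`.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Stop the compiler from moving later memory accesses ahead of the stores.
    compiler_fence(Ordering::SeqCst);
}
```

Since every store is volatile, the compiler generally keeps them as 32 separate one-byte stores, which is consistent with the assembly in the playground link above.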

Using inline assembly (just stabilized in 1.59) and SSE2, zeroing a [u8; 32] takes just 3 instructions and ~492.87 ps (~16 bytes per cycle):

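let mut buf = [0xAAu8; 32];
// SAFETY: `buf` is 32 bytes of valid, writable memory, and movups
// does not require the address to be aligned.
unsafe {
    core::arch::asm!(
        "xorps {zero}, {zero}",
        "movups {zero}, ({ptr})",
        "movups {zero}, 16({ptr})",
        zero = out(xmm_reg) _,
        // Pass a raw pointer; asm! operands cannot be references.
        ptr = in(reg) buf.as_mut_ptr(),
        options(att_syntax, nostack, preserves_flags),
    );
}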

So it might be something worth optimizing/documenting.

If you would rather not use inline assembly, it may be worth encouraging larger element types or SIMD types, e.g. [u64; 4] or [__m128; 2] instead of [u8; 32]. Using write_volatile on *mut __m128 generates code as compact and efficient as the assembly above.
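A minimal sketch of the wider-type idea, assuming the caller can store the secret as [u64; 4] or [__m128; 2] in the first place. The function names `zeroize_u64x4` and `zeroize_m128x2` are hypothetical and not part of the zeroize API; the second one is only compiled on x86_64.

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical helper: four 8-byte volatile stores instead of 32 one-byte stores.
pub fn zeroize_u64x4(buf: &mut [u64; 4]) {
    for word in buf.iter_mut() {
        // SAFETY: `word` is a valid, aligned, exclusive reference to a `u64`.
        unsafe { ptr::write_volatile(word, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical helper: two 16-byte volatile stores through `*mut __m128`.
#[cfg(target_arch = "x86_64")]
pub fn zeroize_m128x2(buf: &mut [core::arch::x86_64::__m128; 2]) {
    // SAFETY: the all-zero bit pattern is a valid `__m128` (four 0.0 lanes).
    let zero: core::arch::x86_64::__m128 = unsafe { core::mem::zeroed() };
    for lane in buf.iter_mut() {
        // SAFETY: `lane` is a valid, aligned, exclusive reference to a `__m128`.
        unsafe { ptr::write_volatile(lane, zero) };
    }
    compiler_fence(Ordering::SeqCst);
}
```

Whether this matches the hand-written assembly depends on the compiler version and target, so it is worth checking the generated code for each case rather than assuming it.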
