keccak: AVX2/AVX512 optimizations #108

@tarcieri

I was looking through XKCP (eXtended Keccak Code Package, https://github.com/XKCP/XKCP) to see what optimized implementations they have available.

I noticed they do have intrinsics-based implementations available for AVX2, but those compute several permutations at once, e.g. Keccak-p[1600] with 2-, 4-, or 8-way parallelism:

https://github.com/XKCP/XKCP/blob/716f007dd73ef28d357b8162173646be574ad1b7/lib/low/KeccakP-1600-times4/AVX2/KeccakP-1600-times4-AVX2.c
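For readers unfamiliar with the N-way layout: each vector register holds the same lane taken from N independent states, so every step of the permutation becomes a plain element-wise vector operation. Below is a minimal portable sketch of that idea in Rust, using `[u64; N]` arrays in place of SIMD registers and no intrinsics. The names `Lanes` and `keccak_f1600_xn` are hypothetical and are not taken from XKCP or from this repository.

```rust
use std::array::from_fn;

// Round constants, rho rotation offsets and pi lane order for Keccak-f[1600].
const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
const RHO: [u32; 24] = [
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44,
];
const PI: [usize; 24] = [
    10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1,
];

/// One "vector": the same lane taken from N independent states (hypothetical alias).
type Lanes<const N: usize> = [u64; N];

fn xor<const N: usize>(a: Lanes<N>, b: Lanes<N>) -> Lanes<N> {
    from_fn(|i| a[i] ^ b[i])
}
fn rotl<const N: usize>(a: Lanes<N>, r: u32) -> Lanes<N> {
    a.map(|w| w.rotate_left(r))
}
fn andnot<const N: usize>(a: Lanes<N>, b: Lanes<N>) -> Lanes<N> {
    from_fn(|i| !a[i] & b[i])
}

/// Applies Keccak-f[1600] to N states at once; `s[i][k]` is lane `i` of state `k`.
pub fn keccak_f1600_xn<const N: usize>(s: &mut [Lanes<N>; 25]) {
    for &rc in RC.iter() {
        // Theta: column parities, then mix each column with its neighbours.
        let c: [Lanes<N>; 5] =
            from_fn(|x| (0..5).fold([0u64; N], |acc, y| xor(acc, s[5 * y + x])));
        for x in 0..5 {
            let d = xor(c[(x + 4) % 5], rotl(c[(x + 1) % 5], 1));
            for y in 0..5 {
                s[5 * y + x] = xor(s[5 * y + x], d);
            }
        }
        // Rho and pi: rotate each lane and move it to its new position.
        let mut last = s[1];
        for i in 0..24 {
            let tmp = s[PI[i]];
            s[PI[i]] = rotl(last, RHO[i]);
            last = tmp;
        }
        // Chi: non-linear step applied row by row.
        for y in (0..25).step_by(5) {
            let row: [Lanes<N>; 5] = from_fn(|x| s[y + x]);
            for x in 0..5 {
                s[y + x] = xor(row[x], andnot(row[(x + 1) % 5], row[(x + 2) % 5]));
            }
        }
        // Iota: the same round constant goes into lane 0 of every state.
        s[0] = s[0].map(|w| w ^ rc);
    }
}

fn main() {
    // Four independent states that differ only in lane 0.
    let mut states = [[0u64; 4]; 25];
    states[0] = [0, 1, 2, 3];
    keccak_f1600_xn(&mut states);
    println!("lane 0 of state 0: {:016x}", states[0][0]);
}
```

With `N = 4`, each `Lanes<4>` value corresponds to one 256-bit register in an AVX2 build. This layout only pays off when there are several independent inputs to hash, which is why it does not replace a single-state implementation.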

There also appears to be a non-parallel intrinsics implementation for AVX-512:

https://github.com/XKCP/XKCP/blob/716f007dd73ef28d357b8162173646be574ad1b7/lib/low/KeccakP-1600/AVX512/C/KeccakP-1600-AVX512.c

However, the non-parallel implementation for AVX2 is ASM-only:

https://github.com/XKCP/XKCP/blob/716f007dd73ef28d357b8162173646be574ad1b7/lib/low/KeccakP-1600/AVX2/KeccakP-1600-AVX2.s

See also the ARMv8 FEAT_SHA3 extensions: #93.

I'm not sure whether the non-parallel AVX2 implementation is ASM-only because an intrinsics-based implementation doesn't make sense, given the need for a precisely designed register schedule, or because no one has done the work to implement it yet.
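As an illustration of one source of friction, not an answer to the question: a single Keccak state has rows of five 64-bit lanes, while a 256-bit AVX2 register holds four. The sketch below computes only the theta column parities for one state with `std::arch` intrinsics, leaving the fifth lane of each row to scalar code, and falls back to a scalar path when AVX2 is not detected at runtime. The function names are hypothetical, and this says nothing about how the XKCP assembly actually schedules its registers.

```rust
/// Scalar reference: C[x] = XOR over y of A[x, y] (first half of theta).
fn theta_parity_scalar(s: &[u64; 25]) -> [u64; 5] {
    let mut c = [0u64; 5];
    for y in 0..5 {
        for x in 0..5 {
            c[x] ^= s[5 * y + x];
        }
    }
    c
}

/// AVX2 sketch: lanes 0..4 of each row go through one 256-bit register,
/// lane 4 does not fit and is accumulated separately.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn theta_parity_avx2(s: &[u64; 25]) -> [u64; 5] {
    use std::arch::x86_64::*;
    let mut acc = _mm256_setzero_si256();
    let mut c4 = 0u64;
    for y in 0..5 {
        // In bounds: reads s[5*y .. 5*y + 4]; loadu has no alignment requirement.
        let row = _mm256_loadu_si256(s.as_ptr().add(5 * y) as *const __m256i);
        acc = _mm256_xor_si256(acc, row);
        c4 ^= s[5 * y + 4];
    }
    let mut c = [0u64; 5];
    // Writes c[0..4]; the fifth parity is filled in from the scalar accumulator.
    _mm256_storeu_si256(c.as_mut_ptr() as *mut __m256i, acc);
    c[4] = c4;
    c
}

/// Picks the AVX2 path when the CPU supports it, otherwise the scalar one.
fn theta_parity(s: &[u64; 25]) -> [u64; 5] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 support was just checked at runtime.
            return unsafe { theta_parity_avx2(s) };
        }
    }
    theta_parity_scalar(s)
}

fn main() {
    let mut s = [0u64; 25];
    for (i, w) in s.iter_mut().enumerate() {
        *w = (i as u64).wrapping_mul(0x9E3779B97F4A7C15);
    }
    assert_eq!(theta_parity(&s), theta_parity_scalar(&s));
    println!("column parities: {:016x?}", theta_parity(&s));
}
```

The rest of theta, and rho, pi and chi in particular, need lane rotations by different amounts and cross-register permutations, which is where a hand-tuned register schedule would plausibly matter most.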
