This is a C++ implementation of the lemac cryptographic hash by Augustin Bariant.
Lemac is very interesting because it utilizes AES hardware acceleration which is very common.
The AESNI implementation is based on the public domain code in https://github.com/AugustinBariant/Implementations_LeMac_PetitMac/ but has been heavily modified to use C++ and run clean with sanitizers.
The bulk speed on a single core is up to about
- 77 GiB/s, 0.067 cycles per byte (AMD zen 4, Debian 13 Trixie)
- 48 GiB/s (Apple M4 MacBook Air, macos)
- 11 GiB/s, 0.19 cycles per byte (raspberry pi 5, Ubuntu 24.04)
The speed is measured with the included benchmark and using perf stat
to get the average clock frequency during the test. To build and run the benchmark:
cmake -B build-release -S . -DCMAKE_BUILD_TYPE=Release
cmake --build build-release
build-release/benchmark/benchmark
The cost of the hash is divided into
- Initialization (cost is roughly equivalent to hashing 5 kB of extra data)
- Hashing
- Finalization (cost is roughly equivalent to hashing 2 kB of extra data)
The initialization cost can mostly be avoided by either reusing an existing hasher object (using .reset()
followed by .update()
and .finalize()
) or simply instantiate one object and copy it before each hash operation.
If all data to be hashed is known up front, the oneshot()
function is more efficient to use than update()
followed by finalize()
.
Measurements on an AMD Ryzen 9 7950X3D:
- lemac 256 kB, oneshot 76.7 GiB/s (median of 5)
- lemac 1 MB, oneshot 74.3 GiB/s (median of 5)
- lemac 256 kB, update+finalize 76.6 GiB/s (median of 5)
- lemac 1 MB, update+finalize 74.3 GiB/s (median of 5)
This can be compared with some other interesting hashes
- the non-cryptographic xxh3/xxh128 hashes reaches about 89 GiB/s, measured with
xxhsum -b
. This is about 16% faster than lemac. - the non-cryptographic clhash which reaches 0.11 cycles per byte using the provided benchmark, which is about 39% slower than lemac.
- SHA1 measured with
openssl speed -evp SHA1
reaches about 2.8 GiB/s
The AES performance, as measured by openssl speed -evp AES-128-ECB
is about 16.4 GiB/s.
Here are the results from a single run with default power settings plugged in to the charger, if that matters. There is some run to run variation, take this with a grain of salt.
compiler: clang
with 1 byte at a time and strategy update_and_finalize: hashed with 0.038 GiB/s 0.027 µs/hash
with 1024 byte at a time and strategy update_and_finalize: hashed with 23.408 GiB/s 0.044 µs/hash
with 16384 byte at a time and strategy update_and_finalize: hashed with 45.351 GiB/s 0.361 µs/hash
with 262144 byte at a time and strategy update_and_finalize: hashed with 47.915 GiB/s 5.471 µs/hash
with 1048576 byte at a time and strategy update_and_finalize: hashed with 47.990 GiB/s 21.850 µs/hash
with 1 byte at a time and strategy oneshot: hashed with 0.021 GiB/s 0.049 µs/hash
with 1024 byte at a time and strategy oneshot: hashed with 17.128 GiB/s 0.060 µs/hash
with 16384 byte at a time and strategy oneshot: hashed with 43.223 GiB/s 0.379 µs/hash
with 262144 byte at a time and strategy oneshot: hashed with 47.394 GiB/s 5.531 µs/hash
with 1048576 byte at a time and strategy oneshot: hashed with 47.653 GiB/s 22.005 µs/hash
Here are the results from a single run with default power settings plugged in to the charger, if that matters. There is some run to run variation, take this with a grain of salt.
compiler: clang
with 1 byte at a time and strategy update_and_finalize : hashed with 0.042 GiB/s 0.024 µs/hash
with 1024 byte at a time and strategy update_and_finalize : hashed with 24.588 GiB/s 0.042 µs/hash
with 16384 byte at a time and strategy update_and_finalize : hashed with 42.378 GiB/s 0.387 µs/hash
with 262144 byte at a time and strategy update_and_finalize : hashed with 43.691 GiB/s 6.000 µs/hash
with 1048576 byte at a time and strategy update_and_finalize : hashed with 43.854 GiB/s 23.911 µs/hash
with 1 byte at a time and strategy oneshot : hashed with 0.021 GiB/s 0.049 µs/hash
with 1024 byte at a time and strategy oneshot : hashed with 16.297 GiB/s 0.063 µs/hash
with 16384 byte at a time and strategy oneshot : hashed with 39.348 GiB/s 0.416 µs/hash
with 262144 byte at a time and strategy oneshot : hashed with 43.242 GiB/s 6.062 µs/hash
with 1048576 byte at a time and strategy oneshot : hashed with 43.577 GiB/s 24.062 µs/hash
This was measured on Ubuntu 24.04. Unfortunately, this was measured with insufficient cooling and the average clock speed was 2.263 GHz according to perf stat.
- lemac 256 kB, oneshot 11.4 GiB/s (median of 7)
- lemac 1 MB, oneshot 11.5 GiB/s (median of 7)
- lemac 256 kB, update+finalize 11.0 GiB/s (median of 7)
- lemac 1 MB, update+finalize 10.6 GiB/s (median of 7)
The non-cryptographic xxh reaches about 13.2 GiB/s on the same system.
The AES performance, as measured by openssl speed -evp AES-128-ECB
is about 3.4 GiB/s.
The code requires C++20. amd64 (AES-NI) is supported (most current desktop cpus), as well as arm64 with cryptographic extensions (like the current apple hardware and raspberry pi 5). Support for other architectures is planned by providing a non-accelerated fallback implementation.
The code runs with clang(>=16) and gcc (>=12). It may work with earlier compiler versions, but those are not tested in CI.
The code works on apple with xcode and current hardware (apple silicon). Apple M1, M3 and M4 have been tested successfully. No tests have been made with the older apple x86 platforms (if you have tested this, please report how it went!).
Both amd64 and arm64 are supported. On arm64, there is currently no runtime check for hardware support, it is hard coded to always on.
After compilation, there is a binary lemacsum which works in similar spirit to sha256sum, xxhsum, b3sum etc.
Generate a checksum:
$ lemacsum file |tee checksum
5351314614a6de5f31704434c083e7a9 file
Verify a checksum:
$ lemacsum -c checksum
file: OK
To use this from cmake, add the following snippet to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
lemac
GIT_REPOSITORY https://github.com/pauldreik/lemac.git
GIT_TAG main
)
FetchContent_MakeAvailable(lemac)
target_link_libraries(my_executable PRIVATE lemac)
and in your code, using fmt for the nice printout:
#include <string>
#include <fmt/ranges.h>
#include "lemac.h"
int main()
{
std::string message("hello");
lemac::LeMac lm;
const auto hash = lm.oneshot(
std::span((const std::uint8_t*)message.data(), message.size()));
fmt::println("the hash of {} is {:02x}", message, fmt::join(hash, ""));
// prints "the hash of hello is b2e64fbf9da60940f54a4cd3ee07c37d"
}
Boost 1.0 license, which allows commercial use and modification.
The goals are (in order)
- correctness
- run clean with both undefined and address sanitizers
- performance
- readability and maintainability
- portability