Argon is a library that provides a sane, minimal-overhead C++ interface to ARM's NEON SIMD intrinsics.
Argon grew out of a desire to use C++ abstractions for vector intrinsics without sacrificing expressiveness or specificity. Most C++ vector libraries (such as Highway and xsimd) abstract away all platform uniqueness in favor of common denominators. For the use cases those libraries target, namely HPC and multi-platform portability, that is a strength. Unfortunately, when you're targeting a very specific subset of those architectures, or you have one architecture that needs to be optimized above all others, such abstractions fall short.
This is the niche Argon was created to fill: a NEON-first, others-second, zero-overhead abstraction library with modern constructs and capabilities. Argon provides:
- A low-level intrinsics abstraction layer with sane naming conventions and overloaded functions
- Higher-level encapsulated vector object types built on that layer
- A library of optimized commonly used algorithms
Documentation on usage and internal details (generated by Doxygen) can be found here

    #include <argon/argon.hpp>
    #include <cstddef>
    #include <iostream>

    // Maps each lane's phase in [0, 1) to a triangle wave in [-1, 1].
    Argon<float> triangle(Argon<float> phase) {
      // Fold the phase around 0.5: -phase below the midpoint, 1 - phase above it.
      phase = argon::ternary(phase > 0.5f, 1.f, 0.f) - phase;
      // Scale and offset: -1 + 4 * |phase|.
      return Argon{-1.f}.MultiplyAdd(phase.Absolute(), 4.f);
    }

    int main() {
      float frequency = 440.f;
      float phase_increment = frequency / 44100.f;
      // Per-lane phases: 0 + phase_increment * {0, 1, 2, 3}
      Argon<float> phase = Argon<float>{0.f}.MultiplyAdd(phase_increment, Argon<float>::Iota(0));
      Argon<float> wave = triangle(phase);

      // Print the result
      for (size_t i = 0; i < wave.size(); ++i) {
        std::cout << "Result[" << i << "]: " << wave[i] << std::endl;
      }
    }

The low-level intrinsics layer can also be used directly:

    #include <argon/arm_simd.hpp>
    #include <cstddef>
    #include <iostream>

    int main() {
      // Example: multiply-add three vectors using NEON
      float32x4_t a = {1.0f, 2.0f, 3.0f, 4.0f};
      float32x4_t b = {5.0f, 6.0f, 7.0f, 8.0f};
      float32x4_t c = {9.0f, 10.0f, 11.0f, 12.0f};

      // Perform vector multiply-add
      float32x4_t result = neon::multiply_add(a, b, c);

      // Print the result. float32x4_t is a raw NEON type with four lanes and no
      // size() member; lane subscripting relies on GCC/Clang vector extensions.
      for (size_t i = 0; i < 4; ++i) {
        std::cout << "Result[" << i << "]: " << result[i] << std::endl;
      }
      return 0;
    }

| Feature | NEON | MVE |
|---|---|---|
| Number of quad-word registers | 16 | 8 |
| Narrowing support (see note) | to lower double-word register | to even or odd scalar registers |
| Double-word instructions | ✔️ | ❌ |
| 64-bit floating point | ✔️ | ❌ |
| Predicated lane instructions | ❌ | ✔️ |
| Native Scatter-Gather | ❌ | ✔️ |
- NEON supports both double-word and quad-word operations; MVE only supports quad-word operations.
  - This means that MVE operations that would produce a double-word result (such as narrowing) instead write to either the top or bottom half of each lane of the destination quad-word.
  - For example, in NEON, narrowing an int32x4_t to an int16x4_t fills the lower double-word: {x, x, x, x} => {x, x, x, x, o, o, o, o}
  - In MVE, the narrowed values are instead placed in the bottom or top half of each pair: {x, x, x, x} => {x, o, x, o, x, o, x, o}
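
The two lane layouts described in the note above can be modeled with plain arrays. The following is a minimal scalar sketch for illustration only: it uses no NEON or MVE intrinsics, and the names `Vec32x4`, `Vec16x8`, `narrow_neon_style`, and `narrow_mve_style` are hypothetical and not part of Argon's API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a quad-word of 4 x 32-bit and 8 x 16-bit lanes.
using Vec32x4 = std::array<std::int32_t, 4>;
using Vec16x8 = std::array<std::int16_t, 8>;

// NEON-style layout: the four narrowed values fill the lower four 16-bit
// lanes (the lower double-word). The upper four lanes of dest are untouched.
constexpr Vec16x8 narrow_neon_style(Vec16x8 dest, const Vec32x4& src) {
  for (std::size_t i = 0; i < src.size(); ++i) {
    dest[i] = static_cast<std::int16_t>(src[i]);
  }
  return dest;
}

// MVE-style layout: each narrowed value lands in the bottom (even) or top
// (odd) 16-bit half of its original 32-bit lane. The other half is untouched.
constexpr Vec16x8 narrow_mve_style(Vec16x8 dest, const Vec32x4& src, bool top) {
  for (std::size_t i = 0; i < src.size(); ++i) {
    dest[2 * i + (top ? 1 : 0)] = static_cast<std::int16_t>(src[i]);
  }
  return dest;
}
```

In this model, calling `narrow_mve_style` twice on the same destination (once with `top = false`, once with `top = true`) interleaves two source vectors into one full quad-word, which is why the MVE layout does not need a double-word register.
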
| Backend | Architectures | Status | Notes |
|---|---|---|---|
| ARM NEON (ARMv7) | VFPv3, VFPv3-FP16, VFPv4 | ✅ | Primary target |
| ARM NEON (ARMv8+) | AArch32, AArch64 | ✅ | Primary target |
| ARM MVE (Helium) | ARMv8.1-M | In progress | Secondary target |
| SIMDe | x86-64 (SSE2/AVX), RISC-V | ✅ | Tertiary target, used for portability and testing |
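
As a rough illustration of how the backend families in the table above can be told apart at compile time, the sketch below checks the standard ACLE feature macros (`__ARM_FEATURE_MVE`, `__ARM_NEON`) and MSVC's `_M_ARM64`. The helper `detected_backend` is hypothetical; this is not Argon's actual detection logic, which may differ.

```cpp
#include <string_view>

// Hypothetical helper: names the backend family the current target would
// fall into, based only on compiler-predefined macros.
constexpr std::string_view detected_backend() {
#if defined(__ARM_FEATURE_MVE) && (__ARM_FEATURE_MVE > 0)
  // ACLE: bit 0 is set for integer MVE, bit 1 for floating-point MVE.
  return "ARM MVE (Helium)";
#elif defined(__ARM_NEON) || defined(__ARM_NEON__) || defined(_M_ARM64)
  return "ARM NEON";
#else
  // Anything else (for example x86-64 or RISC-V) would go through SIMDe.
  return "SIMDe";
#endif
}
```
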
Argon can be compiled using the following toolchains:
Compilers:
- GCC 14.2 or later
- LLVM Embedded Toolchain for ARM 19 or later (ARMv7-A)
- LLVM/Clang 20.1 or later (AArch32/AArch64)
- MSVC 19.44 or later
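
The minimum versions listed above can be checked against the compiler's predefined version macros. The sketch below is an assumption-laden illustration, not something shipped with Argon: `compiler_name` and `meets_listed_minimum` are hypothetical helpers, the Clang branch uses the 20.1 threshold only (it does not special-case the LLVM Embedded Toolchain for ARM 19), and vendor forks such as Apple Clang use their own version numbering.

```cpp
#include <string_view>

// Hypothetical helper: names the compiler family in use.
constexpr std::string_view compiler_name() {
#if defined(__clang__)
  return "Clang";  // checked first, since Clang also defines __GNUC__
#elif defined(__GNUC__)
  return "GCC";
#elif defined(_MSC_VER)
  return "MSVC";
#else
  return "unknown";
#endif
}

// Hypothetical helper: true if the compiler is at least the listed minimum.
constexpr bool meets_listed_minimum() {
#if defined(__clang__)
  return __clang_major__ > 20 || (__clang_major__ == 20 && __clang_minor__ >= 1);
#elif defined(__GNUC__)
  return __GNUC__ > 14 || (__GNUC__ == 14 && __GNUC_MINOR__ >= 2);
#elif defined(_MSC_VER)
  return _MSC_VER >= 1944;  // MSVC 19.44 reports _MSC_VER as 1944
#else
  return false;
#endif
}
```

A `static_assert(meets_listed_minimum(), "...")` in a build would turn an unsupported compiler into an early, readable error instead of a failure deep inside template code.
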
Testing is currently done across a range of platforms and compilers, including:
| Compiler | ARMv7 | ARMv8 | ARMv8.1-M | x86-64 |
|---|---|---|---|---|
| GCC | ✔️ | ✔️ | ✔️ | ✔️ |
| Clang | ❌ | ✔️ | (TBD) | ✔️ |
| MSVC | ❌ | (TBD) | ❌ | ✔️ |
| Compiler | Bare-metal | Linux | macOS | Windows |
|---|---|---|---|---|
| GCC | ✔️ | ✔️ | ✔️ | * |
| Clang | ✔️* | ✔️ | ** | |
| MSVC | ❌ | ❌ | ❌ | ✔️ |

*: Windows/GCC (via MinGW64) and Bare-metal/Clang (via the LLVM Embedded Toolchain for ARM) are used regularly but not tested via CI.

**: To compile with Clang on macOS, you'll need to use the Homebrew-bundled version of libc++, as Apple's system libraries do not support the required C++23 features.

| Host | ARMv7 | ARMv8 | x86-64 |
|---|---|---|---|
| Linux | ✔️ | ✔️ | |
| macOS | ❌ | ✔️ | ✔️ |
| Windows | ❌ | (TBD) | ✔️ |