libdynemit leverages the ifunc resolver (supported by both GCC and Clang on Linux) to automatically select optimal SIMD implementations at program startup, delivering portable code without sacrificing performance. Thread-safe SIMD detection and dlopen-safe resolver utilities ensure robust operation in multi-threaded applications and dynamic library loading scenarios.
#include <dynemit.h>
// Automatically uses AVX-512, AVX2, AVX, SSE4.2, SSE2 or scalar,
// based on your CPU's capabilities, decided once at program startup
mul_f32(a, b, result, n);
mean_f64(data, n);
entropy_u32(data, n);
Benchmark comparing vector multiplication performance across different CPU architectures using the same compiled binary. The library automatically detected and used each CPU's highest supported SIMD instruction set (AVX-512F, AVX2, AVX or SSE4.2) at runtime. Lower execution time indicates better performance. Each data point represents the median of 10 trials, with error bars showing ±1 standard deviation.
Performance scaling comparison of different SIMD instruction sets on the same CPU (AMD Ryzen 9 9950X3D). This benchmark demonstrates the progressive performance improvements from Scalar → SSE2 → SSE4.2 → AVX → AVX2 → AVX-512F. Each implementation was built and tested separately to isolate the impact of each SIMD level. The chart shows ~1.8x speedup from AVX-512F compared to scalar code for large arrays. Lower execution time indicates better performance. Each data point represents the median of 10 trials, with error bars showing ±1 standard deviation.
Download pre-built packages from GitHub Releases.
Debian/Ubuntu
wget https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/libdynemit_1.2.0_amd64.deb
sudo dpkg -i libdynemit_1.2.0_amd64.deb
Fedora/RHEL
Runtime package:
wget https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/libdynemit-1.2.0-1.fc40.x86_64.rpm
sudo dnf install libdynemit-1.2.0-1.fc40.x86_64.rpm
All packages are cryptographically signed with GPG for authenticity verification.
Import the maintainer's public key:
gpg --keyserver keys.openpgp.org --recv-keys 3E1A1F401A1C47BC77D1705612D0D82387FC53B0
Alternative key import options
Using the shorter key ID:
gpg --keyserver keys.openpgp.org --recv-keys 12D0D82387FC53B0
Alternative keyserver (if keys.openpgp.org is unavailable):
gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 3E1A1F401A1C47BC77D1705612D0D82387FC53B0
You should see output confirming the key was imported:
gpg: key 12D0D82387FC53B0: public key "MuriloChianfa <murilo.chianfa@outlook.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
Verify a package signature:
gpg --verify libdynemit_1.2.0_amd64.deb.asc libdynemit_1.2.0_amd64.deb
If the signature is valid, you should see:
gpg: Signature made [date and time]
gpg: using EDDSA key 3E1A1F401A1C47BC77D1705612D0D82387FC53B0
gpg: Good signature from "MuriloChianfa <murilo.chianfa@outlook.com>"
If you see "BAD signature", do not use the binary - it may have been tampered with or corrupted.
Verify the release checksums against the signed SHA256SUMS file:
curl -LO https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/SHA256SUMS
curl -LO https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/SHA256SUMS.asc
gpg --verify SHA256SUMS.asc SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing
Ubuntu/Debian
# Update package list
sudo apt update
# Install GCC 13+ and CMake
sudo apt install -y gcc-13 cmake
# Verify installation
gcc --version
cmake --version
Fedora/RHEL
sudo dnf install -y gcc cmake
Arch Linux
sudo pacman -S gcc cmake
# Clone the libdynemit project into your machine
git clone git@github.com:MuriloChianfa/libdynemit.git
cd libdynemit
# Setup the release build using all the optimizations
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
# Compile
make
After building, install the library and headers system-wide:
cd build
sudo make install
View installed files
Shared library:
- /usr/local/lib/libdynemit.so.1.2.0 (versioned shared library)
- /usr/local/lib/libdynemit.so.1 (SONAME symlink)
- /usr/local/lib/libdynemit.so (development symlink)
Static libraries:
- /usr/local/lib/libdynemit.a (all-in-one, includes all features)
- /usr/local/lib/libdynemit_core.a (just CPU detection)
- /usr/local/lib/libdynemit_add.a, libdynemit_sub.a, libdynemit_mul.a (vector ops)
- /usr/local/lib/libdynemit_sum.a, libdynemit_mean.a, libdynemit_min.a, libdynemit_max.a (basic stats)
- /usr/local/lib/libdynemit_variance.a, libdynemit_skewness.a, libdynemit_kurtosis.a (moments)
- /usr/local/lib/libdynemit_entropy.a, libdynemit_simpson.a, libdynemit_hhi.a, libdynemit_gini.a (diversity)
- /usr/local/lib/libdynemit_histogram.a, libdynemit_topk.a, libdynemit_hill.a, libdynemit_concentration.a (histogram & concentration)
Headers:
- /usr/local/include/dynemit.h (umbrella header)
- /usr/local/include/dynemit/core.h (CPU detection, SIMD levels)
- /usr/local/include/dynemit/compiler.h (compiler portability macros)
- /usr/local/include/dynemit/err.h (safe IFUNC resolver utilities)
- /usr/local/include/dynemit/add.h, sub.h, mul.h (vector ops)
- /usr/local/include/dynemit/stats.h (convenience: includes all statistics headers below)
- /usr/local/include/dynemit/sum.h, mean.h, min.h, max.h, variance.h, skewness.h, kurtosis.h
- /usr/local/include/dynemit/entropy.h, simpson.h, hhi.h, gini.h
- /usr/local/include/dynemit/histogram.h, topk.h, hill.h, concentration.h
Build system support:
- /usr/local/lib/pkgconfig/libdynemit.pc (pkg-config file)
Currently the library ships SIMD-accelerated features organized into four categories. Every function automatically dispatches to the best available instruction set at program startup.
Vector Operations
Element-wise operations on float arrays.
| Function | Description |
|---|---|
| add_f32(a, b, out, n) | out[i] = a[i] + b[i] |
| sub_f32(a, b, out, n) | out[i] = a[i] - b[i] |
| mul_f32(a, b, out, n) | out[i] = a[i] * b[i] |
Headers: <dynemit/add.h>, <dynemit/sub.h>, <dynemit/mul.h>
Statistical Primitives
| Function | Description |
|---|---|
| sum_f64 / sum_u64 / sum_u32 / sum_u16 | Sum of elements |
| mean_f64 / mean_u64 / mean_u32 / mean_u16 | Arithmetic mean |
| min_f64 / min_u64 / min_u32 / min_u16 | Minimum value |
| max_f64 / max_u64 / max_u32 / max_u16 | Maximum value |
| variance_f64 | Sample variance (Bessel's correction) |
| skewness_f64 | Third standardized moment |
| kurtosis_f64 | Excess kurtosis (fourth moment - 3) |
Headers: <dynemit/sum.h>, <dynemit/mean.h>, <dynemit/min.h>, <dynemit/max.h>, <dynemit/variance.h>, <dynemit/skewness.h>, <dynemit/kurtosis.h>
Convenience header <dynemit/stats.h> includes all of the above.
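For orientation, the definitions in the table above can be written as plain scalar loops. The sketch below is a reference under stated assumptions, not the library's SIMD code: the `ref_` names are hypothetical, returning 0 for degenerate input is an assumption, and so is the kurtosis normalization (population moments, m4 / m2^2 - 3), which the table does not spell out.

```c
#include <stddef.h>

/* Hypothetical scalar reference: arithmetic mean of n doubles. */
static double ref_mean_f64(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return n ? sum / (double)n : 0.0;
}

/* Sample variance with Bessel's correction: divide by (n - 1). */
static double ref_variance_f64(const double *x, size_t n)
{
    if (n < 2)
        return 0.0; /* assumption: undefined case reported as 0 */
    double mean = ref_mean_f64(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}

/* Excess kurtosis as m4 / m2^2 - 3 using population moments (assumption). */
static double ref_kurtosis_f64(const double *x, size_t n)
{
    if (n == 0)
        return 0.0;
    double mean = ref_mean_f64(x, n);
    double m2 = 0.0, m4 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d2 = (x[i] - mean) * (x[i] - mean);
        m2 += d2;
        m4 += d2 * d2;
    }
    m2 /= (double)n;
    m4 /= (double)n;
    return m2 > 0.0 ? m4 / (m2 * m2) - 3.0 : 0.0;
}
```

A reference like this can be handy in tests: run it next to the dispatched function on the same input and compare the results within a small tolerance, since SIMD reductions may sum in a different order.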
Distribution & Diversity Metrics
| Function | Description |
|---|---|
| entropy_u16 / entropy_u32 / entropy_histogram | Shannon entropy (bits) |
| simpson_u16 / simpson_u32 / simpson_histogram | Simpson's diversity index |
| hhi_u16 / hhi_u32 / hhi_histogram | Herfindahl-Hirschman Index |
| gini_f64 / gini_u64 | Gini coefficient (requires sorted input) |
Headers: <dynemit/entropy.h>, <dynemit/simpson.h>, <dynemit/hhi.h>, <dynemit/gini.h>
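To illustrate why a Gini routine can ask for sorted input, here is a minimal scalar sketch of the rank-weighted formula. It is not the library's implementation: the name `ref_gini_sorted_f64` is hypothetical, and ascending order with non-negative values is an assumption (the table only says "sorted").

```c
#include <stddef.h>

/* Hypothetical scalar reference for the Gini coefficient.
 * Assumes x is sorted ascending and holds non-negative values. */
static double ref_gini_sorted_f64(const double *x, size_t n)
{
    double total = 0.0, weighted = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += x[i];
        weighted += (double)(i + 1) * x[i]; /* rank-weighted sum, ranks 1..n */
    }
    if (n == 0 || total == 0.0)
        return 0.0; /* assumption: degenerate input reported as 0 */
    /* G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n */
    return 2.0 * weighted / ((double)n * total) - (double)(n + 1) / (double)n;
}
```

Because each value is weighted by its rank, the result is only meaningful when the ranks match the sort order. With presorted input the whole computation is a single pass, which is the kind of loop that vectorizes well.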
Histogram & Concentration Analysis
| Function | Description |
|---|---|
| histogram_u16 / histogram_u64 | Count elements into boundary-defined bins |
| topk_ratios_f64 | Top-K concentration ratios from sorted descending counts |
| hill_estimator_f64 | Hill heavy-tail index estimator |
| concentration_f64 | Composite metric combining top-K, Hill, and HHI |
Headers: <dynemit/histogram.h>, <dynemit/topk.h>, <dynemit/hill.h>, <dynemit/concentration.h>
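As a rough illustration of what a top-K concentration ratio measures, the sketch below computes the cumulative share held by the largest 1..k entries. The function name, its signature, and the output layout are all hypothetical and may differ from `topk_ratios_f64`; only the idea (shares taken from counts sorted in descending order) comes from the table above.

```c
#include <stddef.h>

/* Hypothetical reference: out[j] receives the share of the total held by
 * the top (j + 1) entries, for j = 0..k-1. counts must be sorted descending. */
static void ref_topk_ratios(const double *counts, size_t n, size_t k, double *out)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += counts[i];

    double running = 0.0;
    for (size_t j = 0; j < k; j++) {
        if (j < n)
            running += counts[j]; /* past the end, the share stays at 100% */
        out[j] = total > 0.0 ? running / total : 0.0;
    }
}
```

For counts of 50, 30, 15 and 5, the top-1, top-2 and top-3 shares are 0.5, 0.8 and 0.95.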
The library provides flexible usage options depending on your needs:
Option 1: All-in-One Library (Recommended for Simplicity)
Use the bundled library that includes all features:
#include <stdio.h>   // printf
#include <dynemit.h> // Includes core + all features
int main(void) {
const char **features = dynemit_features();
printf("Available features:\n");
for (int i = 0; features[i] != NULL; i++) {
printf(" - %s\n", features[i]);
}
simd_level_t level = detect_simd_level();
printf("SIMD level: %s\n", simd_level_name(level));
float a[1024] = {0}, b[1024] = {0}, result[1024]; // zero-initialized inputs
add_f32(a, b, result, 1024);
mul_f32(a, b, result, 1024);
sub_f32(a, b, result, 1024);
double data[1024] = {0}; // zero-initialized input
double avg = mean_f64(data, 1024);
double var = variance_f64(data, 1024);
return 0;
}
Compile and link:
gcc -O3 myprogram.c -ldynemit -lm -o myprogram
Option 2: Modular Libraries (For Minimal Binary Size)
Include only the features you need:
#include <dynemit/core.h>
#include <dynemit/add.h>
#include <dynemit/mul.h>
#include <dynemit/mean.h>
int main(void) {
simd_level_t level = detect_simd_level();
float a[1024] = {0}, b[1024] = {0}, result[1024]; // zero-initialized inputs
add_f32(a, b, result, 1024);
mul_f32(a, b, result, 1024);
double data[1024] = {0}; // zero-initialized input
double avg = mean_f64(data, 1024);
return 0;
}
Compile and link:
gcc -O3 myprogram.c -ldynemit_core -ldynemit_add -ldynemit_mul -ldynemit_mean -lm -o myprogram
Option 3: Core Only
If you only need CPU detection:
#include <stdio.h> // printf
#include <dynemit/core.h>
int main(void) {
simd_level_t level = detect_simd_level();
printf("CPU supports: %s\n", simd_level_name(level));
return 0;
}
Compile and link:
gcc -O3 myprogram.c -ldynemit_core -lm -o myprogram
The library is fully compatible with C++ and includes extern "C" guards in all headers. You can use it seamlessly in C++ projects:
Basic C++ Usage
#include <dynemit.h>
#include <vector>
#include <iostream>
int main() {
simd_level_t level = detect_simd_level();
std::cout << "SIMD Level: " << simd_level_name(level) << std::endl;
std::vector<float> a(1024, 1.0f);
std::vector<float> b(1024, 2.0f);
std::vector<float> result(1024);
mul_f32(a.data(), b.data(), result.data(), a.size());
add_f32(a.data(), b.data(), result.data(), a.size());
sub_f32(a.data(), b.data(), result.data(), a.size());
std::vector<double> data(1024, 3.14);
double avg = mean_f64(data.data(), data.size());
double var = variance_f64(data.data(), data.size());
return 0;
}
Compile and link with g++:
g++ -std=c++17 -O3 myprogram.cpp -ldynemit -lm -o myprogram
C++ with Custom IFUNC Resolvers
You can use the EXPLICIT_RUNTIME_RESOLVER macro in C++ to create your own IFUNC resolvers:
#include <dynemit/core.h>
#include <dynemit/err.h>
// Define implementations
static void my_func_scalar(float* out, const float* in, size_t n) { /* ... */ }
static void my_func_avx2(float* out, const float* in, size_t n) { /* ... */ }
static void my_func_avx512(float* out, const float* in, size_t n) { /* ... */ }
// Create resolver with C++ type safety
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"
EXPLICIT_RUNTIME_RESOLVER(my_func_resolver) {
simd_level_t level = detect_simd_level_ts();
switch (level) {
case SIMD_AVX512F:
return reinterpret_cast<void*>(my_func_avx512);
case SIMD_AVX2:
return reinterpret_cast<void*>(my_func_avx2);
default:
return reinterpret_cast<void*>(my_func_scalar);
}
}
#pragma GCC diagnostic pop
extern "C" void my_func(float* out, const float* in, size_t n)
__attribute__((ifunc("my_func_resolver")));
Notes:
- C++17 or later is recommended for best compatibility
- All headers include proper extern "C" linkage guards
- Use reinterpret_cast<void*> for function pointers in resolvers
- The -Wpedantic warning about function-to-void* conversions is expected and safe for IFUNC resolvers
The detect_simd_level() function uses CPUID and XGETBV instructions to query:
- Available instruction set extensions (SSE2, SSE4.2, AVX, AVX2, AVX-512F)
- OS support for saving/restoring SIMD register state (XCR0)
simd_level_t level = detect_simd_level();
// Returns highest supported SIMD level
For thread-safe contexts (multi-threaded code, IFUNC resolvers, dlopen()-loaded libraries), use the cached version:
simd_level_t level = detect_simd_level_ts();
// Thread-safe, cached SIMD detection
Each SIMD level has its own implementation compiled with appropriate target attributes:
__attribute__((target("avx2")))
static void mul_f32_avx2(const float *a, const float *b, float *out, size_t n)
{
// AVX2 implementation using 256-bit YMM registers
}
The mul_f32() function uses the ifunc attribute to resolve to the optimal implementation:
mul_f32_func_t mul_f32_resolver(void)
{
simd_level_t level = detect_simd_level();
switch (level) {
case SIMD_AVX512F: return mul_f32_avx512f;
case SIMD_AVX2: return mul_f32_avx2;
// ... other cases
}
}
void mul_f32(const float *, const float *, float *, size_t)
__attribute__((ifunc("mul_f32_resolver")));
This happens once at program load time, making subsequent calls as fast as direct function calls.
For more details on the internal architecture, see docs/ARCHITECTURE.md.
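The detect-then-dispatch flow described in this section can be approximated in a self-contained way. The sketch below is not libdynemit's code and rests on several assumptions: it uses the GCC/Clang builtin `__builtin_cpu_supports` in place of the library's own CPUID/XGETBV logic, it resolves through a cached function pointer instead of the ifunc attribute so it builds on non-glibc targets too, and every `sk_` name is hypothetical.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SK_X86 1
#endif

typedef void (*sk_mul_fn)(const float *, const float *, float *, size_t);

/* Baseline that runs on any CPU. */
static void sk_mul_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

#ifdef SK_X86
/* 256-bit variant: the target attribute lets this one function use AVX
 * instructions even when the rest of the file is built without -mavx2. */
__attribute__((target("avx2")))
static void sk_mul_avx2(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) { /* 8 floats per YMM register */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_mul_ps(va, vb));
    }
    for (; i < n; i++) /* scalar tail for the leftover elements */
        out[i] = a[i] * b[i];
}
#endif

/* Pick the best variant the running CPU supports. */
static sk_mul_fn sk_resolve_mul(void)
{
#ifdef SK_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sk_mul_avx2;
#endif
    return sk_mul_scalar;
}

/* Entry point: resolves on the first call, then reuses the cached pointer.
 * The cache is not synchronized here; ifunc sidesteps that by resolving
 * at load time, before any caller can run. */
static void sk_mul_f32(const float *a, const float *b, float *out, size_t n)
{
    static sk_mul_fn impl;
    if (!impl)
        impl = sk_resolve_mul();
    impl(a, b, out, n);
}
```

The trade-off is that this version pays for a pointer check on every call, whereas ifunc moves the choice into the dynamic loader so callers see an ordinary symbol.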
When building libraries that may be loaded via dlopen(), use the safe resolver utilities from <dynemit/err.h>:
#include <dynemit/core.h>
#include <dynemit/err.h>
EXPLICIT_RUNTIME_RESOLVER(my_function_resolver)
{
simd_level_t level = detect_simd_level_ts(); // Thread-safe!
switch (level) {
case SIMD_AVX2: return (void*)my_function_avx2;
default: return (void*)my_function_scalar;
}
}
This ensures:
- Thread-safe, cached SIMD detection
- NULL-check protection (traps immediately instead of crashing later)
- Compatibility with Python's module loading
For detailed documentation, see docs/IFUNC_RESOLVERS.md.
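To make the NULL-check idea concrete, here is a small sketch of how a resolver result could be validated before it is handed back. This document does not show how EXPLICIT_RUNTIME_RESOLVER expands, so everything here is an assumption for illustration: the `sk_` names are hypothetical and `__builtin_trap()` is the GCC/Clang builtin that terminates the process abnormally.

```c
#include <stddef.h>

typedef void (*sk_fn)(void);

/* Hypothetical helper: stop right away if a resolver produced NULL,
 * instead of letting the first call jump to address 0 much later. */
static sk_fn sk_require_impl(sk_fn impl)
{
    if (impl == NULL)
        __builtin_trap();
    return impl;
}

static int sk_calls; /* counts invocations, for demonstration only */

static void sk_impl_scalar(void)
{
    sk_calls++;
}

/* A resolver would normally switch on the detected SIMD level; this one
 * always picks the scalar variant to keep the sketch short. */
static sk_fn sk_resolver(void)
{
    sk_fn chosen = sk_impl_scalar;
    return sk_require_impl(chosen);
}
```

Failing at resolution time puts the crash next to its cause, which is easier to diagnose than a jump through a NULL pointer from an unrelated call site.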
Use the included verification script to inspect which SIMD instructions were compiled into the binary:
./scripts/check_for_simd.sh
This will show:
- All function variants in the symbol table
- Actual SIMD instructions used in each implementation
- The ifunc resolver function that performs runtime dispatch
Project Structure
libdynemit/
├── CMakeLists.txt              # Main CMake configuration
├── cmake/                      # CMake modules
├── include/
│   ├── dynemit.h               # Umbrella header (includes all features)
│   └── dynemit/
│       ├── core.h              # CPU detection API
│       ├── compiler.h          # Compiler portability macros (GCC/Clang)
│       ├── err.h               # Safe IFUNC resolver utilities
│       └── *.h                 # Features
├── src/
│   ├── CMakeLists.txt          # Core library build config
│   ├── dynemit.c               # CPU feature detection implementation
│   └── dynemit_features.c      # Feature list for all-in-one library
├── features/                   # One subdirectory per feature
│   ├── add/                    # Element-wise vector addition
│   │   ├── CMakeLists.txt      # Library + test + benchmark targets
│   │   ├── add_f32.c           # SIMD implementations
│   │   ├── tests/*             # Correctness tests
│   │   └── benchmarks/*        # Benchmarks for the feature
│   └── *
├── bench/
│   ├── bench_utils.h           # Shared benchmark infrastructure (header-only)
│   └── data/                   # Benchmark results (CSV files)
├── tests/                      # Core-only tests (SIMD detection, C++ compat)
│   ├── CMakeLists.txt
│   └── test_*
├── docs/
│   ├── ADDING_FEATURES.md      # Guide for adding new features
│   ├── ARCHITECTURE.md         # Internal architecture documentation
│   ├── DEVELOPMENT.md          # Development setup guide
│   ├── BENCHMARKING.md         # Benchmarking and visualization guide
│   ├── IFUNC_RESOLVERS.md      # IFUNC resolver safety documentation
│   └── img/                    # Generated benchmark charts
├── scripts/
│   ├── check_for_simd.sh       # Verify SIMD instructions in binary
│   ├── plot_benchmark.py       # Generate benchmark visualization charts
│   └── requirements.txt        # Python dependencies for visualization
├── mull.yml                    # Mull mutation testing config
└── README.md
Build Options
# Release build (default, -O3 optimization)
cmake -B build
cmake --build build -j$(nproc)
# Debug build
cmake -B build -DCMAKE_BUILD_TYPE=Debug
# List available features at configure time
cmake -B build -DLIST_FEATURES=ON
Running Tests
All C tests use the Unity framework; C++ tests use Google Test. Both are fetched automatically via CMake FetchContent.
# Run the full test suite (core + all features)
ctest --test-dir build --output-on-failure
# Run a single feature test directly
./build/features/add/test_add
./build/features/sum/test_sum
Each feature has its own tests under features/<name>/tests/ that cover correctness across multiple input sizes and all reachable SIMD variants (via the _select() API). Core tests in tests/ cover SIMD detection, resolver macros, feature discovery, and C++ compatibility.
Code Coverage
Generate an HTML coverage report over all tests (requires GCC, lcov, and genhtml):
cmake -B build-cov -DCMAKE_BUILD_TYPE=Debug -DDYNEMIT_COVERAGE=ON
cmake --build build-cov -j$(nproc)
cmake --build build-cov --target coverage
Open build-cov/coverage_report/index.html in a browser. The coverage target zeroes counters, runs the full test suite, captures line/function/branch data, and filters to only project source files.
Mutation Testing
Mull injects mutations into compiled bitcode to verify test quality. Requires Clang and the mull package.
# Build with the Mull pass plugin
cmake -B build-mull -DCMAKE_C_COMPILER=clang -DDYNEMIT_MULL=ON
cmake --build build-mull -j$(nproc)
# Run mutation testing on individual test binaries
mull-runner-20 ./build-mull/features/add/test_add
mull-runner-20 ./build-mull/features/sum/test_sum
The mull.yml config at the project root controls which mutators are active.
Running Benchmarks
Each feature has its own benchmark under features/<name>/benchmarks/. Benchmarks measure single-core performance across multiple array sizes and all SIMD levels using the shared infrastructure in bench/bench_utils.h.
# Run all features x all SIMD levels, pinned to one core, max nice priority
sudo ./scripts/run_all_benchmarks.sh --cpu 15
# Or run a single feature benchmark directly
./build/features/add/bench_add
./build/features/add/bench_add --auto-detect
# Creates: bench/data/add_<cpu_model>_<simd_level>.csv
Regenerate charts from existing CSV data:
bash ./scripts/run_all_benchmarks.sh --charts-only
Create a portable bundle for remote servers:
./scripts/bundle_benchmarks.sh --strip
# Produces: dynemit-bench-x86_64.tar.gz (static binaries)
For detailed benchmarking instructions, chart naming conventions, and remote server workflow, see docs/BENCHMARKING.md.
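The charts earlier in this README report the median of several trials per data point. A minimal sketch of that measurement pattern follows. It is not taken from bench/bench_utils.h: the `sk_` names are hypothetical, the kernel is a plain scalar loop standing in for a SIMD variant, and `clock()` is used only because it is standard C (a real harness would more likely use a higher-resolution monotonic clock).

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* qsort comparator for doubles, ascending. */
static int sk_cmp_double(const void *pa, const void *pb)
{
    double a = *(const double *)pa, b = *(const double *)pb;
    return (a > b) - (a < b);
}

/* Median of n samples; sorts the array in place. Returns 0 for n == 0. */
static double sk_median(double *samples, size_t n)
{
    if (n == 0)
        return 0.0;
    qsort(samples, n, sizeof samples[0], sk_cmp_double);
    return (n % 2) ? samples[n / 2]
                   : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

/* Kernel under test: scalar multiply, standing in for a SIMD variant. */
static void sk_kernel(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Times up to 64 runs of the kernel and returns the median, in seconds
 * of CPU time as reported by clock(). */
static double sk_bench_median(const float *a, const float *b, float *out,
                              size_t n, size_t trials)
{
    double samples[64];
    if (trials > 64)
        trials = 64;
    for (size_t t = 0; t < trials; t++) {
        clock_t start = clock();
        sk_kernel(a, b, out, n);
        samples[t] = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return sk_median(samples, trials);
}
```

The median is less sensitive than the mean to a single slow trial caused by scheduling noise, which is one reason to pin the benchmark to a core as the script above does.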
For detailed instructions on how to add new SIMD-optimized features, see docs/ADDING_FEATURES.md.
Quick summary:
- Create feature directory: features/my_feature/
- Add source files: features/my_feature/my_feature_f64.c (one per type variant)
- Create header: include/dynemit/my_feature.h
- Add CMakeLists.txt following the pattern:
  add_library(my_feature_obj OBJECT my_feature_f64.c)
  target_include_directories(my_feature_obj PRIVATE ${PROJECT_SOURCE_DIR}/include)
  target_link_libraries(my_feature_obj PUBLIC dynemit_core)
  add_library(dynemit_my_feature STATIC $<TARGET_OBJECTS:my_feature_obj>)
  target_include_directories(dynemit_my_feature PUBLIC ${PROJECT_SOURCE_DIR}/include)
  target_link_libraries(dynemit_my_feature PUBLIC dynemit_core)
  install(TARGETS dynemit_my_feature ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
  install(FILES ${PROJECT_SOURCE_DIR}/include/dynemit/my_feature.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/dynemit)
- Register the feature: Add it to the features[] array in src/dynemit_features.c
- Update umbrella header: Add #include <dynemit/my_feature.h> in include/dynemit.h
The build system auto-discovers features/*/ subdirectories, so no changes to the root CMakeLists.txt are needed.
Contributions are welcome! Areas for improvement:
- Additional type variants for existing features
- ARM NEON and RISC-V Vector Extension support
- AMD-specific optimizations (FMA4, XOP)
- Additional benchmarks and test cases for new features
See LICENSE file for details.