libdynemit

libdynemit leverages the ifunc resolver (supported by both GCC and Clang on Linux) to automatically select optimal SIMD implementations at program startup, delivering portable code without sacrificing performance. Thread-safe SIMD detection and dlopen-safe resolver utilities ensure robust operation in multi-threaded applications and dynamic library loading scenarios.

Example

#include <dynemit.h>

// Automatically uses AVX-512, AVX2, AVX, SSE4.2, SSE2 or scalar,
// based on your CPU's capabilities, decided once at program startup
mul_f32(a, b, result, n);
mean_f64(data, n);
entropy_u32(data, n);

Same build, best performance

Benchmark comparing vector multiplication performance across different CPU architectures using the same binary. The library automatically detected and used each CPU's highest supported SIMD instruction set (AVX-512F, AVX2, AVX or SSE4.2) at runtime. Lower execution time indicates better performance. Each data point represents the median of 10 trials, with error bars showing ±1 standard deviation.

Forced SIMD instructions without dynamic dispatch

Performance scaling comparison of different SIMD instruction sets on the same CPU (AMD Ryzen 9 9950X3D). This benchmark demonstrates the progressive performance improvements from Scalar → SSE2 → SSE4.2 → AVX → AVX2 → AVX-512F. Each implementation was built and tested separately to isolate the impact of each SIMD level. The chart shows ~1.8x speedup from AVX-512F compared to scalar code for large arrays. Lower execution time indicates better performance. Each data point represents the median of 10 trials, with error bars showing ±1 standard deviation.

Installation

Option 1: Pre-built Packages

Download pre-built packages from GitHub Releases.

Debian/Ubuntu

wget https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/libdynemit_1.2.0_amd64.deb
sudo dpkg -i libdynemit_1.2.0_amd64.deb

Fedora/RHEL

Runtime package:

wget https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/libdynemit-1.2.0-1.fc40.x86_64.rpm
sudo dnf install libdynemit-1.2.0-1.fc40.x86_64.rpm

Verify GPG Signatures

All packages are cryptographically signed with GPG for authenticity verification.

Import the maintainer's public key:

gpg --keyserver keys.openpgp.org --recv-keys 3E1A1F401A1C47BC77D1705612D0D82387FC53B0

Alternative key import options

Using the shorter key ID:

gpg --keyserver keys.openpgp.org --recv-keys 12D0D82387FC53B0

Alternative keyserver (if keys.openpgp.org is unavailable):

gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 3E1A1F401A1C47BC77D1705612D0D82387FC53B0

You should see output confirming the key was imported:

gpg: key 12D0D82387FC53B0: public key "MuriloChianfa <murilo.chianfa@outlook.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Verify a package signature:

gpg --verify libdynemit_1.2.0_amd64.deb.asc libdynemit_1.2.0_amd64.deb

If the signature is valid, you should see:

gpg: Signature made [date and time]
gpg:                using EDDSA key 3E1A1F401A1C47BC77D1705612D0D82387FC53B0
gpg: Good signature from "MuriloChianfa <murilo.chianfa@outlook.com>"

If you see "BAD signature", do not use the binary - it may have been tampered with or corrupted.

Verify Checksums

curl -LO https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/SHA256SUMS
curl -LO https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/SHA256SUMS.asc
gpg --verify SHA256SUMS.asc SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing

Option 2: Build from Source

Requirements

Ubuntu/Debian

# Update package list
sudo apt update

# Install GCC 13+ and CMake
sudo apt install -y gcc-13 cmake

# Verify installation
gcc --version
cmake --version

Fedora/RHEL

sudo dnf install -y gcc cmake

Arch Linux

sudo pacman -S gcc cmake

Build Instructions

# Clone the libdynemit repository (HTTPS works without a GitHub SSH key)
git clone https://github.com/MuriloChianfa/libdynemit.git
cd libdynemit

# Setup the release build using all the optimizations
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release

# Compile
make

Installing from Source

After building, install the library and headers system-wide:

cd build
sudo make install

View installed files

Shared library:

/usr/local/lib/libdynemit.so.1.2.0 (versioned shared library)
/usr/local/lib/libdynemit.so.1 (SONAME symlink)
/usr/local/lib/libdynemit.so (development symlink)

Static libraries:

/usr/local/lib/libdynemit.a (all-in-one, includes all features)
/usr/local/lib/libdynemit_core.a (just CPU detection)
/usr/local/lib/libdynemit_add.a, libdynemit_sub.a, libdynemit_mul.a (vector ops)
/usr/local/lib/libdynemit_sum.a, libdynemit_mean.a, libdynemit_min.a, libdynemit_max.a (basic stats)
/usr/local/lib/libdynemit_variance.a, libdynemit_skewness.a, libdynemit_kurtosis.a (moments)
/usr/local/lib/libdynemit_entropy.a, libdynemit_simpson.a, libdynemit_hhi.a, libdynemit_gini.a (diversity)
/usr/local/lib/libdynemit_histogram.a, libdynemit_topk.a, libdynemit_hill.a, libdynemit_concentration.a (histogram & concentration)

Headers:

/usr/local/include/dynemit.h (umbrella header)
/usr/local/include/dynemit/core.h (CPU detection, SIMD levels)
/usr/local/include/dynemit/compiler.h (compiler portability macros)
/usr/local/include/dynemit/err.h (safe IFUNC resolver utilities)
/usr/local/include/dynemit/add.h, sub.h, mul.h (vector ops)
/usr/local/include/dynemit/stats.h (convenience: includes all statistics headers below)
/usr/local/include/dynemit/sum.h, mean.h, min.h, max.h, variance.h, skewness.h, kurtosis.h
/usr/local/include/dynemit/entropy.h, simpson.h, hhi.h, gini.h
/usr/local/include/dynemit/histogram.h, topk.h, hill.h, concentration.h

Build system support:

/usr/local/lib/pkgconfig/libdynemit.pc (pkg-config file)

Features

Currently the library ships SIMD-accelerated features organized into four categories. Every function automatically dispatches to the best available instruction set at program startup.

Vector Operations

Element-wise operations on float arrays.

| Function | Description |
| --- | --- |
| `add_f32(a, b, out, n)` | `out[i] = a[i] + b[i]` |
| `sub_f32(a, b, out, n)` | `out[i] = a[i] - b[i]` |
| `mul_f32(a, b, out, n)` | `out[i] = a[i] * b[i]` |

Header: <dynemit/add.h>, <dynemit/sub.h>, <dynemit/mul.h>

Statistical Primitives

| Function | Description |
| --- | --- |
| `sum_f64` / `sum_u64` / `sum_u32` / `sum_u16` | Sum of elements |
| `mean_f64` / `mean_u64` / `mean_u32` / `mean_u16` | Arithmetic mean |
| `min_f64` / `min_u64` / `min_u32` / `min_u16` | Minimum value |
| `max_f64` / `max_u64` / `max_u32` / `max_u16` | Maximum value |
| `variance_f64` | Sample variance (Bessel's correction) |
| `skewness_f64` | Third standardized moment |
| `kurtosis_f64` | Excess kurtosis (fourth moment - 3) |

Headers: <dynemit/sum.h>, <dynemit/mean.h>, <dynemit/min.h>, <dynemit/max.h>, <dynemit/variance.h>, <dynemit/skewness.h>, <dynemit/kurtosis.h>

Convenience header <dynemit/stats.h> includes all of the above.
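As a point of reference for the `variance_f64` row, Bessel's correction means the sum of squared deviations is divided by `n - 1` rather than `n`. A minimal scalar sketch is shown below. The names `ref_mean_f64` and `ref_variance_f64` are hypothetical and not part of libdynemit, the handling of `n < 2` is an assumption, and the library's SIMD code may sum in a different order, so results can differ in the last bits.

```c
#include <stddef.h>

/* Hypothetical scalar reference, not libdynemit's implementation. */
static double ref_mean_f64(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return n ? sum / (double)n : 0.0;
}

/* Sample variance: squared deviations divided by (n - 1), not n. */
static double ref_variance_f64(const double *x, size_t n)
{
    if (n < 2)
        return 0.0; /* assumption: return 0 when the estimate is undefined */

    double mean = ref_mean_f64(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

A reference like this is mainly useful in tests, where a SIMD result can be compared against it within a small tolerance.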

Distribution & Diversity Metrics

| Function | Description |
| --- | --- |
| `entropy_u16` / `entropy_u32` / `entropy_histogram` | Shannon entropy (bits) |
| `simpson_u16` / `simpson_u32` / `simpson_histogram` | Simpson's diversity index |
| `hhi_u16` / `hhi_u32` / `hhi_histogram` | Herfindahl-Hirschman Index |
| `gini_f64` / `gini_u64` | Gini coefficient (requires sorted input) |

Headers: <dynemit/entropy.h>, <dynemit/simpson.h>, <dynemit/hhi.h>, <dynemit/gini.h>
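To make two of these metrics concrete, here is a scalar sketch of the HHI and the Gini coefficient. It also shows why the Gini functions need sorted input: the usual closed form weights each value by its rank. The function names are hypothetical and not libdynemit's API. The 0..1 scale for the HHI and the ascending sort order for Gini are assumptions, so check the headers for the conventions the library actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* HHI as the sum of squared shares, on a 0..1 scale (assumption). */
static double ref_hhi(const uint32_t *counts, size_t bins)
{
    double total = 0.0;
    for (size_t i = 0; i < bins; i++)
        total += (double)counts[i];
    if (total == 0.0)
        return 0.0;

    double hhi = 0.0;
    for (size_t i = 0; i < bins; i++) {
        double share = (double)counts[i] / total;
        hhi += share * share;
    }
    return hhi;
}

/* Gini for values sorted ascending (assumption):
 * G = 2 * sum(i * x_i) / (n * sum(x_i)) - (n + 1) / n, with i = 1..n. */
static double ref_gini_sorted(const double *x, size_t n)
{
    double total = 0.0, weighted = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += x[i];
        weighted += (double)(i + 1) * x[i]; /* rank-weighted sum */
    }
    if (n == 0 || total == 0.0)
        return 0.0;
    return 2.0 * weighted / ((double)n * total) - ((double)n + 1.0) / (double)n;
}
```

With four equal values the Gini sketch returns 0, and with one holder among four it returns 0.75, which matches the `(n - 1) / n` upper bound for a finite sample.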

Histogram & Concentration Analysis

| Function | Description |
| --- | --- |
| `histogram_u16` / `histogram_u64` | Count elements into boundary-defined bins |
| `topk_ratios_f64` | Top-K concentration ratios from sorted descending counts |
| `hill_estimator_f64` | Hill heavy-tail index estimator |
| `concentration_f64` | Composite metric combining top-K, Hill, and HHI |

Headers: <dynemit/histogram.h>, <dynemit/topk.h>, <dynemit/hill.h>, <dynemit/concentration.h>
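A top-K concentration ratio is the share of the total held by the K largest entries, which is why the input is expected in descending order. The sketch below computes a single ratio. `ref_topk_ratio` is a hypothetical helper, not the signature of `topk_ratios_f64`, and clamping `k` to `n` is an assumption.

```c
#include <stddef.h>

/* Share of the total held by the k largest entries.
 * Input must already be sorted in descending order. */
static double ref_topk_ratio(const double *sorted_desc, size_t n, size_t k)
{
    if (k > n)
        k = n; /* assumption: clamp instead of reporting an error */

    double total = 0.0, top = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += sorted_desc[i];
        if (i < k)
            top += sorted_desc[i];
    }
    return total > 0.0 ? top / total : 0.0;
}
```

For counts `{50, 30, 15, 5}`, the top-2 ratio is 80 / 100 = 0.8.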

Library Usage Options

The library provides flexible usage options depending on your needs:

Option 1: All-in-One Library (Recommended for Simplicity)

Use the bundled library that includes all features:

#include <stdio.h>
#include <dynemit.h>  // Includes core + all features

int main(void) {
    const char **features = dynemit_features();
    printf("Available features:\n");
    for (int i = 0; features[i] != NULL; i++) {
        printf("  - %s\n", features[i]);
    }

    simd_level_t level = detect_simd_level();
    printf("SIMD level: %s\n", simd_level_name(level));

    // Zero-initialize inputs so the calls below read defined values
    float a[1024] = {0}, b[1024] = {0}, result[1024];
    add_f32(a, b, result, 1024);
    mul_f32(a, b, result, 1024);
    sub_f32(a, b, result, 1024);

    double data[1024] = {0};
    double avg = mean_f64(data, 1024);
    double var = variance_f64(data, 1024);
    printf("mean: %f, variance: %f\n", avg, var);

    return 0;
}

Compile and link:

gcc -O3 myprogram.c -ldynemit -lm -o myprogram

Option 2: Modular Libraries (For Minimal Binary Size)

Include only the features you need:

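#include <stdio.h>
#include <dynemit/core.h>
#include <dynemit/add.h>
#include <dynemit/mul.h>
#include <dynemit/mean.h>

int main(void) {
    simd_level_t level = detect_simd_level();
    printf("SIMD level: %s\n", simd_level_name(level));

    // Zero-initialize inputs so the calls below read defined values
    float a[1024] = {0}, b[1024] = {0}, result[1024];
    add_f32(a, b, result, 1024);
    mul_f32(a, b, result, 1024);

    double data[1024] = {0};
    double avg = mean_f64(data, 1024);
    printf("mean: %f\n", avg);

    return 0;
}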

Compile and link:

gcc -O3 myprogram.c -ldynemit_core -ldynemit_add -ldynemit_mul -ldynemit_mean -lm -o myprogram

Option 3: Core Only

If you only need CPU detection:

#include <stdio.h>
#include <dynemit/core.h>

int main(void) {
    simd_level_t level = detect_simd_level();
    printf("CPU supports: %s\n", simd_level_name(level));
    return 0;
}

Compile and link:

gcc -O3 myprogram.c -ldynemit_core -lm -o myprogram

C++ Compatibility

The library is fully compatible with C++ and includes extern "C" guards in all headers. You can use it seamlessly in C++ projects:

Basic C++ Usage

#include <dynemit.h>
#include <vector>
#include <iostream>

int main() {
    simd_level_t level = detect_simd_level();
    std::cout << "SIMD Level: " << simd_level_name(level) << std::endl;
    
    std::vector<float> a(1024, 1.0f);
    std::vector<float> b(1024, 2.0f);
    std::vector<float> result(1024);
    
    mul_f32(a.data(), b.data(), result.data(), a.size());
    add_f32(a.data(), b.data(), result.data(), a.size());
    sub_f32(a.data(), b.data(), result.data(), a.size());
    
    std::vector<double> data(1024, 3.14);
    double avg = mean_f64(data.data(), data.size());
    double var = variance_f64(data.data(), data.size());
    
    return 0;
}

Compile and link with g++:

g++ -std=c++17 -O3 myprogram.cpp -ldynemit -lm -o myprogram

C++ with Custom IFUNC Resolvers

You can use the EXPLICIT_RUNTIME_RESOLVER macro in C++ to create your own IFUNC resolvers:

#include <cstddef>  // size_t
#include <dynemit/core.h>
#include <dynemit/err.h>

// Define implementations
static void my_func_scalar(float* out, const float* in, size_t n) { /* ... */ }
static void my_func_avx2(float* out, const float* in, size_t n) { /* ... */ }
static void my_func_avx512(float* out, const float* in, size_t n) { /* ... */ }

// Create resolver with C++ type safety
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"

EXPLICIT_RUNTIME_RESOLVER(my_func_resolver) {
    simd_level_t level = detect_simd_level_ts();
    
    switch (level) {
    case SIMD_AVX512F:
        return reinterpret_cast<void*>(my_func_avx512);
    case SIMD_AVX2:
        return reinterpret_cast<void*>(my_func_avx2);
    default:
        return reinterpret_cast<void*>(my_func_scalar);
    }
}

#pragma GCC diagnostic pop

extern "C" void my_func(float* out, const float* in, size_t n)
    __attribute__((ifunc("my_func_resolver")));

Notes:

C++17 or later is recommended for best compatibility
All headers include proper extern "C" linkage guards
Use reinterpret_cast<void*> for function pointers in resolvers
The -Wpedantic warning about function-to-void* conversions is expected and safe for IFUNC resolvers

Development

How It Works (Technical Details)

1. CPU Feature Detection

The detect_simd_level() function uses CPUID and XGETBV instructions to query:

Available instruction set extensions (SSE2, SSE4.2, AVX, AVX2, AVX-512F)
OS support for saving/restoring SIMD register state (XCR0)

simd_level_t level = detect_simd_level();
// Returns highest supported SIMD level
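The two checks listed above (CPU feature bits and OS register-state support) can be sketched with the `<cpuid.h>` helpers that GCC and Clang provide. This is an illustration of the technique, not libdynemit's source: `sketch_level`, `sketch_xgetbv0` and `sketch_detect` are hypothetical names, and the SSE levels are folded into a single baseline for brevity. The bit positions are the documented ones: CPUID leaf 1 ECX bit 27 (OSXSAVE) and bit 28 (AVX), leaf 7 EBX bit 5 (AVX2) and bit 16 (AVX-512F), and XCR0 bits 1-2 (XMM/YMM state) and 5-7 (opmask/ZMM state).

```c
#include <cpuid.h>
#include <stdint.h>

/* Hypothetical levels for this sketch; simd_level_t in libdynemit may differ. */
enum sketch_level { SK_BASELINE = 0, SK_AVX, SK_AVX2, SK_AVX512F };

/* Read XCR0 via XGETBV. Only valid once CPUID has reported OSXSAVE. */
static uint64_t sketch_xgetbv0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

static enum sketch_level sketch_detect(void)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return SK_BASELINE;

    int osxsave = (ecx >> 27) & 1; /* OS has enabled XSAVE/XGETBV */
    int avx     = (ecx >> 28) & 1;
    if (!osxsave || !avx)
        return SK_BASELINE;

    /* The OS must save/restore XMM (bit 1) and YMM (bit 2) state. */
    uint64_t xcr0 = sketch_xgetbv0();
    if ((xcr0 & 0x6) != 0x6)
        return SK_BASELINE;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return SK_AVX;

    int avx2    = (ebx >> 5) & 1;
    int avx512f = (ebx >> 16) & 1;

    /* AVX-512 also needs opmask and ZMM state (XCR0 bits 5..7). */
    if (avx512f && (xcr0 & 0xE0) == 0xE0)
        return SK_AVX512F;
    return avx2 ? SK_AVX2 : SK_AVX;
}
```

The XCR0 check matters because a CPU can advertise AVX while the operating system (or a hypervisor) has not enabled saving the wider registers across context switches. In that case the instructions would fault or corrupt state.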

For thread-safe contexts (multi-threaded code, IFUNC resolvers, dlopen()-loaded libraries), use the cached version:

simd_level_t level = detect_simd_level_ts();
// Thread-safe, cached SIMD detection
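One way such a cache can be built is with a single C11 atomic, sketched below under stated assumptions. `sketch_probe` and `sketch_level_cached` are hypothetical names, the probe is reduced to one `__builtin_cpu_supports` query, and the actual implementation of `detect_simd_level_ts()` may be different. The idea is that the probe is idempotent, so two threads racing on the first call simply store the same value.

```c
#include <stdatomic.h>

/* Stand-in for the real CPUID probe: 1 if AVX2 is usable, else 0. */
static int sketch_probe(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
}

static _Atomic int cached_level = -1; /* -1 means "not probed yet" */

static int sketch_level_cached(void)
{
    int level = atomic_load_explicit(&cached_level, memory_order_relaxed);
    if (level < 0) {
        /* Racing threads all compute the same answer, so a plain
         * store is enough and no lock is needed. */
        level = sketch_probe();
        atomic_store_explicit(&cached_level, level, memory_order_relaxed);
    }
    return level;
}
```

A lock-free cache like this avoids calling into `pthread_once` or a mutex, which is attractive inside an IFUNC resolver because resolvers can run before the rest of the process is fully relocated.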

2. Multiple SIMD Implementations

Each SIMD level has its own implementation compiled with appropriate target attributes:

__attribute__((target("avx2")))
static void mul_f32_avx2(const float *a, const float *b, float *out, size_t n)
{
    // AVX2 implementation using 256-bit YMM registers
}
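For illustration, a body for such a function could look like the sketch below, which assumes unaligned inputs and handles the tail with a scalar loop. The `_sketch` names are hypothetical and this is not the library's code. The `target("avx2")` attribute lets the file be compiled without `-mavx2`, so only this one function contains AVX instructions, and the small dispatcher shows that it must only be called after a runtime check.

```c
#include <immintrin.h>
#include <stddef.h>

static void mul_f32_scalar_sketch(const float *a, const float *b,
                                  float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

__attribute__((target("avx2")))
static void mul_f32_avx2_sketch(const float *a, const float *b,
                                float *out, size_t n)
{
    size_t i = 0;
    /* Eight floats per 256-bit YMM register. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_mul_ps(va, vb));
    }
    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Manual dispatch: never call the AVX2 variant on a CPU without AVX2. */
static void mul_f32_sketch(const float *a, const float *b,
                           float *out, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        mul_f32_avx2_sketch(a, b, out, n);
    else
        mul_f32_scalar_sketch(a, b, out, n);
}
```

The per-call branch in `mul_f32_sketch` is what the ifunc mechanism described in the next section removes.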

3. Runtime Dispatch with ifunc

The mul_f32() function uses the ifunc attribute to resolve to the optimal implementation:

mul_f32_func_t mul_f32_resolver(void)
{
    simd_level_t level = detect_simd_level();
    switch (level) {
        case SIMD_AVX512F: return mul_f32_avx512f;
        case SIMD_AVX2:    return mul_f32_avx2;
        // ... other cases
    }
}

void mul_f32(const float *, const float *, float *, size_t)
    __attribute__((ifunc("mul_f32_resolver")));

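Resolution happens once, when the dynamic linker binds the symbol (at load time, or on the first call if lazy binding is in effect). After that, every call goes straight to the selected implementation through the already-resolved PLT/GOT entry, with no per-call CPU check.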
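The same pattern can be tried in isolation, without libdynemit, using only compiler builtins. The sketch below is an assumption-laden stand-in: `scale_f32` and its variants are hypothetical, and `__builtin_cpu_supports` replaces the library's own detection. It requires GCC or Clang on an x86 Linux target with ifunc support (glibc).

```c
#include <stddef.h>

static void scale_f32_baseline(float *x, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}

/* Same loop, but the compiler may vectorize it with AVX2 instructions. */
__attribute__((target("avx2")))
static void scale_f32_avx2(float *x, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}

typedef void (*scale_f32_fn)(float *, size_t, float);

/* Runs while the dynamic linker binds scale_f32, before main(). */
static scale_f32_fn scale_f32_resolver(void)
{
    __builtin_cpu_init(); /* required before cpu_supports in a resolver */
    return __builtin_cpu_supports("avx2") ? scale_f32_avx2
                                          : scale_f32_baseline;
}

/* Callers see one ordinary function; the resolver picks its body. */
void scale_f32(float *x, size_t n, float k)
    __attribute__((ifunc("scale_f32_resolver")));
```

Because the resolver runs very early, it should stay small and avoid calling functions that themselves need relocation, which is one reason the detection logic is kept self-contained.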

For more details on the internal architecture, see docs/ARCHITECTURE.md.

4. Safe IFUNC Resolvers (for dlopen)

When building libraries that may be loaded via dlopen(), use the safe resolver utilities from <dynemit/err.h>:

#include <dynemit/core.h>
#include <dynemit/err.h>

EXPLICIT_RUNTIME_RESOLVER(my_function_resolver)
{
    simd_level_t level = detect_simd_level_ts();  // Thread-safe!
    
    switch (level) {
        case SIMD_AVX2: return (void*)my_function_avx2;
        default:        return (void*)my_function_scalar;
    }
}

This ensures:

Thread-safe, cached SIMD detection
NULL-check protection (traps immediately instead of crashing later)
Compatibility with Python's module loading
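The NULL-check point can be illustrated with a small helper. If a resolver returns NULL, the symbol ends up bound to address 0 and the failure only shows up later as a segfault at the first call, far from the cause. Trapping inside the resolver makes the problem visible at load time instead. `sketch_checked` is a hypothetical helper written for this example. How `EXPLICIT_RUNTIME_RESOLVER` actually implements its check is documented in docs/IFUNC_RESOLVERS.md.

```c
#include <stddef.h>

/* Hypothetical helper: abort at resolve time instead of returning NULL. */
static inline void *sketch_checked(void *impl)
{
    if (impl == NULL)
        __builtin_trap(); /* stops the process immediately */
    return impl;
}
```

A resolver would wrap each candidate in this helper before returning it, for example `return sketch_checked(candidate);`.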

For detailed documentation, see docs/IFUNC_RESOLVERS.md.

Verifying SIMD Instructions

Use the included verification script to inspect which SIMD instructions were compiled into the binary:

./scripts/check_for_simd.sh

This will show:

All function variants in the symbol table
Actual SIMD instructions used in each implementation
The ifunc resolver function that performs runtime dispatch

Project Structure

libdynemit/
├── CMakeLists.txt              # Main CMake configuration
├── cmake/                      # CMake modules
├── include/
│   ├── dynemit.h               # Umbrella header (includes all features)
│   └── dynemit/
│       ├── core.h              # CPU detection API
│       ├── compiler.h          # Compiler portability macros (GCC/Clang)
│       ├── err.h               # Safe IFUNC resolver utilities
│       ├── *.h                 # Features
├── src/
│   ├── CMakeLists.txt          # Core library build config
│   ├── dynemit.c               # CPU feature detection implementation
│   └── dynemit_features.c      # Feature list for all-in-one library
├── features/                   # One subdirectory per feature
│   ├── add/                    # Element-wise vector addition
│   │   ├── CMakeLists.txt      # Library + test + benchmark targets
│   │   ├── add_f32.c           # SIMD implementations
│   │   ├── tests/*             # Correctness tests
│   │   └── benchmarks/*        # Benchmarks for the feature
│   └── *
├── bench/
│   ├── bench_utils.h           # Shared benchmark infrastructure (header-only)
│   └── data/                   # Benchmark results (CSV files)
├── tests/                      # Core-only tests (SIMD detection, C++ compat)
│   ├── CMakeLists.txt
│   ├── test_*
├── docs/
│   ├── ADDING_FEATURES.md      # Guide for adding new features
│   ├── ARCHITECTURE.md         # Internal architecture documentation
│   ├── DEVELOPMENT.md          # Development setup guide
│   ├── BENCHMARKING.md         # Benchmarking and visualization guide
│   ├── IFUNC_RESOLVERS.md      # IFUNC resolver safety documentation
│   └── img/                    # Generated benchmark charts
├── scripts/
│   ├── check_for_simd.sh       # Verify SIMD instructions in binary
│   ├── plot_benchmark.py       # Generate benchmark visualization charts
│   └── requirements.txt        # Python dependencies for visualization
├── mull.yml                    # Mull mutation testing config
└── README.md

Build Options

# Release build (default, -O3 optimization)
cmake -B build
cmake --build build -j$(nproc)

# Debug build
cmake -B build -DCMAKE_BUILD_TYPE=Debug

# List available features at configure time
cmake -B build -DLIST_FEATURES=ON

Running Tests

All C tests use the Unity framework; C++ tests use Google Test. Both are fetched automatically via CMake FetchContent.

# Run the full test suite (core + all features)
ctest --test-dir build --output-on-failure

# Run a single feature test directly
./build/features/add/test_add
./build/features/sum/test_sum

Each feature has its own tests under features/<name>/tests/ that cover correctness across multiple input sizes and all reachable SIMD variants (via the _select() API). Core tests in tests/ cover SIMD detection, resolver macros, feature discovery, and C++ compatibility.

Code Coverage

Generate an HTML coverage report over all tests (requires GCC, lcov, and genhtml):

cmake -B build-cov -DCMAKE_BUILD_TYPE=Debug -DDYNEMIT_COVERAGE=ON
cmake --build build-cov -j$(nproc)
cmake --build build-cov --target coverage

Open build-cov/coverage_report/index.html in a browser. The coverage target zeroes counters, runs the full test suite, captures line/function/branch data, and filters to only project source files.

Mutation Testing

Mull injects mutations into compiled bitcode to verify test quality. Requires Clang and the mull package.

# Build with the Mull pass plugin
cmake -B build-mull -DCMAKE_C_COMPILER=clang -DDYNEMIT_MULL=ON
cmake --build build-mull -j$(nproc)

# Run mutation testing on individual test binaries
mull-runner-20 ./build-mull/features/add/test_add
mull-runner-20 ./build-mull/features/sum/test_sum

The mull.yml config at the project root controls which mutators are active.

Running Benchmarks

Each feature has its own benchmark under features/<name>/benchmarks/. Benchmarks measure single-core performance across multiple array sizes and all SIMD levels using the shared infrastructure in bench/bench_utils.h.

# Run all features x all SIMD levels, pinned to one core, max nice priority
sudo ./scripts/run_all_benchmarks.sh --cpu 15

# Or run a single feature benchmark directly
./build/features/add/bench_add
./build/features/add/bench_add --auto-detect
# Creates: bench/data/add_<cpu_model>_<simd_level>.csv

Regenerate charts from existing CSV data:

bash ./scripts/run_all_benchmarks.sh --charts-only

Create a portable bundle for remote servers:

./scripts/bundle_benchmarks.sh --strip
# Produces: dynemit-bench-x86_64.tar.gz (static binaries)

For detailed benchmarking instructions, chart naming conventions, and remote server workflow, see docs/BENCHMARKING.md.

Adding New Features

For detailed instructions on how to add new SIMD-optimized features, see docs/ADDING_FEATURES.md.

Quick summary:

Create feature directory: features/my_feature/
Add source files: features/my_feature/my_feature_f64.c (one per type variant)
Create header: include/dynemit/my_feature.h

Add CMakeLists.txt following the pattern:

add_library(my_feature_obj OBJECT my_feature_f64.c)
target_include_directories(my_feature_obj PRIVATE ${PROJECT_SOURCE_DIR}/include)
target_link_libraries(my_feature_obj PUBLIC dynemit_core)

add_library(dynemit_my_feature STATIC $<TARGET_OBJECTS:my_feature_obj>)
target_include_directories(dynemit_my_feature PUBLIC ${PROJECT_SOURCE_DIR}/include)
target_link_libraries(dynemit_my_feature PUBLIC dynemit_core)

install(TARGETS dynemit_my_feature ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
install(FILES ${PROJECT_SOURCE_DIR}/include/dynemit/my_feature.h 
        DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/dynemit)

Register the feature: Add it to the features[] array in src/dynemit_features.c
Update umbrella header: Add #include <dynemit/my_feature.h> in include/dynemit.h

The build system auto-discovers features/*/ subdirectories, so no changes to the root CMakeLists.txt are needed.

Contributing

Contributions are welcome! Areas for improvement:

Additional type variants for existing features
ARM NEON and RISC-V Vector Extension support
AMD-specific optimizations (FMA4, XOP)
Additional benchmarks and test cases for new features

License

See LICENSE file for details.
