
Automatically select the optimal implementation at program startup; thread-safe (ts) and dlopen-safe (dl) options are available.

MuriloChianfa/libdynemit

libdynemit

C23 codecov CMake GCC Clang License: Boost

libdynemit leverages the ifunc resolver (supported by both GCC and Clang on Linux) to automatically select optimal SIMD implementations at program startup, delivering portable code without sacrificing performance. Thread-safe SIMD detection and dlopen-safe resolver utilities ensure robust operation in multi-threaded applications and dynamic library loading scenarios.

Example

#include <dynemit.h>

// Automatically uses AVX-512, AVX2, AVX, SSE4.2, SSE2 or scalar,
// based on your CPU's capabilities, decided once at program startup
mul_f32(a, b, result, n);
mean_f64(data, n);
entropy_u32(data, n);

Same build, best performance

[Chart: Vector Multiply Benchmark] Vector multiplication performance across different CPU architectures using the same build binary. The library automatically detected and used each CPU's highest supported SIMD instruction set (AVX-512F, AVX2, AVX or SSE4.2) at runtime. Lower execution time indicates better performance. Each data point is the median of 10 trials, with error bars showing ±1 standard deviation.

Forced SIMD instructions without dynamic dispatch

[Chart: SIMD Feature Comparison] Performance scaling of different SIMD instruction sets on the same CPU (AMD Ryzen 9 9950X3D). This benchmark shows the progressive performance improvements from Scalar → SSE2 → SSE4.2 → AVX → AVX2 → AVX-512F. Each implementation was built and tested separately to isolate the impact of each SIMD level. The chart shows a ~1.8x speedup from AVX-512F compared to scalar code for large arrays. Lower execution time indicates better performance. Each data point is the median of 10 trials, with error bars showing ±1 standard deviation.

Installation

Option 1: Pre-built Packages

Download pre-built packages from GitHub Releases.

Debian/Ubuntu
wget https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/libdynemit_1.2.0_amd64.deb
sudo dpkg -i libdynemit_1.2.0_amd64.deb
Fedora/RHEL

Runtime package:

wget https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/libdynemit-1.2.0-1.fc40.x86_64.rpm
sudo dnf install libdynemit-1.2.0-1.fc40.x86_64.rpm

Verify GPG Signatures

All packages are cryptographically signed with GPG for authenticity verification.

Import the maintainer's public key:

gpg --keyserver keys.openpgp.org --recv-keys 3E1A1F401A1C47BC77D1705612D0D82387FC53B0
Alternative key import options

Using the shorter key ID:

gpg --keyserver keys.openpgp.org --recv-keys 12D0D82387FC53B0

Alternative keyserver (if keys.openpgp.org is unavailable):

gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 3E1A1F401A1C47BC77D1705612D0D82387FC53B0

You should see output confirming the key was imported:

gpg: key 12D0D82387FC53B0: public key "MuriloChianfa <murilo.chianfa@outlook.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Verify a package signature:

gpg --verify libdynemit_1.2.0_amd64.deb.asc libdynemit_1.2.0_amd64.deb

If the signature is valid, you should see:

gpg: Signature made [date and time]
gpg:                using EDDSA key 3E1A1F401A1C47BC77D1705612D0D82387FC53B0
gpg: Good signature from "MuriloChianfa <murilo.chianfa@outlook.com>"

If you see "BAD signature", do not use the package; it may have been tampered with or corrupted.

Verify Checksums

curl -LO https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/SHA256SUMS
curl -LO https://github.com/MuriloChianfa/libdynemit/releases/download/v1.2.0/SHA256SUMS.asc
gpg --verify SHA256SUMS.asc SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing

Option 2: Build from Source

Requirements

Ubuntu/Debian
# Update package list
sudo apt update

# Install GCC 13+ and CMake
sudo apt install -y gcc-13 cmake

# Verify installation (installing gcc-13 does not change the default gcc)
gcc-13 --version
cmake --version

Fedora/RHEL
sudo dnf install -y gcc cmake
Arch Linux
sudo pacman -S gcc cmake

Build Instructions

# Clone the libdynemit repository (HTTPS works without an SSH key)
git clone https://github.com/MuriloChianfa/libdynemit.git
cd libdynemit

# Configure a Release build with all optimizations enabled
# (if gcc-13 is not your default compiler, add -DCMAKE_C_COMPILER=gcc-13)
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release

# Compile
make

Installing from Source

After building, install the library and headers system-wide:

cd build  # skip if you are still in build/ from the previous step
sudo make install
View installed files

Shared library:

  • /usr/local/lib/libdynemit.so.1.2.0 (versioned shared library)
  • /usr/local/lib/libdynemit.so.1 (SONAME symlink)
  • /usr/local/lib/libdynemit.so (development symlink)

Static libraries:

  • /usr/local/lib/libdynemit.a (all-in-one, includes all features)
  • /usr/local/lib/libdynemit_core.a (just CPU detection)
  • /usr/local/lib/libdynemit_add.a, libdynemit_sub.a, libdynemit_mul.a (vector ops)
  • /usr/local/lib/libdynemit_sum.a, libdynemit_mean.a, libdynemit_min.a, libdynemit_max.a (basic stats)
  • /usr/local/lib/libdynemit_variance.a, libdynemit_skewness.a, libdynemit_kurtosis.a (moments)
  • /usr/local/lib/libdynemit_entropy.a, libdynemit_simpson.a, libdynemit_hhi.a, libdynemit_gini.a (diversity)
  • /usr/local/lib/libdynemit_histogram.a, libdynemit_topk.a, libdynemit_hill.a, libdynemit_concentration.a (histogram & concentration)

Headers:

  • /usr/local/include/dynemit.h (umbrella header)
  • /usr/local/include/dynemit/core.h (CPU detection, SIMD levels)
  • /usr/local/include/dynemit/compiler.h (compiler portability macros)
  • /usr/local/include/dynemit/err.h (safe IFUNC resolver utilities)
  • /usr/local/include/dynemit/add.h, sub.h, mul.h (vector ops)
  • /usr/local/include/dynemit/stats.h (convenience: includes all statistics headers below)
  • /usr/local/include/dynemit/sum.h, mean.h, min.h, max.h, variance.h, skewness.h, kurtosis.h
  • /usr/local/include/dynemit/entropy.h, simpson.h, hhi.h, gini.h
  • /usr/local/include/dynemit/histogram.h, topk.h, hill.h, concentration.h

Build system support:

  • /usr/local/lib/pkgconfig/libdynemit.pc (pkg-config file)

Features

Currently the library ships SIMD-accelerated features organized into four categories. Every function automatically dispatches to the best available instruction set at program startup.

Vector Operations

Element-wise operations on float arrays.

Function                Description
add_f32(a, b, out, n)   out[i] = a[i] + b[i]
sub_f32(a, b, out, n)   out[i] = a[i] - b[i]
mul_f32(a, b, out, n)   out[i] = a[i] * b[i]

Headers: <dynemit/add.h>, <dynemit/sub.h>, <dynemit/mul.h>

Statistical Primitives
Function                                    Description
sum_f64 / sum_u64 / sum_u32 / sum_u16       Sum of elements
mean_f64 / mean_u64 / mean_u32 / mean_u16   Arithmetic mean
min_f64 / min_u64 / min_u32 / min_u16       Minimum value
max_f64 / max_u64 / max_u32 / max_u16       Maximum value
variance_f64                                Sample variance (Bessel's correction)
skewness_f64                                Third standardized moment
kurtosis_f64                                Excess kurtosis (fourth standardized moment minus 3)

Headers: <dynemit/sum.h>, <dynemit/mean.h>, <dynemit/min.h>, <dynemit/max.h>, <dynemit/variance.h>, <dynemit/skewness.h>, <dynemit/kurtosis.h>

Convenience header <dynemit/stats.h> includes all of the above.
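As a reference for checking results, the core statistics above reduce to short scalar loops. The sketch below assumes the SIMD versions follow the textbook definitions, with sample variance dividing by n - 1 as the Bessel's-correction note says; the ref_* names and the empty-input conventions are hypothetical and not part of libdynemit.

```c
#include <stddef.h>

/* Hypothetical scalar references; names are illustrative, not library API. */
static double ref_sum_f64(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

static double ref_mean_f64(const double *x, size_t n)
{
    /* Assumed convention: 0.0 for an empty array. */
    return n ? ref_sum_f64(x, n) / (double)n : 0.0;
}

/* Sample variance with Bessel's correction: sum((x - mean)^2) / (n - 1). */
static double ref_variance_f64(const double *x, size_t n)
{
    if (n < 2)
        return 0.0;  /* assumed convention when n < 2 */
    double m = ref_mean_f64(x, n), ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - m;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

SIMD implementations typically accumulate in several lanes and combine them at the end, so their results can differ from a strictly sequential scalar sum in the last bits; compare against a reference like this with a tolerance rather than exact equality.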

Distribution & Diversity Metrics
Function                                        Description
entropy_u16 / entropy_u32 / entropy_histogram   Shannon entropy (bits)
simpson_u16 / simpson_u32 / simpson_histogram   Simpson's diversity index
hhi_u16 / hhi_u32 / hhi_histogram               Herfindahl-Hirschman Index
gini_f64 / gini_u64                             Gini coefficient (requires sorted input)

Headers: <dynemit/entropy.h>, <dynemit/simpson.h>, <dynemit/hhi.h>, <dynemit/gini.h>

Histogram & Concentration Analysis
Function                        Description
histogram_u16 / histogram_u64   Count elements into boundary-defined bins
topk_ratios_f64                 Top-K concentration ratios from sorted descending counts
hill_estimator_f64              Hill heavy-tail index estimator
concentration_f64               Composite metric combining top-K, Hill, and HHI

Headers: <dynemit/histogram.h>, <dynemit/topk.h>, <dynemit/hill.h>, <dynemit/concentration.h>

Library Usage Options

The library provides flexible usage options depending on your needs:

Option 1: All-in-One Library (Recommended for Simplicity)

Use the bundled library that includes all features:

#include <stdio.h>
#include <dynemit.h>  // Includes core + all features

int main(void) {
    const char **features = dynemit_features();
    printf("Available features:\n");
    for (int i = 0; features[i] != NULL; i++) {
        printf("  - %s\n", features[i]);
    }

    simd_level_t level = detect_simd_level();
    printf("SIMD level: %s\n", simd_level_name(level));

    // Fill the inputs so the operations work on defined values
    float a[1024], b[1024], result[1024];
    double data[1024];
    for (int i = 0; i < 1024; i++) {
        a[i] = (float)i;
        b[i] = 2.0f;
        data[i] = (double)i;
    }

    add_f32(a, b, result, 1024);
    mul_f32(a, b, result, 1024);
    sub_f32(a, b, result, 1024);

    double avg = mean_f64(data, 1024);
    double var = variance_f64(data, 1024);
    printf("mean = %f, variance = %f\n", avg, var);

    return 0;
}

Compile and link:

gcc -O3 myprogram.c -ldynemit -lm -o myprogram

Option 2: Modular Libraries (For Minimal Binary Size)

Include only the features you need:

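#include <stdio.h>
#include <dynemit/core.h>
#include <dynemit/add.h>
#include <dynemit/mul.h>
#include <dynemit/mean.h>

int main(void) {
    simd_level_t level = detect_simd_level();
    printf("SIMD level: %s\n", simd_level_name(level));

    // Fill the inputs so the operations work on defined values
    float a[1024], b[1024], result[1024];
    double data[1024];
    for (int i = 0; i < 1024; i++) {
        a[i] = (float)i;
        b[i] = 2.0f;
        data[i] = (double)i;
    }

    add_f32(a, b, result, 1024);
    mul_f32(a, b, result, 1024);

    double avg = mean_f64(data, 1024);
    printf("mean = %f\n", avg);

    return 0;
}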

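Compile and link (static archives are searched left to right, so list dynemit_core after the feature libraries that depend on it):

gcc -O3 myprogram.c -ldynemit_add -ldynemit_mul -ldynemit_mean -ldynemit_core -lm -o myprogram

Option 3: Core Only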

If you only need CPU detection:

#include <stdio.h>
#include <dynemit/core.h>

int main(void) {
    simd_level_t level = detect_simd_level();
    printf("CPU supports: %s\n", simd_level_name(level));
    return 0;
}

Compile and link:

gcc -O3 myprogram.c -ldynemit_core -lm -o myprogram

C++ Compatibility

The library is fully compatible with C++ and includes extern "C" guards in all headers. You can use it seamlessly in C++ projects:

Basic C++ Usage
#include <dynemit.h>
#include <vector>
#include <iostream>

int main() {
    simd_level_t level = detect_simd_level();
    std::cout << "SIMD Level: " << simd_level_name(level) << std::endl;
    
    std::vector<float> a(1024, 1.0f);
    std::vector<float> b(1024, 2.0f);
    std::vector<float> result(1024);
    
    mul_f32(a.data(), b.data(), result.data(), a.size());
    add_f32(a.data(), b.data(), result.data(), a.size());
    sub_f32(a.data(), b.data(), result.data(), a.size());
    
    std::vector<double> data(1024, 3.14);
    double avg = mean_f64(data.data(), data.size());
    double var = variance_f64(data.data(), data.size());
    
    return 0;
}

Compile and link with g++:

g++ -std=c++17 -O3 myprogram.cpp -ldynemit -lm -o myprogram

C++ with Custom IFUNC Resolvers

You can use the EXPLICIT_RUNTIME_RESOLVER macro in C++ to create your own IFUNC resolvers:

#include <dynemit/core.h>
#include <dynemit/err.h>

// Define implementations
static void my_func_scalar(float* out, const float* in, size_t n) { /* ... */ }
static void my_func_avx2(float* out, const float* in, size_t n) { /* ... */ }
static void my_func_avx512(float* out, const float* in, size_t n) { /* ... */ }

// Create resolver with C++ type safety
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpedantic"

EXPLICIT_RUNTIME_RESOLVER(my_func_resolver) {
    simd_level_t level = detect_simd_level_ts();
    
    switch (level) {
    case SIMD_AVX512F:
        return reinterpret_cast<void*>(my_func_avx512);
    case SIMD_AVX2:
        return reinterpret_cast<void*>(my_func_avx2);
    default:
        return reinterpret_cast<void*>(my_func_scalar);
    }
}

#pragma GCC diagnostic pop

extern "C" void my_func(float* out, const float* in, size_t n)
    __attribute__((ifunc("my_func_resolver")));

Notes:

  • C++17 or later is recommended for best compatibility
  • All headers include proper extern "C" linkage guards
  • Use reinterpret_cast<void*> for function pointers in resolvers
  • The -Wpedantic warning about function-to-void* conversions is expected and safe for IFUNC resolvers

Development

How It Works (Technical Details)

1. CPU Feature Detection

The detect_simd_level() function uses CPUID and XGETBV instructions to query:

  • Available instruction set extensions (SSE2, SSE4.2, AVX, AVX2, AVX-512F)
  • OS support for saving/restoring SIMD register state (XCR0)
simd_level_t level = detect_simd_level();
// Returns highest supported SIMD level
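The two checks listed above (CPU flags via CPUID, OS-enabled register state via XGETBV/XCR0) can be sketched with the <cpuid.h> helpers that ship with GCC and Clang. This is an x86-only illustration under those assumptions, not libdynemit's detect_simd_level(); the sketch_level enum and the function names are hypothetical.

```c
#include <cpuid.h>
#include <stdint.h>

/* Hypothetical enum mirroring the levels listed above; not the library's simd_level_t. */
typedef enum {
    LVL_SCALAR, LVL_SSE2, LVL_SSE42, LVL_AVX, LVL_AVX2, LVL_AVX512F
} sketch_level;

/* Read XCR0 via XGETBV (ECX = 0). Only legal when CPUID reports OSXSAVE. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

static sketch_level sketch_detect(void)
{
    unsigned int a, b, c, d;
    sketch_level lvl = LVL_SCALAR;

    /* Leaf 1: SSE2 in EDX; SSE4.2, OSXSAVE and AVX in ECX. */
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return lvl;
    if (d & bit_SSE2)   lvl = LVL_SSE2;
    if (c & bit_SSE4_2) lvl = LVL_SSE42;

    /* AVX needs the CPU flag AND OS support for saving YMM state (XCR0). */
    uint64_t xcr0 = (c & bit_OSXSAVE) ? read_xcr0() : 0;
    int ymm_ok = (xcr0 & 0x6) == 0x6;              /* XMM + YMM state */
    int zmm_ok = ymm_ok && (xcr0 & 0xE0) == 0xE0;  /* opmask + ZMM state */
    if (!(c & bit_AVX) || !ymm_ok)
        return lvl;
    lvl = LVL_AVX;

    /* Leaf 7, subleaf 0: AVX2 and AVX-512F flags in EBX. */
    if (!__get_cpuid_count(7, 0, &a, &b, &c, &d))
        return lvl;
    if (b & bit_AVX2)                lvl = LVL_AVX2;
    if ((b & bit_AVX512F) && zmm_ok) lvl = LVL_AVX512F;
    return lvl;
}
```

The XCR0 check matters: a CPU can advertise AVX while the OS has not enabled saving YMM state, and executing AVX instructions in that situation raises an invalid-opcode fault.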

For thread-safe contexts (multi-threaded code, IFUNC resolvers, dlopen()-loaded libraries), use the cached version:

simd_level_t level = detect_simd_level_ts();
// Thread-safe, cached SIMD detection
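One way to get thread-safe, cached detection is a lock-free atomic cache, sketched below. This is an assumption about the approach (the library may use pthread_once or another mechanism), and probe_simd_level is a hypothetical stand-in for the CPUID/XGETBV query.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the CPUID/XGETBV probe; counts its invocations. */
static atomic_int probe_calls;
static int probe_simd_level(void)
{
    atomic_fetch_add(&probe_calls, 1);
    return 4;  /* pretend the CPU reported level 4 */
}

/* -1 means "not yet detected". Statically initialized data needs no
   constructor, so the cache is usable even from an IFUNC resolver. */
static atomic_int cached_level = -1;

static int sketch_detect_ts(void)
{
    int lvl = atomic_load_explicit(&cached_level, memory_order_acquire);
    if (lvl < 0) {
        /* Threads racing here all compute the same value, so a duplicate
           store is harmless; later calls skip the probe entirely. */
        lvl = probe_simd_level();
        atomic_store_explicit(&cached_level, lvl, memory_order_release);
    }
    return lvl;
}
```

Because detection is idempotent, this design tolerates the rare duplicate probe instead of taking a lock on every call.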

2. Multiple SIMD Implementations

Each SIMD level has its own implementation compiled with appropriate target attributes:

__attribute__((target("avx2")))
static void mul_f32_avx2(const float *a, const float *b, float *out, size_t n)
{
    // AVX2 implementation using 256-bit YMM registers
}
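Filled in, an AVX2 body for the element-wise multiply might look like the sketch below. It assumes unaligned inputs and handles the leftover n % 8 elements with a scalar loop; the function name is hypothetical and the library's actual kernel may differ (for example in unrolling or alignment handling).

```c
#include <immintrin.h>
#include <stddef.h>

/* target("avx2") lets the compiler emit AVX/AVX2 code in this function only;
   it must be called only on CPUs that support AVX2. */
__attribute__((target("avx2")))
static void sketch_mul_f32_avx2(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    /* 8 floats per 256-bit YMM register; unaligned loads and stores. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_mul_ps(va, vb));
    }
    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; i++)
        out[i] = a[i] * b[i];
}
```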

3. Runtime Dispatch with ifunc

The mul_f32() function uses the ifunc attribute to resolve to the optimal implementation:

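// Function-pointer type returned by the resolver
typedef void (*mul_f32_func_t)(const float *, const float *, float *, size_t);

mul_f32_func_t mul_f32_resolver(void)
{
    simd_level_t level = detect_simd_level();
    switch (level) {
        case SIMD_AVX512F: return mul_f32_avx512f;
        case SIMD_AVX2:    return mul_f32_avx2;
        // ... other cases
        default:           return mul_f32_scalar;  // fallback (name illustrative)
    }
}

void mul_f32(const float *, const float *, float *, size_t)
    __attribute__((ifunc("mul_f32_resolver")));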

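The resolver runs once, when the dynamic linker processes relocations at load time. After that, each call goes straight to the selected implementation through the usual GOT/PLT indirection with no per-call feature check, so it costs the same as any other call into a shared library.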
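To see the mechanism without the library, here is a minimal, self-contained sketch of the same pattern. It swaps in GCC/Clang's __builtin_cpu_supports for the library's detect_simd_level(), the scale_* names are hypothetical, and it assumes an ELF platform with IFUNC support (for example, glibc on x86-64).

```c
#include <stddef.h>

typedef void (*scale_fn)(float *, size_t, float);

/* Portable fallback. */
static void scale_scalar(float *x, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= s;
}

/* Same loop; inside this function the compiler may auto-vectorize with AVX2. */
__attribute__((target("avx2")))
static void scale_avx2(float *x, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= s;
}

/* Runs once, during relocation processing. __builtin_cpu_init() must be
   called first because constructors may not have run yet. */
static scale_fn scale_resolver(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? scale_avx2 : scale_scalar;
}

/* Callers see an ordinary function; the dynamic linker binds it to the
   resolver's choice. */
void scale(float *x, size_t n, float s) __attribute__((ifunc("scale_resolver")));
```

Calling scale(x, n, 2.0f) then doubles every element using whichever variant the resolver picked.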

For more details on the internal architecture, see docs/ARCHITECTURE.md.

4. Safe IFUNC Resolvers (for dlopen)

When building libraries that may be loaded via dlopen(), use the safe resolver utilities from <dynemit/err.h>:

#include <dynemit/core.h>
#include <dynemit/err.h>

EXPLICIT_RUNTIME_RESOLVER(my_function_resolver)
{
    simd_level_t level = detect_simd_level_ts();  // Thread-safe!
    
    switch (level) {
        case SIMD_AVX2: return (void*)my_function_avx2;
        default:        return (void*)my_function_scalar;
    }
}

This ensures:

  • Thread-safe, cached SIMD detection
  • NULL-check protection (traps immediately instead of crashing later)
  • Compatibility with Python's extension-module loading, which uses dlopen()
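The NULL-check point can be illustrated with a small guard: rather than letting the dynamic linker bind the symbol to address 0 (which would only fault at the first call, far from the cause), the resolver traps during resolution. This is a sketch of the idea, not the actual expansion of EXPLICIT_RUNTIME_RESOLVER, and all names are hypothetical.

```c
#include <stddef.h>

typedef void (*negate_fn)(float *, size_t);

static void negate_scalar(float *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = -x[i];
}

/* Hypothetical selector: a real one would switch on the cached SIMD level
   and could return NULL if a case were forgotten. */
static negate_fn negate_select(void)
{
    return negate_scalar;
}

/* Guarded resolver: fail loudly at load time instead of binding to NULL. */
static negate_fn negate_resolver(void)
{
    negate_fn fn = negate_select();
    if (fn == NULL)
        __builtin_trap();
    return fn;
}

void negate(float *x, size_t n) __attribute__((ifunc("negate_resolver")));
```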

For detailed documentation, see docs/IFUNC_RESOLVERS.md.

Verifying SIMD Instructions

Use the included verification script to inspect which SIMD instructions were compiled into the binary:

./scripts/check_for_simd.sh

This will show:

  • All function variants in the symbol table
  • Actual SIMD instructions used in each implementation
  • The ifunc resolver function that performs runtime dispatch

Project Structure

libdynemit/
├── CMakeLists.txt              # Main CMake configuration
├── cmake/                      # CMake modules
├── include/
│   ├── dynemit.h               # Umbrella header (includes all features)
│   └── dynemit/
│       ├── core.h              # CPU detection API
│       ├── compiler.h          # Compiler portability macros (GCC/Clang)
│       ├── err.h               # Safe IFUNC resolver utilities
│       └── *.h                 # Feature headers
├── src/
│   ├── CMakeLists.txt          # Core library build config
│   ├── dynemit.c               # CPU feature detection implementation
│   └── dynemit_features.c      # Feature list for all-in-one library
├── features/                   # One subdirectory per feature
│   ├── add/                    # Element-wise vector addition
│   │   ├── CMakeLists.txt      # Library + test + benchmark targets
│   │   ├── add_f32.c           # SIMD implementations
│   │   ├── tests/*             # Correctness tests
│   │   └── benchmarks/*        # Benchmarks for the feature
│   └── *
├── bench/
│   ├── bench_utils.h           # Shared benchmark infrastructure (header-only)
│   └── data/                   # Benchmark results (CSV files)
├── tests/                      # Core-only tests (SIMD detection, C++ compat)
│   ├── CMakeLists.txt
│   └── test_*
├── docs/
│   ├── ADDING_FEATURES.md      # Guide for adding new features
│   ├── ARCHITECTURE.md         # Internal architecture documentation
│   ├── DEVELOPMENT.md          # Development setup guide
│   ├── BENCHMARKING.md         # Benchmarking and visualization guide
│   ├── IFUNC_RESOLVERS.md      # IFUNC resolver safety documentation
│   └── img/                    # Generated benchmark charts
├── scripts/
│   ├── check_for_simd.sh       # Verify SIMD instructions in binary
│   ├── plot_benchmark.py       # Generate benchmark visualization charts
│   └── requirements.txt        # Python dependencies for visualization
├── mull.yml                    # Mull mutation testing config
└── README.md

Build Options
# Release build (default, -O3 optimization)
cmake -B build
cmake --build build -j$(nproc)

# Debug build
cmake -B build -DCMAKE_BUILD_TYPE=Debug

# List available features at configure time
cmake -B build -DLIST_FEATURES=ON

Running Tests

All C tests use the Unity framework; C++ tests use Google Test. Both are fetched automatically via CMake FetchContent.

# Run the full test suite (core + all features)
ctest --test-dir build --output-on-failure

# Run a single feature test directly
./build/features/add/test_add
./build/features/sum/test_sum

Each feature has its own tests under features/<name>/tests/ that cover correctness across multiple input sizes and all reachable SIMD variants (via the _select() API). Core tests in tests/ cover SIMD detection, resolver macros, feature discovery, and C++ compatibility.

Code Coverage

Generate an HTML coverage report over all tests (requires GCC, lcov, and genhtml):

cmake -B build-cov -DCMAKE_BUILD_TYPE=Debug -DDYNEMIT_COVERAGE=ON
cmake --build build-cov -j$(nproc)
cmake --build build-cov --target coverage

Open build-cov/coverage_report/index.html in a browser. The coverage target zeroes counters, runs the full test suite, captures line/function/branch data, and filters to only project source files.

Mutation Testing

Mull injects mutations into compiled bitcode to verify test quality. Requires Clang and the mull package.

# Build with the Mull pass plugin
cmake -B build-mull -DCMAKE_C_COMPILER=clang -DDYNEMIT_MULL=ON
cmake --build build-mull -j$(nproc)

# Run mutation testing on individual test binaries
mull-runner-20 ./build-mull/features/add/test_add
mull-runner-20 ./build-mull/features/sum/test_sum

The mull.yml config at the project root controls which mutators are active.

Running Benchmarks

Each feature has its own benchmark under features/<name>/benchmarks/. Benchmarks measure single-core performance across multiple array sizes and all SIMD levels using the shared infrastructure in bench/bench_utils.h.

# Run all features x all SIMD levels, pinned to one core, max nice priority
sudo ./scripts/run_all_benchmarks.sh --cpu 15

# Or run a single feature benchmark directly
./build/features/add/bench_add
./build/features/add/bench_add --auto-detect
# Creates: bench/data/add_<cpu_model>_<simd_level>.csv
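The charts report the median of 10 trials. A minimal sketch of that measurement pattern is below, assuming CLOCK_MONOTONIC timing and a hypothetical element-wise multiply workload; the real harness lives in bench/bench_utils.h and may differ (for example in warm-up runs or how it reports deviation).

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in strict C modes */
#include <stdlib.h>
#include <time.h>

#define TRIALS 10

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n values (sorts the array in place). */
static double median(double *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_double);
    return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical workload: element-wise multiply, like mul_f32. */
static void workload(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* Time TRIALS runs of the workload and return the median in seconds. */
static double bench_median(const float *a, const float *b, float *out, size_t n)
{
    double t[TRIALS];
    for (int r = 0; r < TRIALS; r++) {
        double start = now_seconds();
        workload(a, b, out, n);
        t[r] = now_seconds() - start;
    }
    return median(t, TRIALS);
}
```

The median is less sensitive than the mean to occasional slow trials caused by interrupts or frequency changes, which is also why pinning to one core helps.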

Regenerate charts from existing CSV data:

bash ./scripts/run_all_benchmarks.sh --charts-only

Create a portable bundle for remote servers:

./scripts/bundle_benchmarks.sh --strip
# Produces: dynemit-bench-x86_64.tar.gz (static binaries)

For detailed benchmarking instructions, chart naming conventions, and remote server workflow, see docs/BENCHMARKING.md.


Adding New Features

For detailed instructions on how to add new SIMD-optimized features, see docs/ADDING_FEATURES.md.

Quick summary:

  1. Create feature directory: features/my_feature/
  2. Add source files: features/my_feature/my_feature_f64.c (one per type variant)
  3. Create header: include/dynemit/my_feature.h
  4. Add CMakeLists.txt following the pattern:
    add_library(my_feature_obj OBJECT my_feature_f64.c)
    target_include_directories(my_feature_obj PRIVATE ${PROJECT_SOURCE_DIR}/include)
    target_link_libraries(my_feature_obj PUBLIC dynemit_core)
    
    add_library(dynemit_my_feature STATIC $<TARGET_OBJECTS:my_feature_obj>)
    target_include_directories(dynemit_my_feature PUBLIC ${PROJECT_SOURCE_DIR}/include)
    target_link_libraries(dynemit_my_feature PUBLIC dynemit_core)
    
    install(TARGETS dynemit_my_feature ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
    install(FILES ${PROJECT_SOURCE_DIR}/include/dynemit/my_feature.h 
            DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/dynemit)
  5. Register the feature: Add it to the features[] array in src/dynemit_features.c
  6. Update umbrella header: Add #include <dynemit/my_feature.h> in include/dynemit.h

The build system auto-discovers features/*/ subdirectories, so no changes to the root CMakeLists.txt are needed.

Contributing

Contributions are welcome! Areas for improvement:

  • Additional type variants for existing features
  • ARM NEON and RISC-V Vector Extension support
  • AMD-specific optimizations (FMA4, XOP)
  • Additional benchmarks and test cases for new features

License

See LICENSE file for details.
