Tags · ashvardanian/NumKong

v7.6.0

Release: v7.6.0 [skip ci]

- Add: DLPack 1.3 interop bridge for numkong.Tensor (ea74fe1)
- Add: Back-port tensor API to C++20 for CUDA (ad93068)

- Improve: FP8 GEMM throughput on Skylake/Haswell + Granite Rapids E5M2 kernel (c19bec9)
- Improve: FP8 pairwise distance kernels via Giesen trick + F16 widen path (679f55f)
- Fix: Keep `*_serial` kernels scalar across LTO (455d535)
- Make: Enable symbol exports for `nk_shared` Emscripten builds (482e4fd)
- Improve: SSD trace-identity fold across all mesh backends + Genoa/NEONFHM kernels (e9d40e5)
- Make: Normalize base PowerPC & LoongArch cap for JS (ab81191)
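
The FP8 entries above mention an E5M2 kernel and an F16 widening path. As a rough illustration of why widening E5M2 can be cheap, and not a description of NumKong's actual kernels: E5M2 uses the same sign and exponent layout as IEEE 754 half precision, so an E5M2 byte can be read as the top byte of an F16 value. The helper name `e5m2_to_float` is hypothetical.

```python
import struct


def e5m2_to_float(byte: int) -> float:
    """Decode one E5M2 byte by widening it to IEEE half precision.

    Hypothetical helper for illustration; not part of NumKong's API.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    # Put the FP8 byte in the high byte of a little-endian F16.
    # The low byte (the 8 extra mantissa bits of F16) stays zero.
    return struct.unpack("<e", bytes([0x00, byte]))[0]


print(e5m2_to_float(0x3C))  # 1.0
print(e5m2_to_float(0xC0))  # -2.0
```

E4M3 has a different exponent width and bias, so the same byte-shift shortcut does not apply to it directly.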

Apr 20, 2026
48cbd21

v7.5.0

Release: v7.5.0 [skip ci]

- Add: NEON popcount kernel for nk_reduce_moments_u1 (2181e0c)
- Add: Tensor constructors, sealed trait family, div_ceil cleanup (2792279)
- Add: Span-based matrix `_into` APIs, parallel Hammings/Jaccards, full-crate docs (99289df)
- Add: OpenMP for Python & JavaScript (499ecc9)
- Add: Granite Rapids AMX for F16 & F32 (28036ea)
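
For readers unfamiliar with the bit-level distances named above, here is a minimal pure-Python sketch of Hamming and Jaccard distances over packed bit vectors. The function names are hypothetical and this is not NumKong's API. Returning 0.0 for two all-zero vectors is an assumed convention.

```python
def _popcount(x: int) -> int:
    """Count set bits in a non-negative integer."""
    return bin(x).count("1")


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length packed bit vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(_popcount(x ^ y) for x, y in zip(a, b))


def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard distance: 1 - |intersection| / |union| over set bits."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    intersection = sum(_popcount(x & y) for x, y in zip(a, b))
    union = sum(_popcount(x | y) for x, y in zip(a, b))
    # Assumed convention: two empty sets are at distance 0.
    return 1.0 - intersection / union if union else 0.0


print(hamming(b"\x0f", b"\xf0"))  # 8
print(jaccard(b"\x0f", b"\x03"))  # 0.5
```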

- Fix: Native ISA probe on Apple Clang + compile/runtime glyph (bc13e02)
- Make: Detect illegal instructions in macOS CI (289cdaf)
- Fix: Drop `-march=` on macOS setup.py builds (28aac74)
- Fix: Exclude `std::signal` from WASM builds (14814c5)
- Improve: Drop GNU statement-expression macros in SVE reduce helpers (b8b4ca0)
- Make: Drop `+nosimd` from AArch64 baseline (23f5195)
- Make: Forbid auto-vectorization in portable baseline builds (43e8324)
- Make: Pin TU baseline to per-arch ABI floor across build systems (453ed5f)
- Fix: Mitigate GCC 13 wrong BF16 splat in Arm NEON (#346) (fc3d8ec)
- Improve: Log faulting capability detection (a401f8a)
- Improve: Log faulting kernel on fatal signals in `nk_test` (22c7c79)
- Make: Normalize Python test dependencies across CI and docs (8a0f3d4)
- Make: Baseline-only ISA for shared-library test, harden Windows CI (1907685)
- Fix: Wrong compiler probes for SMEBF16 & SMEBI32 (8b19ddb)
- Make: Log host CPU capabilities in macOS and Windows CI jobs (988eeb2)
- Fix: Pre-declare OpenMP loop counter, universal libomp for macOS (493a021)
- Fix: Use int for OpenMP loop counters, absolute libomp install name (ccc0118)
- Fix: GCC requires +sme prefix in target attribute for __arm_sc_* stubs (291dc0a)
- Fix: Signed OpenMP iterators, source-built libomp, JS KMP guard (dc1ae75)
- Fix: OpenMP wheel builds on macOS and Windows (f569121)
- Fix: Add target("sme") to __arm_sc_* stubs for GCC compatibility (ad2add0)
- Fix: Unpoison SVE scalar reductions for MemorySanitizer (#342) (b42eda7)
- Improve: Move SME runtime stubs to types.h as weak inline definitions (64ca934)
- Improve: Manual SME streaming control, single enter/exit per API call (6432837)
- Fix: Update `cdist` edge-case test for re-added `threads=` kwarg (50681af)
- Make: Allow force-enabling ISA targets via environment variables (0e58702)
- Improve: Abandon F32→F64 via Ozaki on Granite Rapids (94a5f19)
- Make: FreeBSD, PPC64le, LoongArch, RISC-V releases & compress Windows (a9a0d83)
- Make: Standardize CI compilers and add Windows test job (9a22ea4)
- Make: Shrink serial fallbacks with scoped size optimization (83154a8)
- Make: Compress Windows builds (e30ad3d)
- Fix: Streaming-compatible stubs for LLVM SME builds (0be7b2f)

Apr 14, 2026
14daf40

v7.4.5

Release: v7.4.5 [skip ci]

- Improve: Vectorize F32 SME MaxSim finalizer (0daacf3)
- Improve: Remove centering from RMSD kernels (1a83ab4)
- Fix: Emulated vs native test durations (4266451)

Apr 6, 2026
a750052

v7.4.4

Release: v7.4.4 [skip ci]

- Fix: ARMv7 Rust cross-compilation with CC for versioned GCC (a5e67e6)
- Make: `check_source_runs`-probing like `march=native` on MSVC (7a152f3)
- Fix: Drop `_MM_FROUND_NO_EXC` from `_mm256_cvtps_ph` calls (8649b0c)
- Fix: Guard against old MSVC preprocessor (25d3304)
- Make: Enforce newer preprocessor in MSVC (be966af)
- Make: Cleaner CIBW artifact names & env forwarding (a6cf642)
- Make: Forward cross-compilation flags for macOS wheels (6ed3b8c)
- Make: Split ppc64le, s390x, i686 CIBW runs (c01795c)

Apr 6, 2026
b4ed3ae

v7.4.3

Release: v7.4.3 [skip ci]

- Fix: Require AArch64 for NEON kernels (2ba1b34)
- Docs: Table order & formatting (8673a56)
- Make: Avoid `--all-features` in Rust cross-compilation CI (8be8bff)
- Improve: Arm32 compatibility (6404172)
- Make: `cancel-in-progress` CI to shift compute resources (dfc8fa0)
- Improve: Harden Swift SDK for 6.1+ toolkit (965cd52)
- Make: Strip `.unsafeFlags` & list platforms for SPM consumption (b061b78)
- Make: Expose `CNumKongDispatch` target to Swift users (6aa00a8)

Apr 5, 2026
55fc1d8

v7.4.2

Release: v7.4.2 [skip ci]

- Docs: Shrink tables in the main README (6d2ea34)
- Make: Inline PowerShell cross-compilation logic in CI (974c30c)
- Make: Define `_ARM64_` for Arm JS builds in MSVC (f303042)
- Make: Skip same-named artifacts on CI reruns (7c098e5)

Apr 5, 2026
0f2783c

v7.4.1

Release: v7.4.1 [skip ci]

- Make: Set `repository.url` for NPM (385480d)
- Make: Pull MSVC ARM64 Cross-Compiler (e20c93e)
- Fix: Swap `f16x8` for `u16x8` in `cast_neon` (154ec5d)

Apr 5, 2026
c360304

v7.4.0

Release: v7.4.0 [skip ci]

- Add: WASM elementwise ops & spatial mini-float kernels (81b8c44)
- Add: WASM type-casting kernels (e09df31)
- Add: SVE+SDOT ops for 8-bit integers (913fc6b)

- Fix: Misplaced NEON loads/stores in Sierra (05e3045)
- Fix: Avoid unconditional `np` symbols (9dffb68)
- Make: Resolve probe locations for NPM consumers (c602f45)
- Docs: Refined "What's Inside" (28f35cd)
- Docs: Mini-float kernel selection strategy (04e6598)
- Improve: Accelerate PyTests, reduce `Decimal` use (2417248)
- Make: Move `.pyi` for PyLance (688ec2d)
- Fix: Inconsistent SME function qualifiers (5b4148a)
- Improve: Smaller test inputs under QEMU (ee36bf2)
- Improve: Vectorize GEMM "packers" (86127a4)
- Make: Longer timeouts for QEMU in CI (a9cc732)
- Fix: `vec_t` store helper args order (eecbcac)
- Fix: Negative stride tensor reductions (3ea81be)
- Improve: Recursive stride collapsing and axis-lane fast paths for N-D reductions (cf8eaf6)
- Improve: Faster reductions in strided tensors (61651ed)
- Improve: Wider NEON curved, mesh, & probability F16 kernels (1c17678)
- Fix: Harden mini-float type-casting (1911b89)
- Make: Ship `win32-arm64` NPM builds (578b7ad)
- Make: Auto-bump JS platform-specific versions (5617f75)
- Fix: `vcombine` instead of initializer lists for NEON arrays in MSVC (906c178)
- Fix: Avoid flaky `vld1_f16` for MSVC (7a987d2)

Apr 4, 2026
ffc6e74

v7.3.0

Release: v7.3.0 [skip ci]

- Add: NEON & SDOT fallbacks for `i4` & `e3m2` (0c6afa5)

- Docs: M5 perf stats for Wasmtime v43 (43c2881)
- Fix: Alternative MSVC-friendly cast (4744b9b)
- Make: Disable LTCG due to MSVC issues (3d37684)
- Make: Try `PREBUILDS_ONLY=0` in CI (64c5f95)
- Improve: Lower NEONHALF → NEON requirements (37f99ec)
- Fix: Wire `nk_cast_neon` benchmarks (3793af2)
- Docs: Apple M5 native stats for secondary workloads (d7c81c4)
- Improve: Faster in-vector 4-way finalizers in NEON (968dcd1)
- Improve: Drop `nk_f16x4_to_f32x4_neon` (84bb20a)
- Improve: `vcvt_high` for faster unpacking (a5f4a19)
- Docs: Refresh GEMM/SYRK measurements Apple M4 → M5 (3e010de)
- Fix: Harden strided reductions in NEON & AVX2 (61ac67b)
- Fix: Double-counted tail in Skylake `f64` RMSD, Kabsch, and Umeyama (5391344)
- Improve: Share `decimal.Context.traps` rules (3c28ae9)
- Fix: Padding partial tail 32-bit words for `BMOPA` (2598487)
- Fix: Missing scale type definitions of mini-floats (91862da)
- Fix: Scalar buffer cast internal overwrites & aliasing (7b0e129)
- Fix: Top-bottom variable names (a014134)
- Improve: Giesen's E4M3 → F16 in Streaming SVE (25322b5)
- Improve: Fewer branches in SME GEMMs (858263c)
- Fix: Up-round dimensions count in sub-byte C++ tests (87a72d0)
- Make: Focus on M4 CPUs for SME probing (5ff63eb)
- Improve: PyTesting across more shapes (4bc3e44)
- Improve: Cleaner type-casting & promotion rules (23c2474)
- Make: Hide formatting commits for v7-7.2 (f6ce2da)
- Make: Native addon resolution for Deno & Bun (0d502d5)
- Docs: Citations (6220137)
- Improve: Faster mini-float norms in Streaming SVE (088de57)
- Make: Integrate PyRight (0fe56c0)
- Fix: F16 norms in SSVE skipped odd entries (bf3bfee)
- Fix: Harden SVE MaxSim upcasting logic (803eb33)
- Fix: Disable `FPCR.AH` bit (7b2b850)
- Make: Node 24 for trusted publishing (9f1a4ef)
- Fix: `_m` to zero-out predicated SVE/SME ops (16c157b)
- Fix: `_m` to zero-out predicated SVE lanes in `spatial/` (ac27cde)
- Make: Replace stale `prebuildify` (74c5454)

Apr 2, 2026
9d58663

v7.2.4

Release: v7.2.4 [skip ci]

- Make: 2h timeout budget for JS & Py builds (2e8f081)

Mar 28, 2026
facd43f
