Tags: ashvardanian/NumKong
Release: v7.6.0 [skip ci]
- Add: DLPack 1.3 interop bridge for numkong.Tensor (ea74fe1)
- Add: Back-port tensor API to C++20 for CUDA (ad93068)
- Improve: FP8 GEMM throughput on Skylake/Haswell + Granite Rapids E5M2 kernel (c19bec9)
- Improve: FP8 pairwise distance kernels via Giesen trick + F16 widen path (679f55f)
- Fix: Keep `*_serial` kernels scalar across LTO (455d535)
- Make: Enable symbol exports for `nk_shared` Emscripten builds (482e4fd)
- Improve: SSD trace-identity fold across all mesh backends + Genoa/NEONFHM kernels (e9d40e5)
- Make: Normalize base PowerPC & LoongArch cap for JS (ab81191)
Release: v7.5.0 [skip ci]
- Add: NEON popcount kernel for nk_reduce_moments_u1 (2181e0c)
- Add: Tensor constructors, sealed trait family, div_ceil cleanup (2792279)
- Add: Span-based matrix `_into` APIs, parallel Hammings/Jaccards, full-crate docs (99289df)
- Add: OpenMP for Python & JavaScript (499ecc9)
- Add: Granite Rapids AMX for F16 & F32 (28036ea)
- Fix: Native ISA probe on Apple Clang + compile/runtime glyph (bc13e02)
- Make: Detect illegal instructions in macOS CI (289cdaf)
- Fix: Drop `-march=` on macOS setup.py builds (28aac74)
- Fix: Exclude `std::signal` from WASM builds (14814c5)
- Improve: Drop GNU statement-expression macros in SVE reduce helpers (b8b4ca0)
- Make: Drop `+nosimd` from AArch64 baseline (23f5195)
- Make: Forbid auto-vectorization in portable baseline builds (43e8324)
- Make: Pin TU baseline to per-arch ABI floor across build systems (453ed5f)
- Fix: Mitigate GCC 13 wrong BF16 splat in Arm NEON (#346) (fc3d8ec)
- Improve: Log faulting capability detection (a401f8a)
- Improve: Log faulting kernel on fatal signals in `nk_test` (22c7c79)
- Make: Normalize Python test dependencies across CI and docs (8a0f3d4)
- Make: Baseline-only ISA for shared-library test, harden Windows CI (1907685)
- Fix: Wrong compiler probes for SMEBF16 & SMEBI32 (8b19ddb)
- Make: Log host CPU capabilities in macOS and Windows CI jobs (988eeb2)
- Fix: Pre-declare OpenMP loop counter, universal libomp for macOS (493a021)
- Fix: Use int for OpenMP loop counters, absolute libomp install name (ccc0118)
- Fix: GCC requires +sme prefix in target attribute for __arm_sc_* stubs (291dc0a)
- Fix: Signed OpenMP iterators, source-built libomp, JS KMP guard (dc1ae75)
- Fix: OpenMP wheel builds on macOS and Windows (f569121)
- Fix: Add target("sme") to __arm_sc_* stubs for GCC compatibility (ad2add0)
- Fix: Unpoison SVE scalar reductions for MemorySanitizer (#342) (b42eda7)
- Improve: Move SME runtime stubs to types.h as weak inline definitions (64ca934)
- Improve: Manual SME streaming control, single enter/exit per API call (6432837)
- Fix: Update `cdist` edge-case test for re-added `threads=` kwarg (50681af)
- Make: Allow force-enabling ISA targets via environment variables (0e58702)
- Improve: Abandon F32→F64 via Ozaki on Granite Rapids (94a5f19)
- Make: FreeBSD, PPC64le, LoongArch, RISC-V releases & compress Windows (a9a0d83)
- Make: Standardize CI compilers and add Windows test job (9a22ea4)
- Make: Shrink serial fallbacks with scoped size optimization (83154a8)
- Make: Compress Windows builds (e30ad3d)
- Fix: Streaming-compatible stubs for LLVM SME builds (0be7b2f)
Release: v7.4.4 [skip ci]
- Fix: ARMv7 Rust cross-compilation with CC for versioned GCC (a5e67e6)
- Make: `check_source_runs`-probing like `march=native` on MSVC (7a152f3)
- Fix: Drop `_MM_FROUND_NO_EXC` from `_mm256_cvtps_ph` calls (8649b0c)
- Fix: Guard against old MSVC preprocessor (25d3304)
- Make: Enforce newer preprocessor in MSVC (be966af)
- Make: Cleaner CIBW artifact names & env forwarding (a6cf642)
- Make: Forward cross-compilation flags for macOS wheels (6ed3b8c)
- Make: Split ppc64le, s390x, i686 CIBW runs (c01795c)
Release: v7.4.3 [skip ci]
- Fix: Require AArch64 for NEON kernels (2ba1b34)
- Docs: Table order & formatting (8673a56)
- Make: Avoid `--all-features` in Rust cross-compilation CI (8be8bff)
- Improve: Arm32 compatibility (6404172)
- Make: `cancel-in-progress` CI to shift compute resources (dfc8fa0)
- Improve: Harden Swift SDK for 6.1+ toolkit (965cd52)
- Make: Strip `.unsafeFlags` & list platforms for SPM consumption (b061b78)
- Make: Expose `CNumKongDispatch` target to Swift users (6aa00a8)
Release: v7.4.0 [skip ci]
- Add: WASM elementwise ops & spatial mini-float kernels (81b8c44)
- Add: WASM type-casting kernels (e09df31)
- Add: SVE+SDOT ops for 8-bit integers (913fc6b)
- Fix: Misplaced NEON loads/stores in Sierra (05e3045)
- Fix: Avoid unconditional `np` symbols (9dffb68)
- Make: Resolve probe locations for NPM consumers (c602f45)
- Docs: Refined "What's Inside" (28f35cd)
- Docs: Mini-float kernel selection strategy (04e6598)
- Improve: Accelerate PyTests, reduce `Decimal` use (2417248)
- Make: Move `.pyi` for PyLance (688ec2d)
- Fix: Inconsistent SME function qualifiers (5b4148a)
- Improve: Smaller test inputs under QEMU (ee36bf2)
- Improve: Vectorize GEMM "packers" (86127a4)
- Make: Longer timeouts for QEMU in CI (a9cc732)
- Fix: `vec_t` store helper args order (eecbcac)
- Fix: Negative stride tensor reductions (3ea81be)
- Improve: Recursive stride collapsing and axis-lane fast paths for N-D reductions (cf8eaf6)
- Improve: Faster reductions in strided tensors (61651ed)
- Improve: Wider NEON curved, mesh, & probability F16 kernels (1c17678)
- Fix: Harden mini-float type-casting (1911b89)
- Make: Ship `win32-arm64` NPM builds (578b7ad)
- Make: Auto-bump JS platform-specific versions (5617f75)
- Fix: `vcombine` instead of initializer lists for NEON arrays in MSVC (906c178)
- Fix: Avoid flaky `vld1_f16` for MSVC (7a987d2)
Release: v7.3.0 [skip ci]
- Add: NEON & SDOT fallbacks for `i4` & `e3m2` (0c6afa5)
- Docs: M5 perf stats for Wasmtime v43 (43c2881)
- Fix: Alternative MSVC-friendly cast (4744b9b)
- Make: Disable LTCG due to MSVC issues (3d37684)
- Make: Try `PREBUILDS_ONLY=0` in CI (64c5f95)
- Improve: Lower NEONHALF → NEON requirements (37f99ec)
- Fix: Wire `nk_cast_neon` benchmarks (3793af2)
- Docs: Apple M5 native stats for secondary workloads (d7c81c4)
- Improve: Faster in-vector 4-way finalizers in NEON (968dcd1)
- Improve: Drop `nk_f16x4_to_f32x4_neon` (84bb20a)
- Improve: `vcvt_high` for faster unpacking (a5f4a19)
- Docs: Refresh GEMM/SYRK measurements Apple M4 → M5 (3e010de)
- Fix: Harden strided reductions in NEON & AVX2 (61ac67b)
- Fix: Double-counted tail in Skylake `f64` RMSD, Kabsch, and Umeyama (5391344)
- Improve: Share `decimal.Context.traps` rules (3c28ae9)
- Fix: Padding partial tail 32-bit words for `BMOPA` (2598487)
- Fix: Missing scale type definitions of mini-floats (91862da)
- Fix: Scalar buffer cast internal overwrites & aliasing (7b0e129)
- Fix: Top-bottom variable names (a014134)
- Improve: Giesen's E4M3 → F16 in Streaming SVE (25322b5)
- Improve: Fewer branches in SME GEMMs (858263c)
- Fix: Up-round dimensions count in sub-byte C++ tests (87a72d0)
- Make: Focus on M4 CPUs for SME probing (5ff63eb)
- Improve: PyTesting across more shapes (4bc3e44)
- Improve: Cleaner type-casting & promotion rules (23c2474)
- Make: Hide formatting commits for v7-7.2 (f6ce2da)
- Make: Native addon resolution for Deno & Bun (0d502d5)
- Docs: Citations (6220137)
- Improve: Faster mini-float norms in Streaming SVE (088de57)
- Make: Integrate PyRight (0fe56c0)
- Fix: F16 norms in SSVE skipped odd entries (bf3bfee)
- Fix: Harden SVE MaxSim upcasting logic (803eb33)
- Fix: Disable `FPCR.AH` bit (7b2b850)
- Make: Node 24 for trusted publishing (9f1a4ef)
- Fix: `_m` to zero-out predicated SVE/SME ops (16c157b)
- Fix: `_m` to zero-out predicated SVE lanes in `spatial/` (ac27cde)
- Make: Replace stale `prebuildify` (74c5454)