Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #777 +/- ##
==========================================
- Coverage 89.00% 88.98% -0.02%
==========================================
Files 428 428
Lines 78417 78563 +146
==========================================
+ Hits 69795 69913 +118
- Misses 8622 8650 +28
I can see that coverage on non-x86-64 architectures is going to be fun to deal with ...
Pull request overview
This PR adds an AArch64 Neon SIMD backend to diskann-wide and wires it into higher-level crates (diskann-vector, diskann-quantization, diskann-benchmark-simd) so Arm64 builds can use the same wide-SIMD abstractions and dispatch patterns as existing x86_64 backends.
Changes:
- Add `diskann-wide::arch::aarch64` with `Neon` architecture token, masks, load/store (incl. optimized partial loads), and Neon implementations for core SIMD register types + doubled types.
- Integrate Neon into distance dispatch/specialization (`diskann-vector`), conversions (`diskann-vector`), and quantization distance dispatch/retargeting (`diskann-quantization`).
- Register Neon kernels and improve architecture dispatch diagnostics in `diskann-benchmark-simd`, plus enable Arm64 CI and default `+neon,+dotprod` rustflags for AArch64.
Reviewed changes
Copilot reviewed 43 out of 43 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| diskann-wide/tests/dispatch.rs | Extends dispatch test coverage to AArch64 Neon and refactors inner product loop into a shared helper. |
| diskann-wide/src/test_utils/ops.rs | Removes x86-only gating so SplitJoin test helpers/macros can be reused by Neon tests. |
| diskann-wide/src/test_utils/dot_product.rs | Adds additional expected-dot implementations + expands test coverage for new dot-product combinations. |
| diskann-wide/src/lib.rs | Broadens test-arch env var support to include AArch64 and adjusts internal module gating. |
| diskann-wide/src/helpers.rs | Extends conversion macro support and tightens cfg gating for x86-only shift helpers. |
| diskann-wide/src/emulated.rs | Adds missing emulated dot-product impls and corresponding tests. |
| diskann-wide/src/doubled.rs | Adds load_simd_first/store_simd_first for doubled vectors and inlines some doubled-mask ops. |
| diskann-wide/src/arch/mod.rs | Adds AArch64 aarch64 module and dispatch plumbing; adjusts x86 module cfg gating. |
| diskann-wide/src/arch/emulated/mod.rs | Adds a Level sanity check for Scalar in tests. |
| diskann-wide/src/arch/aarch64/mod.rs | Defines Neon architecture token, dispatch helpers, Current selection, and test gating (test_neon). |
| diskann-wide/src/arch/aarch64/macros.rs | Macro infrastructure for defining Neon SIMDVector types, bitops, comparisons, splat, and split/join. |
| diskann-wide/src/arch/aarch64/masks.rs | Implements Neon mask representations + move_mask/from_mask/keep_first for multiple lane widths. |
| diskann-wide/src/arch/aarch64/algorithms/mod.rs | AArch64 algorithm module root (partial loads). |
| diskann-wide/src/arch/aarch64/algorithms/load_first.rs | Optimized Neon partial-load primitives used by load_simd_first. |
| diskann-wide/src/arch/aarch64/u8x8_.rs | Neon u8x8 implementation + tests. |
| diskann-wide/src/arch/aarch64/u8x16_.rs | Neon u8x16 implementation + tests + split/join. |
| diskann-wide/src/arch/aarch64/u16x8_.rs | Neon u16x8 implementation + tests. |
| diskann-wide/src/arch/aarch64/u32x4_.rs | Neon u32x4 implementation + udot dot-product + reductions/select + tests. |
| diskann-wide/src/arch/aarch64/u64x2_.rs | Neon u64x2 implementation with emulated ops where intrinsics are missing + tests. |
| diskann-wide/src/arch/aarch64/i8x8_.rs | Neon i8x8 implementation + tests. |
| diskann-wide/src/arch/aarch64/i8x16_.rs | Neon i8x16 implementation + tests + split/join. |
| diskann-wide/src/arch/aarch64/i16x8_.rs | Neon i16x8 implementation + widening conversions + tests. |
| diskann-wide/src/arch/aarch64/i32x4_.rs | Neon i32x4 implementation + dot products (incl. sdot) + conversions + tests. |
| diskann-wide/src/arch/aarch64/i64x2_.rs | Neon i64x2 implementation with emulated ops + tests. |
| diskann-wide/src/arch/aarch64/f32x2_.rs | Neon f32x2 implementation + tests. |
| diskann-wide/src/arch/aarch64/f32x4_.rs | Neon f32x4 implementation incl. f16<->f32 via asm + reductions/select/minmax + tests. |
| diskann-wide/src/arch/aarch64/f16x4_.rs | Neon f16x4 representation + load/store + tests. |
| diskann-wide/src/arch/aarch64/f16x8_.rs | Neon f16x8 representation + load/store + split/join + tests. |
| diskann-wide/src/arch/aarch64/double.rs | Defines doubled and double-doubled Neon vector types and conversions + tests. |
| diskann-wide/compile-aarch64-on-x86.sh | Helper script to cross-compile tests for aarch64 with required target features. |
| diskann-vector/src/distance/implementations.rs | Makes fixed-dimension specialization available beyond x86_64 (for Neon too). |
| diskann-vector/src/distance/distance_provider.rs | Adds Neon specialization lists and makes specialization machinery available on AArch64. |
| diskann-vector/src/conversion.rs | Adds Neon SIMD slice conversion paths for f16<->f32 and broadens SIMD convert helpers. |
| diskann-quantization/src/spherical/iface.rs | Adds Neon dispatch mapping for spherical quantization compute paths. |
| diskann-quantization/src/bits/distances.rs | Adds Neon retargeting + expands tests to exercise Neon paths where available. |
| diskann-quantization/src/algorithms/hadamard.rs | Adds Neon implementation that retargets to scalar, plus Neon test inclusion. |
| diskann-providers/src/model/pq/distance/dynamic.rs | Adjusts PQ distance test tolerance for floating-point association differences. |
| diskann-benchmark-simd/src/lib.rs | Registers Neon kernels, refactors dispatch rules into match_arch!, improves mismatch diagnostics/scoring. |
| diskann-benchmark-simd/src/bin.rs | Selects architecture-specific integration test input (x86_64 vs aarch64). |
| .github/workflows/ci.yml | Adds Arm64 runner (ubuntu-24.04-arm) to CI matrices. |
| .cargo/config.toml | Enables -C target-feature=+neon,+dotprod by default on AArch64 targets. |
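
The `Neon` architecture token listed for `diskann-wide/src/arch/aarch64/mod.rs` pairs naturally with the default `+neon,+dotprod` target features in the last row: once the feature is fixed at build time, availability can be decided by the compiler instead of at runtime. The following is only a minimal sketch of that idea. `NeonToken`, `sum`, and `sum_dispatch` are hypothetical names, not the types or functions in `diskann-wide`, and the real backend may be structured differently.

```rust
/// Zero-sized proof that Neon code may run. The private field prevents
/// construction outside this module. (Hypothetical; not the real `Neon` type.)
#[derive(Clone, Copy, Debug)]
pub struct NeonToken(());

impl NeonToken {
    /// Decided at compile time: `cfg!` expands to a literal `true` or `false`,
    /// so no runtime CPU feature detection takes place.
    pub const fn new() -> Option<Self> {
        if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
            Some(NeonToken(()))
        } else {
            None
        }
    }
}

/// A kernel takes the token by value, so calling it requires having obtained one.
/// The body is plain scalar code here; a real kernel would use intrinsics.
pub fn sum(_arch: NeonToken, xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Picks the token-gated kernel when the token exists, otherwise a scalar fallback.
pub fn sum_dispatch(xs: &[f32]) -> f32 {
    match NeonToken::new() {
        Some(arch) => sum(arch, xs),
        None => xs.iter().sum(),
    }
}
```

Because the token is zero-sized, passing it around costs nothing at runtime; it exists only to make "Neon is available" a fact the type system can check.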
I know you're simulating the aarch64 intrinsics for testing here but any chance we can post some basic benchmark numbers here?
Here are some benchmark numbers: The bit that annoys me the most is that we are actually slightly slower for
The perf loss for L2 was bugging me, so I added support for basically what LLVM is doing in its auto-vectorization. Here are some new numbers.
# DiskANN v0.47.0
## Summary
* This version contains a major breaking change to the search interface
of `DiskANNIndex`. Please read the upgrade instructions below.
* An AArch64 Neon backend has been added to `diskann-wide`.
* Various bug-fixes and code-quality improvements.
## Changes to Search
The search interface has been unified around a single `index.search()`
entry point using the `Search` trait.
The old per-search-type methods on `DiskANNIndex` (`search`,
`search_recorded`, `range_search`, `multihop_search`) have been removed
and replaced by typed parameter structs that carry their own search
logic.
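
The "typed parameter structs that carry their own search logic" design can be sketched as follows. This is a simplified illustration under stated assumptions, not the `diskann` API: every type here is a hypothetical stand-in, the index is a flat list of 1-D points, and the call is synchronous and omits the strategy, context, and output-buffer arguments that the real `index.search()` takes.

```rust
/// Hypothetical stand-in for an index: a flat list of 1-D points.
struct Index {
    points: Vec<f32>,
}

/// Each parameter type implements the search it describes and names its own result type.
trait Search {
    type Output;
    fn run(self, index: &Index, query: f32) -> Self::Output;
}

/// Stand-in for k-NN parameters.
struct Knn {
    k: usize,
}

/// Stand-in for range parameters.
struct Range {
    radius: f32,
}

impl Search for Knn {
    type Output = Vec<usize>;
    fn run(self, index: &Index, query: f32) -> Vec<usize> {
        // Sort all ids by distance to the query and keep the closest `k`.
        let mut ids: Vec<usize> = (0..index.points.len()).collect();
        ids.sort_by(|&a, &b| {
            let da = (index.points[a] - query).abs();
            let db = (index.points[b] - query).abs();
            da.total_cmp(&db)
        });
        ids.truncate(self.k);
        ids
    }
}

impl Search for Range {
    type Output = Vec<usize>;
    fn run(self, index: &Index, query: f32) -> Vec<usize> {
        // Keep every id whose point lies within `radius` of the query.
        index
            .points
            .iter()
            .enumerate()
            .filter(|(_, p)| (**p - query).abs() <= self.radius)
            .map(|(i, _)| i)
            .collect()
    }
}

impl Index {
    /// Single entry point: the parameter type selects the algorithm and the result type.
    fn search<S: Search>(&self, params: S, query: f32) -> S::Output {
        params.run(self, query)
    }
}
```

With this shape, adding a new search kind means adding a new parameter type and trait impl rather than another method on the index.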
### What Changed
| Removed | Replacement |
|------------------------------------------------------------|--------------------------------------------------------------|
| `SearchParams` struct | `diskann::graph::search::Knn` |
| `RangeSearchParams` struct | `diskann::graph::search::Range` |
| `SearchParamsError` | `diskann::graph::KnnSearchError` |
| `RangeSearchParamsError` | `diskann::graph::RangeSearchError` |
| `index.search(&strategy, &ctx, &query, &params, &mut out)` | `index.search(knn, &strategy, &ctx, &query, &mut out)` |
| `index.search_recorded(..., &mut recorder)` | `index.search(RecordedKnn::new(knn, &mut recorder), ...)` |
| `index.range_search(&strategy, &ctx, &query, &params)` | `index.search(range, &strategy, &ctx, &query, &mut ())` |
| `index.multihop_search(..., &label_eval)` | `index.search(MultihopSearch::new(knn, &label_eval), ...)` |
| `index.diverse_search(...)` | `index.search(Diverse::new(knn, diverse_params), ...)` |
**`flat_search`** remains an inherent method on `DiskANNIndex`.
Its `search_params` argument changed from `&SearchParams` to `&Knn`.
### Upgrade Instructions
#### 1. k-NN Search (`search`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index.search(&strategy, &ctx, &query, &params, &mut output).await?;
```
**After:**
```rust
use diskann::graph::{Search, search::Knn};
let params = Knn::new(10, 100, None)?;
// Note: params is now the FIRST argument (moved before strategy)
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```
Key differences:
- `SearchParams` -> `Knn` (import from `diskann::graph::search::Knn`)
- `SearchParamsError` -> `KnnSearchError` (import from
`diskann::graph::KnnSearchError`)
- Search params moved to the **first** argument of `index.search()`
- `k_value`, `l_value` fields are now private; use `.k_value()`,
`.l_value()` accessors (return `NonZeroUsize`)
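
Call sites that previously read `k_value` or `l_value` as plain integers need one extra step now that the accessors return `NonZeroUsize`. The sketch below uses a hypothetical stand-in, `KnnSketch`, with a two-argument constructor. It is not the real `Knn`, whose constructor takes three arguments and whose validation rules are not described here; rejecting zero is an assumption suggested only by the `NonZeroUsize` return type.

```rust
use std::num::NonZeroUsize;

/// Hypothetical stand-in for `Knn`: private fields behind accessors.
pub struct KnnSketch {
    k_value: NonZeroUsize,
    l_value: NonZeroUsize,
}

/// Hypothetical error type; the real `KnnSearchError` may look different.
#[derive(Debug, PartialEq)]
pub enum KnnSketchError {
    ZeroK,
    ZeroL,
}

impl KnnSketch {
    /// Assumed validation: both values must be non-zero.
    pub fn new(k: usize, l: usize) -> Result<Self, KnnSketchError> {
        let k_value = NonZeroUsize::new(k).ok_or(KnnSketchError::ZeroK)?;
        let l_value = NonZeroUsize::new(l).ok_or(KnnSketchError::ZeroL)?;
        Ok(Self { k_value, l_value })
    }

    pub fn k_value(&self) -> NonZeroUsize {
        self.k_value
    }

    pub fn l_value(&self) -> NonZeroUsize {
        self.l_value
    }
}

/// Code that used the field as a `usize` now calls `.get()` on the accessor result.
pub fn result_capacity(params: &KnnSketch) -> usize {
    params.k_value().get()
}
```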
#### 2. Recorded/Debug Search (`search_recorded`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index
    .search_recorded(&strategy, &ctx, &query, &params, &mut output, &mut recorder)
    .await?;
```
**After:**
```rust
use diskann::graph::{Search, search::{Knn, RecordedKnn}};
let params = Knn::new(10, 100, None)?;
let recorded = RecordedKnn::new(params, &mut recorder);
let stats = index.search(recorded, &strategy, &ctx, &query, &mut output).await?;
```
#### 3. Range Search (`range_search`)
**Before:**
```rust
use diskann::graph::RangeSearchParams;
let params = RangeSearchParams::new(None, 100, None, 0.5, None, 1.0, 1.0)?;
let (stats, ids, distances) = index
    .range_search(&strategy, &ctx, &query, &params)
    .await?;
```
**After:**
```rust
use diskann::graph::{
    Search,
    search::Range,
    RangeSearchOutput,
};
// Simple form:
let params = Range::new(100, 0.5)?;
// Or full options form:
let params = Range::with_options(None, 100, None, 0.5, None, 1.0, 1.0)?;
// Note: output buffer is `&mut ()`; results come back in the return type
let result: RangeSearchOutput<_> = index
    .search(params, &strategy, &ctx, &query, &mut ())
    .await?;
// Access results:
let stats = result.stats;
let ids = result.ids; // Vec<O>
let distances = result.distances; // Vec<f32>
```
Key differences:
- `RangeSearchParams` -> `Range` (import from
`diskann::graph::search::Range`)
- `RangeSearchParamsError` -> `RangeSearchError` (import from
`diskann::graph::RangeSearchError`)
- Return type changed from `(SearchStats, Vec<O>, Vec<f32>)` to
`RangeSearchOutput<O>` (a struct with `.stats`, `.ids`, `.distances`
fields)
- Pass `&mut ()` as the output buffer
- Field `starting_l_value` -> constructor arg `starting_l` (accessor:
`.starting_l()`)
- Field `initial_search_slack` -> constructor arg `initial_slack`
(accessor: `.initial_slack()`)
- Field `range_search_slack` -> constructor arg `range_slack` (accessor:
`.range_slack()`)
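
For call sites that still expect the old `(stats, ids, distances)` tuple, a small adapter can ease the migration. The sketch below mirrors the three field names listed above on stand-in types; `SearchStats` is reduced to an empty placeholder and `into_tuple` is a hypothetical helper, not something the crate provides.

```rust
/// Placeholder for the real statistics type, whose fields are not shown here.
#[derive(Debug, Default, PartialEq)]
pub struct SearchStats;

/// Stand-in with the same field names as the struct described above.
pub struct RangeSearchOutput<O> {
    pub stats: SearchStats,
    pub ids: Vec<O>,
    pub distances: Vec<f32>,
}

/// Hypothetical adapter: destructure the struct back into the old tuple shape.
pub fn into_tuple<O>(out: RangeSearchOutput<O>) -> (SearchStats, Vec<O>, Vec<f32>) {
    let RangeSearchOutput { stats, ids, distances } = out;
    (stats, ids, distances)
}
```

Destructuring by field name also means the compiler flags any future field rename, which positional tuple access would not.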
#### 4. Multihop / Label-Filtered Search (`multihop_search`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index
    .multihop_search(&strategy, &ctx, &query, &params, &mut output, &label_eval)
    .await?;
```
**After:**
```rust
use diskann::graph::{Search, search::{Knn, MultihopSearch}};
let knn = Knn::new(10, 100, None)?;
let params = MultihopSearch::new(knn, &label_eval);
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```
Key differences:
- `MultihopSearch` wraps a `Knn` and a label evaluator into a single params
object
- The label evaluator is part of the params, not a separate argument
#### 5. Flat Search (unchanged method, new param type)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, &params, &mut output).await?;
```
**After:**
```rust
use diskann::graph::search::Knn;
let params = Knn::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, &params, &mut output).await?;
```
Only the parameter type changed (`SearchParams` -> `Knn`).
### Import Path Changes
| Old | New |
|------------------------------------------|--------------------------------------------------------|
| `diskann::graph::SearchParams` | `diskann::graph::search::Knn` |
| `diskann::graph::RangeSearchParams` | `diskann::graph::search::Range` |
| `diskann::graph::SearchParamsError` | `diskann::graph::KnnSearchError` |
| `diskann::graph::RangeSearchParamsError` | `diskann::graph::RangeSearchError` |
| — | `diskann::graph::search::MultihopSearch` (new) |
| — | `diskann::graph::search::RecordedKnn` (new) |
| — | `diskann::graph::search::Diverse` (new, feature-gated) |
| — | `diskann::graph::Search` (trait, re-exported) |
| — | `diskann::graph::RangeSearchOutput` (re-exported) |
## Change List
* copy bftrees from the snapshot location to the save location by
@backurs in #783
* (RFC) Refactor search interface with unified SearchDispatch trait by
@narendatha in #773
* Make queue.closest_notvisited() safe and update call sites by @arrayka
in #787
* git ignore: Ignore local settings for claude code AI agent by @arrayka
in #789
* Enabling flag support in codecov by @arrayka in
#790
* Increase unit test coverage for diskann-tools crate by @Copilot in
#763
* Neon MVP by @hildebrandmw in
#777
* Adding GraphParams to be able to save graph parameters of index to
SavedParams by @backurs in #786
## New Contributors
* @narendatha made their first contribution in
#773
**Full Changelog**:
0.46.0...v0.47.0
Adds a (mostly) complete AArch64 Neon backend to `diskann-wide` and wires it through `diskann-vector`, `diskann-quantization`, and `diskann-benchmark-simd`.

This PR has existed in a largely completed state for quite a while now, but as usual the last 10% takes a considerable amount of work. So here it is.
## `diskann-wide`: Neon backend
Neon implementations for all SIMD types matching the existing x86_64 (V3/V4) backends:
- `u8x8`, `i8x8`, `f32x2`, `u8x16`, `i8x16`, `u16x8`, `i16x8`, `u32x4`, `i32x4`, `f32x4`, `u64x2`, `i64x2`, `f16x4`, `f16x8`.
- `f32x8`, `f32x16`, `u8x32`, `i8x32`, `i32x8`, etc. via the existing `Doubled` machinery.
- `move_mask`, `from_mask`, and optimized `keep_first` for all 8 mask widths.
- `Add`, `Sub`, `Mul`, FMA, `Abs`, `MinMax`.
- `SIMDPartialEq` and `SIMDPartialOrd`.
- `Not`, `And`, `Or`, `Xor`, `Shr`, `Shl` (with Miri fallbacks for variable shifts).
- `i16×i16→i32`, `u8×i8→i32`, `i8×u8→i32` using `vdotq_s32` (requires `+dotprod`).
- `sum_tree` via pairwise addition (`vpaddq`).
- `f16↔f32` (lossless and cast), `u8→i16`, `i8→i16`, `i32→f32`, split/join for all appropriate types.

Optimized `load_simd_first` (`algorithms/load_first.rs`):

Rather than falling back to scalar `Emulated` element-by-element loads, partial loads use Neon-native primitives:

- `vld1_u8` loads combined with `vqtbl1q_u8` (TBL shuffle). Includes a Miri shim since Miri does not support `vqtbl1q_u8`.
- `vld1_lane`/`vcombine`.

The `aarch64_define_loadstore!` macro accepts a `$load_first` function, and `f16x4`/`f16x8` delegate to the `u16x4`/`u16x8` primitives respectively. `Doubled` types implement `load_simd_first`/`store_simd_first` branchlessly by passing the full count to the first half and `first.saturating_sub(HALF)` to the second.

Test infrastructure:
- `test_neon()` helper with `WIDE_TEST_MIN_ARCH` env-var support, matching the x86_64 `test_arch_number()` pattern. Supports `"all"`/`"neon"` (panics if unavailable) and `"scalar"` (skips).
- `if let Some(arch) = test_neon() { ... }`: graceful skip when Neon is unavailable, hard failure when explicitly requested.

## `diskann-vector`: Neon distance kernels
14 `SIMDSchema` implementations covering:
- `f32`, `f16`, `u8`, `i8`.
- `f32` and `f16`.

## `diskann-quantization`
- `retarget()`).
- `retarget()`.

## `diskann-benchmark-simd`
- Refactored `DispatchRule` impls into a `match_arch!` macro.
- `test-aarch64.json` and architecture-aware integration test selection.

## Other changes
- `.cargo/config.toml`: Enables `+neon,+dotprod` for `aarch64` targets.
- `.github/workflows/ci.yml`: Added `aarch64-unknown-linux-gnu` to cross-compilation targets.
- `diskann-providers`: Relaxed a PQ distance test tolerance (`6e-7` → `6.3e-7`) for the different floating-point association used by the `Neon` implementations.

## Design decisions
- `Neon` backend uses a compile-time token rather than runtime feature detection. Neon is mandatory on AArch64. Runtime dispatch can be added later if needed.
- `+dotprod` required. Needed for `vdotq` in dot-product kernels. This excludes pre-2018 cores but should cover mainstream server and desktop targets (Graviton 2+, Apple M1+, Ampere Altra). ARMv8.4+ mandates it.
- `diskann-vector`. The SIMD epilogues could use `load_simd_first` for a potential win on i8/u8 cosine where the masked load cost is amortized across multiple operations, but real Arm64 benchmarking is needed first.

## Suggested reviewing order
1. `diskann-wide/src/arch/aarch64/mod.rs`: Architecture definition, `Neon` token, dispatch, `test_neon()`.
2. `diskann-wide/src/arch/aarch64/macros.rs`: The macro infrastructure that all type files build on.
3. `diskann-wide/src/arch/aarch64/masks.rs`: Mask representations and operations (`move_mask`, `from_mask`, `keep_first`).
4. `diskann-wide/src/arch/aarch64/algorithms/load_first.rs`: Optimized partial load primitives. Read bottom-up: impl functions first, then wrappers.
5. A type file (`f32x4_.rs` for 128-bit float, or `i32x4_.rs` for dot products): the rest are structurally identical.
6. `diskann-wide/src/arch/aarch64/double.rs` and `diskann-wide/src/doubled.rs`: Doubled types and branchless partial load/store.
7. `diskann-vector/src/distance/simd.rs`: Neon distance kernels.
8. `diskann-benchmark-simd/src/lib.rs`: `match_arch!` refactor and Neon registration.
9. `diskann-quantization/`: Neon test paths (mechanical).
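
For reviewers reaching the branchless partial load for `Doubled` types, the counting trick can be sketched in scalar Rust. This is an illustration under stated assumptions, not the code in `doubled.rs`: `HALF`, `load_first_half`, and `load_first_doubled` are hypothetical names, plain arrays stand in for SIMD registers, and zero-filling the unloaded lanes is an assumed convention.

```rust
/// Lanes in one half (hypothetical; stands in for a 128-bit `f32x4`).
const HALF: usize = 4;

/// Scalar stand-in for a half-width partial load: copies the first
/// `min(first, HALF)` available elements and zero-fills the remaining lanes.
fn load_first_half(src: &[f32], first: usize) -> [f32; HALF] {
    let mut out = [0.0; HALF];
    let n = first.min(HALF).min(src.len());
    out[..n].copy_from_slice(&src[..n]);
    out
}

/// Doubled partial load with no `if first > HALF` test:
/// - the low half receives the full count and clamps it to `HALF` itself;
/// - the high half receives `first.saturating_sub(HALF)`, which is 0
///   whenever the request fits entirely in the low half.
fn load_first_doubled(src: &[f32], first: usize) -> ([f32; HALF], [f32; HALF]) {
    let lo = load_first_half(src, first);
    let hi_src = src.get(HALF..).unwrap_or(&[]);
    let hi = load_first_half(hi_src, first.saturating_sub(HALF));
    (lo, hi)
}
```

For example, with eight source elements and `first = 6`, the low half loads four elements and the high half loads `6 - 4 = 2`; with `first = 3`, the high half is asked for zero elements and stays all zeros.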