Make queue.closest_notvisited() safe and update call sites #787

Merged
arrayka merged 5 commits into main from
u/arrayka/closest_notvisited_panic
Feb 20, 2026

@arrayka arrayka commented Feb 20, 2026

Why

  • Issue: closest_notvisited() assumes there is always at least one unvisited element. If that is not true, it can perform an unsafe out-of-bounds access (when cursor == size). Today, callers must explicitly check has_notvisited_node() before calling closest_notvisited() to uphold this assumption. The risk exists because assert!(index <= self.size) is not strict enough: it accepts index == size, which is one past the last valid element.
  • Fix:
    • Tighten bounds check in set_visited() to avoid out‑of‑bounds access (when cursor == size).
    • Make closest_notvisited() return Option to avoid panics when no unvisited nodes remain.

What

  • queue.rs: closest_notvisited() now returns Option, guard against empty, fix set_visited bounds, update tests.
  • index.rs: Replace has_notvisited_node() loops with Option-based closest_notvisited() handling in search loops.
  • ground_truth.rs: Iterate using while let Some(...) for closest_notvisited().
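
A minimal sketch of the behavior described above, not the actual `queue.rs` code. `SketchQueue`, `Neighbor`, the `(neighbor, visited)` layout, and `drain_ids` are hypothetical names chosen for illustration, and it is an assumption that `closest_notvisited()` marks the returned entry as visited through `set_visited()`. It shows the strict `index < size` bound and the `Option` return when `cursor == size`:

```rust
/// Hypothetical stand-in for the real neighbor type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Neighbor {
    id: u32,
    distance: f32,
}

/// Simplified queue: `items[..size]` is sorted by distance and `cursor`
/// is the index of the closest entry that has not been visited yet.
struct SketchQueue {
    items: Vec<(Neighbor, bool)>, // (neighbor, visited flag)
    size: usize,
    cursor: usize,
}

impl SketchQueue {
    fn new(mut neighbors: Vec<Neighbor>) -> Self {
        neighbors.sort_by(|a, b| a.distance.total_cmp(&b.distance));
        let size = neighbors.len();
        let items = neighbors.into_iter().map(|n| (n, false)).collect();
        SketchQueue { items, size, cursor: 0 }
    }

    fn set_visited(&mut self, index: usize) {
        // Strict bound: `index == size` is one past the last valid slot.
        assert!(
            index < self.size,
            "index {} out of bounds for size {}",
            index,
            self.size
        );
        self.items[index].1 = true;
    }

    fn closest_notvisited(&mut self) -> Option<Neighbor> {
        // `cursor == size` means every entry has been visited.
        if self.cursor >= self.size {
            return None;
        }
        let current = self.cursor;
        self.set_visited(current);
        // Advance the cursor past entries that are already visited.
        while self.cursor < self.size && self.items[self.cursor].1 {
            self.cursor += 1;
        }
        Some(self.items[current].0)
    }
}

/// Drains the queue in distance order, the way a call site can after the change.
fn drain_ids(queue: &mut SketchQueue) -> Vec<u32> {
    let mut ids = Vec::new();
    while let Some(neighbor) = queue.closest_notvisited() {
        ids.push(neighbor.id);
    }
    ids
}
```

With this shape a caller no longer needs a separate has_notvisited_node() check: the loop ends on `None`, and repeated calls on an exhausted queue keep returning `None`.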

Copilot AI left a comment

Pull request overview

This PR makes neighbor-queue traversal safer by changing closest_notvisited() to return Option instead of panicking when no unvisited nodes remain, and updates search/ground-truth loops to consume that Option-based API.

Changes:

  • Change NeighborQueue::closest_notvisited() / NeighborPriorityQueue::closest_notvisited() to return Option<Neighbor<_>> and tighten set_visited() bounds.
  • Update multihop and main index search loops to use while ... && let Some(...) = ... instead of relying on has_notvisited_node() + non-optional returns.
  • Update ground-truth generation to iterate with while let Some(...).
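
As a rough illustration of the loop shape described in the second bullet, the sketch below fills a search beam until either a width limit is reached or the queue reports `None`. `Frontier`, `fill_beam`, and `beam_width` are hypothetical names, and the loop is written with `match` rather than the `while ... && let Some(...) = ...` form so that it does not depend on let-chain support in the toolchain:

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the neighbor queue: candidates are already
/// sorted by distance, and popping one counts as visiting it.
struct Frontier {
    unvisited: VecDeque<u32>,
}

impl Frontier {
    fn closest_notvisited(&mut self) -> Option<u32> {
        self.unvisited.pop_front()
    }
}

/// Collects up to `beam_width` unvisited candidates for the next expansion step.
fn fill_beam(frontier: &mut Frontier, beam_width: usize) -> Vec<u32> {
    let mut beam = Vec::with_capacity(beam_width);
    while beam.len() < beam_width {
        match frontier.closest_notvisited() {
            Some(id) => beam.push(id),
            // Queue exhausted: stop cleanly instead of relying on a prior
            // has_notvisited_node() check.
            None => break,
        }
    }
    beam
}
```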

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| diskann/src/neighbor/queue.rs | Makes closest_notvisited() optional and adjusts visited bounds + some tests. |
| diskann/src/graph/search/multihop_search.rs | Updates beam-frontier selection loop to stop on None. |
| diskann/src/graph/index.rs | Updates core search beam-frontier selection loop to stop on None. |
| diskann-tools/src/utils/ground_truth.rs | Updates ground-truth queue draining loop to stop on None. |

@hildebrandmw hildebrandmw left a comment

I love everything about this. Thanks @arrayka! This is also a good check to make sure our CI checks that the experimental_diverse_search code does not break 😄

codecov-commenter commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.00%. Comparing base (999d895) to head (2d08dbd).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #787      +/-   ##
==========================================
- Coverage   89.00%   89.00%   -0.01%     
==========================================
  Files         431      431              
  Lines       78445    78455      +10     
==========================================
+ Hits        69821    69828       +7     
- Misses       8624     8627       +3     
| Flag | Coverage Δ |
|------|------------|
| miri | 89.00% <100.00%> (-0.01%) ⬇️ |
| unittests | 89.00% <100.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| diskann-disk/src/search/provider/disk_provider.rs | 91.01% <ø> (ø) |
| diskann-tools/src/utils/ground_truth.rs | 44.61% <100.00%> (-0.22%) ⬇️ |
| diskann/src/graph/index.rs | 95.91% <100.00%> (+<0.01%) ⬆️ |
| diskann/src/graph/search/multihop_search.rs | 98.05% <100.00%> (+0.01%) ⬆️ |
| diskann/src/neighbor/queue.rs | 98.28% <100.00%> (+0.02%) ⬆️ |

... and 1 file with indirect coverage changes

Alex Razumov (from Dev Box) added 2 commits February 20, 2026 10:24

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


@arrayka arrayka enabled auto-merge (squash) February 20, 2026 18:51
@arrayka arrayka merged commit 5c80ef0 into main Feb 20, 2026
26 checks passed
@arrayka arrayka deleted the u/arrayka/closest_notvisited_panic branch February 20, 2026 20:09
hildebrandmw added a commit that referenced this pull request Feb 23, 2026
# DiskANN v0.47.0

## Summary
* This version contains a major breaking change to the search interface
of `DiskANNIndex`. Please read the upgrade instructions below.
* Aarch64 Neon support has been added to `diskann-wide`.
* Various bug-fixes and code-quality improvements.

## Changes to Search
The search interface has been unified around a single `index.search()`
entry point using the `Search` trait.
The old per-search-type methods on `DiskANNIndex` (`search`,
`search_recorded`, `range_search`, `multihop_search`) have been removed
and replaced by typed parameter structs that carry their own search
logic.
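
The pattern of "parameter structs that carry their own search logic" can be sketched roughly as follows. This is a simplified, synchronous illustration and not the real `Search` trait: `SearchSketch`, `KnnSketch`, `RangeSketch`, and `IndexSketch` are hypothetical names, the data is one-dimensional, and the strategy and context arguments are omitted.

```rust
/// Hypothetical, simplified stand-in for a `Search`-style trait: each
/// parameter type decides how the search runs and what it returns.
trait SearchSketch {
    type Buffer;
    type Output;
    fn run(self, data: &[f32], query: f32, out: &mut Self::Buffer) -> Self::Output;
}

/// k-NN parameters: results are written into a caller-provided buffer.
struct KnnSketch {
    k: usize,
}

impl SearchSketch for KnnSketch {
    type Buffer = Vec<usize>;
    type Output = usize; // number of results written

    fn run(self, data: &[f32], query: f32, out: &mut Vec<usize>) -> usize {
        // Brute-force ordering of indices by distance to the query.
        let mut order: Vec<usize> = (0..data.len()).collect();
        order.sort_by(|&a, &b| {
            (data[a] - query).abs().total_cmp(&(data[b] - query).abs())
        });
        order.truncate(self.k);
        out.clear();
        out.extend(order);
        out.len()
    }
}

/// Range parameters: the buffer is `()` and results come back in the return value.
struct RangeSketch {
    radius: f32,
}

impl SearchSketch for RangeSketch {
    type Buffer = ();
    type Output = Vec<usize>;

    fn run(self, data: &[f32], query: f32, _out: &mut ()) -> Vec<usize> {
        (0..data.len())
            .filter(|&i| (data[i] - query).abs() <= self.radius)
            .collect()
    }
}

struct IndexSketch {
    data: Vec<f32>,
}

impl IndexSketch {
    /// Single entry point: the parameter type selects the search behavior.
    fn search<P: SearchSketch>(&self, params: P, query: f32, out: &mut P::Buffer) -> P::Output {
        params.run(&self.data, query, out)
    }
}
```

In this sketch, `index.search(KnnSketch { k: 2 }, q, &mut buf)` and `index.search(RangeSketch { radius: 0.5 }, q, &mut ())` go through the same method, which mirrors why the range variant below takes `&mut ()` as its output buffer.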

### What Changed

| Removed | Replacement |
|---------|-------------|
| `SearchParams` struct | `diskann::graph::search::Knn` |
| `RangeSearchParams` struct | `diskann::graph::search::Range` |
| `SearchParamsError` | `diskann::graph::KnnSearchError` |
| `RangeSearchParamsError` | `diskann::graph::RangeSearchError` |
| `index.search(&strategy, &ctx, &query, &params, &mut out)` | `index.search(knn, &strategy, &ctx, &query, &mut out)` |
| `index.search_recorded(..., &mut recorder)` | `index.search(RecordedKnn::new(knn, &mut recorder), ...)` |
| `index.range_search(&strategy, &ctx, &query, &params)` | `index.search(range, &strategy, &ctx, &query, &mut ())` |
| `index.multihop_search(..., &label_eval)` | `index.search(MultihopSearch::new(knn, &label_eval), ...)` |
| `index.diverse_search(...)` | `index.search(Diverse::new(knn, diverse_params), ...)` |

**`flat_search`** remains an inherent method on `DiskANNIndex`.
Its `search_params` argument changed from `&SearchParams` to `&Knn`.

### Upgrade Instructions

#### 1. k-NN Search (`search`)

**Before:**

```rust
use diskann::graph::SearchParams;

let params = SearchParams::new(10, 100, None)?;
let stats = index.search(&strategy, &ctx, &query, &params, &mut output).await?;
```

**After:**

```rust
use diskann::graph::{Search, search::Knn};

let params = Knn::new(10, 100, None)?;
// Note: params is now the FIRST argument (moved before strategy)
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```

Key differences:

- `SearchParams` -> `Knn` (import from `diskann::graph::search::Knn`)
- `SearchParamsError` -> `KnnSearchError` (import from
`diskann::graph::KnnSearchError`)
- Search params moved to the **first** argument of `index.search()`
- `k_value`, `l_value` fields are now private; use `.k_value()`,
`.l_value()` accessors (return `NonZeroUsize`)
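
For callers that previously read `k_value` or `l_value` as plain integers, the accessor change means an extra `.get()` on the returned `NonZeroUsize`. The sketch below uses a hypothetical stand-in type (`params::KnnSketch`, with a two-argument constructor that is not the real `Knn::new` signature) only to illustrate that conversion:

```rust
mod params {
    use std::num::NonZeroUsize;

    /// Hypothetical stand-in for a parameter struct whose fields are private.
    pub struct KnnSketch {
        k_value: NonZeroUsize,
        l_value: NonZeroUsize,
    }

    impl KnnSketch {
        /// Returns `None` if either value is zero (simplified validation).
        pub fn new(k: usize, l: usize) -> Option<Self> {
            Some(KnnSketch {
                k_value: NonZeroUsize::new(k)?,
                l_value: NonZeroUsize::new(l)?,
            })
        }

        pub fn k_value(&self) -> NonZeroUsize {
            self.k_value
        }

        pub fn l_value(&self) -> NonZeroUsize {
            self.l_value
        }
    }
}

/// Caller code that needs a plain `usize`, for example to size a result buffer.
fn result_capacity(p: &params::KnnSketch) -> usize {
    // `NonZeroUsize::get` converts back to `usize`.
    p.k_value().get()
}
```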

#### 2. Recorded/Debug Search (`search_recorded`)

**Before:**

```rust
use diskann::graph::SearchParams;

let params = SearchParams::new(10, 100, None)?;
let stats = index
   .search_recorded(&strategy, &ctx, &query, &params, &mut output, &mut recorder)
   .await?;
```

**After:**

```rust
use diskann::graph::{Search, search::{Knn, RecordedKnn}};

let params = Knn::new(10, 100, None)?;
let recorded = RecordedKnn::new(params, &mut recorder);
let stats = index.search(recorded, &strategy, &ctx, &query, &mut output).await?;
```

#### 3. Range Search (`range_search`)

**Before:**

```rust
use diskann::graph::RangeSearchParams;

let params = RangeSearchParams::new(None, 100, None, 0.5, None, 1.0, 1.0)?;
let (stats, ids, distances) = index
   .range_search(&strategy, &ctx, &query, &params)
   .await?;
```

**After:**

```rust
use diskann::graph::{
    Search,
    search::Range,
    RangeSearchOutput,
};

// Simple form:
let params = Range::new(100, 0.5)?;
// Or full options form:
let params = Range::with_options(None, 100, None, 0.5, None, 1.0, 1.0)?;

// Note: output buffer is `&mut ()` — results come back in the return type
let result: RangeSearchOutput<_> = index
   .search(params, &strategy, &ctx, &query, &mut ())
   .await?;

// Access results:
let stats = result.stats;
let ids = result.ids;           // Vec<O>
let distances = result.distances; // Vec<f32>
```

Key differences:

- `RangeSearchParams` -> `Range` (import from
`diskann::graph::search::Range`)
- `RangeSearchParamsError` -> `RangeSearchError` (import from
`diskann::graph::RangeSearchError`)
- Return type changed from `(SearchStats, Vec<O>, Vec<f32>)` to
`RangeSearchOutput<O>` (a struct with `.stats`, `.ids`, `.distances`
fields)
- Pass `&mut ()` as the output buffer
- Field `starting_l_value` -> constructor arg `starting_l` (accessor:
`.starting_l()`)
- Field `initial_search_slack` -> constructor arg `initial_slack`
(accessor: `.initial_slack()`)
- Field `range_search_slack` -> constructor arg `range_slack` (accessor:
`.range_slack()`)

#### 4. Multihop / Label-Filtered Search (`multihop_search`)

**Before:**

```rust
use diskann::graph::SearchParams;

let params = SearchParams::new(10, 100, None)?;
let stats = index
   .multihop_search(&strategy, &ctx, &query, &params, &mut output, &label_eval)
   .await?;
```

**After:**

```rust
use diskann::graph::{Search, search::{Knn, MultihopSearch}};

let knn = Knn::new(10, 100, None)?;
let params = MultihopSearch::new(knn, &label_eval);
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```

Key differences:

- `MultihopSearch` wraps a `Knn` and a label evaluator into a single params
object
- The label evaluator is part of the params, not a separate argument

#### 5. Flat Search (unchanged method, new param type)

**Before:**

```rust
use diskann::graph::SearchParams;

let params = SearchParams::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, &params, &mut output).await?;
```

**After:**

```rust
use diskann::graph::search::Knn;

let params = Knn::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, &params, &mut output).await?;
```

Only the parameter type changed (`SearchParams` -> `Knn`).

### Import Path Changes

| Old | New |
|-----|-----|
| `diskann::graph::SearchParams` | `diskann::graph::search::Knn` |
| `diskann::graph::RangeSearchParams` | `diskann::graph::search::Range` |
| `diskann::graph::SearchParamsError` | `diskann::graph::KnnSearchError` |
| `diskann::graph::RangeSearchParamsError` | `diskann::graph::RangeSearchError` |
| - | `diskann::graph::search::MultihopSearch` (new) |
| - | `diskann::graph::search::RecordedKnn` (new) |
| - | `diskann::graph::search::Diverse` (new, feature-gated) |
| - | `diskann::graph::Search` (trait, re-exported) |
| - | `diskann::graph::RangeSearchOutput` (re-exported) |

## Change List
* copy bftrees from the snapshot location to the save location by @backurs in #783
* (RFC) Refactor search interface with unified SearchDispatch trait by @narendatha in #773
* Make queue.closest_notvisited() safe and update call sites by @arrayka in #787
* git ignore: Ignore local settings for claude code AI agent by @arrayka in #789
* Enabling flag support in codecov by @arrayka in #790
* Increase unit test coverage for diskann-tools crate by @Copilot in #763
* Neon MVP by @hildebrandmw in #777
* Adding GraphParams to be able to save graph parameters of index to SavedParams by @backurs in #786

## New Contributors
* @narendatha made their first contribution in #773

**Full Changelog**: 0.46.0...v0.47.0