Make queue.closest_notvisited() safe and update call sites#787
Conversation
There was a problem hiding this comment.
Pull request overview
This PR makes neighbor-queue traversal safer by changing closest_notvisited() to return Option instead of panicking when no unvisited nodes remain, and updates search/ground-truth loops to consume that Option-based API.
Changes:
- Change
NeighborQueue::closest_notvisited()/NeighborPriorityQueue::closest_notvisited()to returnOption<Neighbor<_>>and tightenset_visited()bounds. - Update multihop and main index search loops to use
while ... && let Some(...) = ...instead of relying onhas_notvisited_node()+ non-optional returns. - Update ground-truth generation to iterate with
while let Some(...).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| diskann/src/neighbor/queue.rs | Makes closest_notvisited() optional and adjusts visited bounds + some tests. |
| diskann/src/graph/search/multihop_search.rs | Updates beam-frontier selection loop to stop on None. |
| diskann/src/graph/index.rs | Updates core search beam-frontier selection loop to stop on None. |
| diskann-tools/src/utils/ground_truth.rs | Updates ground-truth queue draining loop to stop on None. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hildebrandmw
left a comment
There was a problem hiding this comment.
I love everything about this. Thanks @arrayka! This is also a good check to make sure our CI checks that the experimental_diverse_search code does not break 😄
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #787 +/- ##
==========================================
- Coverage 89.00% 89.00% -0.01%
==========================================
Files 431 431
Lines 78445 78455 +10
==========================================
+ Hits 69821 69828 +7
- Misses 8624 8627 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
# DiskANN v0.47.0
## Summary
* This version contains a major breaking change to the search interface
of `DiskANNIndex`. Please read the upgrade instructions below.
* An Aarch64 Neon has been added to `diskann-wide`.
* Various bug-fixes and code-quality improvements.
## Changes to Search
The search interface has been unified around a single `index.search()`
entry point using the `Search` trait.
The old per-search-type methods on `DiskANNIndex` (`search`,
`search_recorded`, `range_search`, `multihop_search`) have been removed
and replaced by typed parameter structs that carry their own search
logic.
### What Changed
| Removed | Replacement |
|------------------------------------------------------------|--------------------------------------------------------------|
| `SearchParams` struct | `diskann::graph::search::Knn` |
| `RangeSearchParams` struct | `diskann::graph::search::Range` |
| `SearchParamsError` | `diskann::graph::KnnSearchError` |
| `RangeSearchParamsError` | `diskann::graph::RangeSearchError` |
| `index.search(&strategy, &ctx, &query, ¶ms, &mut out)` |
`index.search(knn, &strategy, &ctx, &query, &mut out)` |
| `index.search_recorded(..., &mut recorder)` |
`index.search(RecordedKnn::new(knn, &mut recorder), ...)` |
| `index.range_search(&strategy, &ctx, &query, ¶ms)` |
`index.search(range, &strategy, &ctx, &query, &mut ())` |
| `index.multihop_search(..., &label_eval)` |
`index.search(MultihopSearch::new(knn, &label_eval), ...)` |
| `index.diverse_search(...)` | `index.search(Diverse::new(knn,
diverse_params), ...)` |
**`flat_search`** remains an inherent method on `DiskANNIndex`
Its `search_params` argument changed from `&SearchParams` to `&Knn`.
### Upgrade Instructions
#### 1. k-NN Search (`search`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index.search(&strategy, &ctx, &query, ¶ms, &mut output).await?;
```
**After:**
```rust
use diskann::graph::{Search, search::Knn};
let params = Knn::new(10, 100, None)?;
// Note: params is now the FIRST argument (moved before strategy)
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```
Key differences:
- `SearchParams` -> `Knn` (import from `diskann::graph::search::Knn`)
- `SearchParamsError` -> `KnnSearchError` (import from
`diskann::graph::KnnSearchError`)
- Search params moved to the **first** argument of `index.search()`
- `k_value`, `l_value` fields are now private; use `.k_value()`,
`.l_value()` accessors (return `NonZeroUsize`)
#### 2. Recorded/Debug Search (`search_recorded`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index
.search_recorded(&strategy, &ctx, &query, ¶ms, &mut output, &mut recorder)
.await?;
```
**After:**
```rust
use diskann::graph::{Search, search::{Knn, RecordedKnn}};
let params = Knn::new(10, 100, None)?;
let recorded = RecordedKnn::new(params, &mut recorder);
let stats = index.search(recorded, &strategy, &ctx, &query, &mut output).await?;
```
#### 3. Range Search (`range_search`)
**Before:**
```rust
use diskann::graph::RangeSearchParams;
let params = RangeSearchParams::new(None, 100, None, 0.5, None, 1.0, 1.0)?;
let (stats, ids, distances) = index
.range_search(&strategy, &ctx, &query, ¶ms)
.await?;
```
**After:**
```rust
use diskann::graph::{
Search,
search::Range,
RangeSearchOutput,
};
// Simple form:
let params = Range::new(100, 0.5)?;
// Or full options form:
let params = Range::with_options(None, 100, None, 0.5, None, 1.0, 1.0)?;
// Note: output buffer is `&mut ()` — results come back in the return type
let result: RangeSearchOutput<_> = index
.search(params, &strategy, &ctx, &query, &mut ())
.await?;
// Access results:
let stats = result.stats;
let ids = result.ids; // Vec<O>
let distances = result.distances; // Vec<f32>
```
Key differences:
- `RangeSearchParams` -> `Range` (import from
`diskann::graph::search::Range`)
- `RangeSearchParamsError` -> `RangeSearchError` (import from
`diskann::graph::RangeSearchError`)
- Return type changed from `(SearchStats, Vec<O>, Vec<f32>)` to
`RangeSearchOutput<O>` (a struct with `.stats`, `.ids`, `.distances`
fields)
- Pass `&mut ()` as the output buffer
- Field `starting_l_value` -> constructor arg `starting_l` (accessor:
`.starting_l()`)
- Field `initial_search_slack` -> constructor arg `initial_slack`
(accessor: `.initial_slack()`)
- Field `range_search_slack` -> constructor arg `range_slack` (accessor:
`.range_slack()`)
#### 4. Multihop / Label-Filtered Search (`multihop_search`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index
.multihop_search(&strategy, &ctx, &query, ¶ms, &mut output, &label_eval)
.await?;
```
**After:**
```rust
use diskann::graph::{Search, search::{Knn, MultihopSearch}};
let knn = Knn::new(10, 100, None)?;
let params = MultihopSearch::new(knn, &label_eval);
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```
Key differences:
- `MultihopSearch` wraps a `Knn` -> label evaluator into a single params
object
- The label evaluator is part of the params, not a separate argument
#### 5. Flat Search (unchanged method, new param type)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, ¶ms, &mut output).await?;
```
**After:**
```rust
use diskann::graph::search::Knn;
let params = Knn::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, ¶ms, &mut output).await?;
```
Only the parameter type changed (`SearchParams` -> `Knn`).
### Import Path Changes
| Old | New |
|------------------------------------------|--------------------------------------------------------|
| `diskann::graph::SearchParams` | `diskann::graph::search::Knn` |
| `diskann::graph::RangeSearchParams` | `diskann::graph::search::Range`
|
| `diskann::graph::SearchParamsError` | `diskann::graph::KnnSearchError`
|
| `diskann::graph::RangeSearchParamsError` |
`diskann::graph::RangeSearchError` |
| — | `diskann::graph::search::MultihopSearch` (new) |
| — | `diskann::graph::search::RecordedKnn` (new) |
| — | `diskann::graph::search::Diverse` (new, feature-gated) |
| — | `diskann::graph::Search` (trait, re-exported) |
| — | `diskann::graph::RangeSearchOutput` (re-exported) |
## Change List
* copy bftrees from the snapshot location to the save location by
@backurs in #783
* (RFC) Refactor search interface with unified SearchDispatch trait by
@narendatha in #773
* Make queue.closest_notvisited() safe and update call sites by @arrayka
in #787
* git ignore: Ignore local settings for claude code AI agent by @arrayka
in #789
* Enabling flag support in codecov by @arrayka in
#790
* Increase unit test coverage for diskann-tools crate by @Copilot in
#763
* Neon MVP by @hildebrandmw in
#777
* Adding GraphParams to be able to save graph parameters of index to
SavedParams by @backurs in #786
## New Contributors
* @narendatha made their first contribution in
#773
**Full Changelog**:
0.46.0...v0.47.0
Why
closest_notvisited()assumes there is always at least one unvisited element. If that’s not true, it can perform an unsafe out‑of‑bounds access (whencursor == size). Today, callers must explicitly checkhas_notvisited_node()before callingclosest_notvisited()to uphold this assumption. The risk exists becauseassert!(index <= self.size)is not strict enough.set_visited()to avoid out‑of‑bounds access (whencursor == size).closest_notvisited()returnOptionto avoid panics when no unvisited nodes remain.What
closest_notvisited()now returnsOption, guard against empty, fixset_visitedbounds, update tests.has_notvisited_node()loops withOption-basedclosest_notvisited()handling in search loops.while let Some(...)forclosest_notvisited().