Increase unit test coverage for diskann-tools crate#763
Conversation
…helper, tracing, gen_associated_data_from_range, data_type, and graph_data_types Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #763 +/- ##
==========================================
+ Coverage 89.00% 89.16% +0.15%
==========================================
Files 428 431 +3
Lines 78417 78739 +322
==========================================
+ Hits 69795 70207 +412
+ Misses 8622 8532 -90
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@arrayka Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub. You can ask me to try again later by mentioning me in a new comment. If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: Sorry for the inconvenience! |
…atibility Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
…unique file paths Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR increases unit test coverage across the diskann-tools crate by adding targeted tests for utility modules, while also making a couple of small implementation adjustments to improve testability and input validation behavior.
Changes:
- Added new unit tests for multiple
diskann-tools/src/utils/*modules (errors, tracing, params, bitmap utilities, recall helpers, data generation). - Made
gen_associated_data_from_rangegeneric overStorageWriteProviderto enable in-memory provider usage in tests. - Fixed
int8/uint8radius validation logic inrandom_data_generator(previous condition was unreachable).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-tools/src/utils/tracing.rs | Adds unit tests for init_test_subscriber and guard scoping. |
| diskann-tools/src/utils/search_index_utils.rs | Adds tests for RecallBoundsError display/conversion and KRecallAtN getters. |
| diskann-tools/src/utils/random_data_generator.rs | Fixes radius validation logic and adds broader unit tests (including Fp16). |
| diskann-tools/src/utils/parameter_helper.rs | Adds tests for get_num_threads(Some/None). |
| diskann-tools/src/utils/generate_synthetic_labels_utils.rs | Switches label output to use a storage provider and adds tests using VirtualStorageProvider::new_memory(). |
| diskann-tools/src/utils/gen_associated_data_from_range.rs | Generalizes storage provider type and adds range generation tests. |
| diskann-tools/src/utils/filter_search_utils.rs | Adds tests for SerializableBitSet conversion and bitmap matching edge cases. |
| diskann-tools/src/utils/cmd_tool_error.rs | Adds tests for CMDToolError formatting and From conversions. |
Comments suppressed due to low confidence (2)
diskann-tools/src/utils/random_data_generator.rs:40
- The radius validation error message says the radius cannot be "greater than 127 and less than or equal to 0", but the condition is now
(radius > 127.0 || radius <= 0.0). This message is logically incorrect/misleading for users (e.g., radius=128 triggers it). Consider updating the message (or splitting into two messages) to match the actual validation.
if (data_type == DataType::Int8 || data_type == DataType::Uint8)
&& (radius > 127.0 || radius <= 0.0)
{
return Err(CMDToolError {
details:
"Error: for int8/uint8 datatypes, radius (L2 norm) cannot be greater than 127 and less than or equal to 0"
.to_string(),
});
diskann-tools/src/utils/random_data_generator.rs:75
- Same issue as above in
write_random_data_writer: the validation condition is(radius > 127.0 || radius <= 0.0)but the error text uses "greater than 127 and less than or equal to 0". Updating the message will also require adjusting the unit tests that assert on this string.
if (data_type == DataType::Int8 || data_type == DataType::Uint8)
&& (radius > 127.0 || radius <= 0.0)
{
return Err(CMDToolError {
details:
"Error: for int8/uint8 datatypes, radius (L2 norm) cannot be greater than 127 and less than or equal to 0"
.to_string(),
});
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
# DiskANN v0.47.0
## Summary
* This version contains a major breaking change to the search interface
of `DiskANNIndex`. Please read the upgrade instructions below.
* An Aarch64 Neon has been added to `diskann-wide`.
* Various bug-fixes and code-quality improvements.
## Changes to Search
The search interface has been unified around a single `index.search()`
entry point using the `Search` trait.
The old per-search-type methods on `DiskANNIndex` (`search`,
`search_recorded`, `range_search`, `multihop_search`) have been removed
and replaced by typed parameter structs that carry their own search
logic.
### What Changed
| Removed | Replacement |
|------------------------------------------------------------|--------------------------------------------------------------|
| `SearchParams` struct | `diskann::graph::search::Knn` |
| `RangeSearchParams` struct | `diskann::graph::search::Range` |
| `SearchParamsError` | `diskann::graph::KnnSearchError` |
| `RangeSearchParamsError` | `diskann::graph::RangeSearchError` |
| `index.search(&strategy, &ctx, &query, ¶ms, &mut out)` |
`index.search(knn, &strategy, &ctx, &query, &mut out)` |
| `index.search_recorded(..., &mut recorder)` |
`index.search(RecordedKnn::new(knn, &mut recorder), ...)` |
| `index.range_search(&strategy, &ctx, &query, ¶ms)` |
`index.search(range, &strategy, &ctx, &query, &mut ())` |
| `index.multihop_search(..., &label_eval)` |
`index.search(MultihopSearch::new(knn, &label_eval), ...)` |
| `index.diverse_search(...)` | `index.search(Diverse::new(knn,
diverse_params), ...)` |
**`flat_search`** remains an inherent method on `DiskANNIndex`
Its `search_params` argument changed from `&SearchParams` to `&Knn`.
### Upgrade Instructions
#### 1. k-NN Search (`search`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index.search(&strategy, &ctx, &query, ¶ms, &mut output).await?;
```
**After:**
```rust
use diskann::graph::{Search, search::Knn};
let params = Knn::new(10, 100, None)?;
// Note: params is now the FIRST argument (moved before strategy)
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```
Key differences:
- `SearchParams` -> `Knn` (import from `diskann::graph::search::Knn`)
- `SearchParamsError` -> `KnnSearchError` (import from
`diskann::graph::KnnSearchError`)
- Search params moved to the **first** argument of `index.search()`
- `k_value`, `l_value` fields are now private; use `.k_value()`,
`.l_value()` accessors (return `NonZeroUsize`)
#### 2. Recorded/Debug Search (`search_recorded`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index
.search_recorded(&strategy, &ctx, &query, ¶ms, &mut output, &mut recorder)
.await?;
```
**After:**
```rust
use diskann::graph::{Search, search::{Knn, RecordedKnn}};
let params = Knn::new(10, 100, None)?;
let recorded = RecordedKnn::new(params, &mut recorder);
let stats = index.search(recorded, &strategy, &ctx, &query, &mut output).await?;
```
#### 3. Range Search (`range_search`)
**Before:**
```rust
use diskann::graph::RangeSearchParams;
let params = RangeSearchParams::new(None, 100, None, 0.5, None, 1.0, 1.0)?;
let (stats, ids, distances) = index
.range_search(&strategy, &ctx, &query, ¶ms)
.await?;
```
**After:**
```rust
use diskann::graph::{
Search,
search::Range,
RangeSearchOutput,
};
// Simple form:
let params = Range::new(100, 0.5)?;
// Or full options form:
let params = Range::with_options(None, 100, None, 0.5, None, 1.0, 1.0)?;
// Note: output buffer is `&mut ()` — results come back in the return type
let result: RangeSearchOutput<_> = index
.search(params, &strategy, &ctx, &query, &mut ())
.await?;
// Access results:
let stats = result.stats;
let ids = result.ids; // Vec<O>
let distances = result.distances; // Vec<f32>
```
Key differences:
- `RangeSearchParams` -> `Range` (import from
`diskann::graph::search::Range`)
- `RangeSearchParamsError` -> `RangeSearchError` (import from
`diskann::graph::RangeSearchError`)
- Return type changed from `(SearchStats, Vec<O>, Vec<f32>)` to
`RangeSearchOutput<O>` (a struct with `.stats`, `.ids`, `.distances`
fields)
- Pass `&mut ()` as the output buffer
- Field `starting_l_value` -> constructor arg `starting_l` (accessor:
`.starting_l()`)
- Field `initial_search_slack` -> constructor arg `initial_slack`
(accessor: `.initial_slack()`)
- Field `range_search_slack` -> constructor arg `range_slack` (accessor:
`.range_slack()`)
#### 4. Multihop / Label-Filtered Search (`multihop_search`)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
let stats = index
.multihop_search(&strategy, &ctx, &query, ¶ms, &mut output, &label_eval)
.await?;
```
**After:**
```rust
use diskann::graph::{Search, search::{Knn, MultihopSearch}};
let knn = Knn::new(10, 100, None)?;
let params = MultihopSearch::new(knn, &label_eval);
let stats = index.search(params, &strategy, &ctx, &query, &mut output).await?;
```
Key differences:
- `MultihopSearch` wraps a `Knn` -> label evaluator into a single params
object
- The label evaluator is part of the params, not a separate argument
#### 5. Flat Search (unchanged method, new param type)
**Before:**
```rust
use diskann::graph::SearchParams;
let params = SearchParams::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, ¶ms, &mut output).await?;
```
**After:**
```rust
use diskann::graph::search::Knn;
let params = Knn::new(10, 100, None)?;
index.flat_search(&strategy, &ctx, &query, &filter, ¶ms, &mut output).await?;
```
Only the parameter type changed (`SearchParams` -> `Knn`).
### Import Path Changes
| Old | New |
|------------------------------------------|--------------------------------------------------------|
| `diskann::graph::SearchParams` | `diskann::graph::search::Knn` |
| `diskann::graph::RangeSearchParams` | `diskann::graph::search::Range`
|
| `diskann::graph::SearchParamsError` | `diskann::graph::KnnSearchError`
|
| `diskann::graph::RangeSearchParamsError` |
`diskann::graph::RangeSearchError` |
| — | `diskann::graph::search::MultihopSearch` (new) |
| — | `diskann::graph::search::RecordedKnn` (new) |
| — | `diskann::graph::search::Diverse` (new, feature-gated) |
| — | `diskann::graph::Search` (trait, re-exported) |
| — | `diskann::graph::RangeSearchOutput` (re-exported) |
## Change List
* copy bftrees from the snapshot location to the save location by
@backurs in #783
* (RFC) Refactor search interface with unified SearchDispatch trait by
@narendatha in #773
* Make queue.closest_notvisited() safe and update call sites by @arrayka
in #787
* git ignore: Ignore local settings for claude code AI agent by @arrayka
in #789
* Enabling flag support in codecov by @arrayka in
#790
* Increase unit test coverage for diskann-tools crate by @Copilot in
#763
* Neon MVP by @hildebrandmw in
#777
* Adding GraphParams to be able to save graph parameters of index to
SavedParams by @backurs in #786
## New Contributors
* @narendatha made their first contribution in
#773
**Full Changelog**:
0.46.0...v0.47.0
Summary
This PR significantly improves test coverage for the diskann-tools crate, increasing overall coverage from 52% to 63% with function coverage at 75%.
Changes Made
New Tests Added:
cmd_tool_error.rs (91% coverage, was 0%)
parameter_helper.rs (100% coverage, was 0%)
tracing.rs (70% coverage, was 0%)
gen_associated_data_from_range.rs (98% coverage, was 0%)
filter_search_utils.rs (77% coverage, was 63%)
search_index_utils.rs (68% coverage, was 66%)
random_data_generator.rs (95% coverage, was 77%)
generate_synthetic_labels_utils.rs (92% coverage, was 87%)
Removed Per Review Feedback:
data_type.rsandgraph_data_types.rs(enums and derived traits only)cmd_tool_error.rsFixed for CI Compatibility:
VirtualStorageProvider::new()calls to useVirtualStorageProvider::new_memory()for compatibility withvirtual_storagefeature flag used in CIrelative_contrast.rsandbuild_disk_index.rsgen_associated_data_from_rangefunction generic overStorageWriteProvidertrait to support both FileStorageProvider (CLI usage) and VirtualStorageProvider (tests)Coverage Summary
Overall Metrics:
Modules with 90%+ Coverage (7 total):
Modules Not Covered:
These modules are better suited for integration tests rather than unit tests and would require significant test infrastructure setup.
Known Issues
radius > 127.0 && radius <= 0.0can never be true. This should be fixed in a separate PR (likely should be||instead of&&).Testing
All 61 tests pass successfully:
Tests pass with and without CI features:
cargo test -p diskann-tools --libcargo test -p diskann-tools --lib --features diskann-providers/virtual_storagecargo build -p diskann-tools --bin gen_associated_data_from_rangeCode Quality
-D warningscargo fmt --checkOriginal prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.