Description
Hi
Thanks for the great job and implementation.
Let's assume my world has N vectors.
I have noticed that:
- if I run search where k=N (get me all vectors), I get only part of it.
- More than that, there are vectors that are never returned by hnsw::search() method
Note: I am doing #1 as a sanity test but the more interesting is #2.
Let's take this scenario as an example:
M=32
efConstruction=800
My world has 101 vectors
When running hnsw indexer, I get a graph which has
entry point vector #25
layer 1 with 4 vectors
layer 0 with 101 nodes
The issue is that the vectors from #1 to #100 , each one has exactly 64 neighbors in layer 0, but non of them is vector #101 !
Item #101 is too far from everybody else so in fact it was isolated.
The search() will never return vector #101 because we run using BFS , and our starting point is always the entry point (which is vector #25)
In fact, even if I run search(q) when my query is vector #101 itself, I get 100 vectors and non of them is #101 (as stated above, this is because we do BFS when our entry point #25)
How HNSW should deal with such scenario ?
What am I doing wrong here?
Thanks