Skip to content

The HNSW graph search returns less than the world size (so recall can never be 1.0) ? #352

Open
@sameraamar

Description

@sameraamar

Hi

Thanks for the great job and implementation.

Let's assume my world has N vectors.
I have noticed that:

  1. if I run search where k=N (get me all vectors), I get only part of it.
  2. More than that, there are vectors that are never returned by hnsw::search() method

Note: I am doing #1 as a sanity test but the more interesting is #2.

Let's take this scenario as an example:
M=32
efConstruction=800
My world has 101 vectors

When running hnsw indexer, I get a graph which has
entry point vector #25
layer 1 with 4 vectors
layer 0 with 101 nodes

The issue is that the vectors from #1 to #100 , each one has exactly 64 neighbors in layer 0, but non of them is vector #101 !

Item #101 is too far from everybody else so in fact it was isolated.
The search() will never return vector #101 because we run using BFS , and our starting point is always the entry point (which is vector #25)

In fact, even if I run search(q) when my query is vector #101 itself, I get 100 vectors and non of them is #101 (as stated above, this is because we do BFS when our entry point #25)

How HNSW should deal with such scenario ?
What am I doing wrong here?

Thanks

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions