
Add info for how Harper vector indexing works #44

@kylebernhardy

While using Hairper, I wanted to add semantic search to a project. Hairper had no context regarding Harper's ability to store and index embeddings. I had given it our docs link for vector indexing in a prompt, but it still hallucinated both the schema declaration and how to search the index programmatically.

Generated GraphQL schema for vector index:

  embedding: [Float] @vector(dimensions: 1024)

Where the basic declaration should really look like:

  embedding: [Float] @indexed(type: "HNSW")

The vector search itself was also incorrectly formed, and it was added as one of the query's conditions:

{
  attribute: "embedding",
  comparator: "vector",
  value: embedding,
  k: vector_k,
}

where it should be a sort and look like:

sort: { attribute: 'embedding', target: searchVectorValue }
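
To make the intended shape concrete, here is a minimal sketch of building such a query as a plain object. `buildVectorSearchQuery` is a hypothetical helper, not part of Harper, and the `limit` field is an assumption about how the number of results would be capped; only the `sort: { attribute, target }` shape comes from the description above.

```javascript
// Hypothetical helper (not a Harper API): builds a query object in which the
// vector search is expressed as a sort rather than as a condition.
function buildVectorSearchQuery(attribute, searchVectorValue, limit) {
  if (!Array.isArray(searchVectorValue) || searchVectorValue.length === 0) {
    throw new TypeError('searchVectorValue must be a non-empty array of numbers');
  }
  const query = {
    // The vector comparison lives in `sort`, not in the conditions.
    sort: { attribute, target: searchVectorValue },
  };
  // Assumption: a `limit` field caps how many nearest results are returned.
  if (limit !== undefined) {
    query.limit = limit;
  }
  return query;
}

const query = buildVectorSearchQuery('embedding', [0.012, -0.44, 0.991], 5);
console.log(JSON.stringify(query, null, 2));
```

The resulting object would then be passed to whatever search call the table exposes. That call is deliberately left out here, since its exact form is something Hairper should take from the docs rather than from this sketch.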

I also asked ChatGPT what it understands about Harper's vector indexing. Its high-level understanding was correct, as was its output for the schema declaration, though granted, it looked this up on the web directly. When I asked ChatGPT to show me what a query would look like, however, it hallucinated output like:

query {
  Product(
    orderBy: {
      embedding: {
        near: {
          vector: [0.012, -0.44, 0.991, ...],
          k: 5
        }
      }
    }
  ) {
    id
    name
  }
}

Perhaps we also need to further refine our docs, which can be tracked as a separate issue in our docs repo.
