Skip to content

DocArray wrap ANN libraries #17

Open
@Nick17t

Description

Project idea 2: DocArray wrap ANN libraries

Info details
Skills needed Python, ANN Search experience
Project size 175 hours
Difficulty level Medium
Mentors @Johannes Messner, @Sami Jaghouar, @Philip Vollet

Project Description

  • In DocArray, we have been concentrating on developing production-ready Vector DBs for large-scale searches. However, there are many ANN libraries without scalability layers that can be integrated into DocArray, making it accessible to academia and production teams with small-to-medium amounts of data, without the need for external services.

  • DocArray v2 will have a concept called Document Index. This is an abstraction that lets a user store their Documents (on disk or in a database), and retrieve them using ANN search. As such, there can be multiple Document Indexes backed by different backends: Elastic, Qdrant, Weaviat, ...., but all following the same basic API.

  • The idea behind this project is to take an ANN library and use it to implement a Document Index. There is already an implementation using HNSWLib that you can find here: feat: hnswlib document index docarray/docarray#1124, But there is space to create similar backends using other libraries: Annoy, Faiss, ... The goal is to provide user choice.

  • If there is interest, someone could also implement a backend using a vector database. We already have Qdrant, Weaviate, and Elastic covered, but Milvus, Redis, and some others could also be interesting. You can find a design doc for Document Index here.

Expected outcomes

  • We have a set of DocStores implementations in DocArray that support the most popular ANN libraries, such as FAISS, Annoy, and Hnswlib.

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions