DocArray wrap ANN libraries #17

### *Project idea 2: DocArray wrap ANN libraries*

| Info             | details                                                      |
| ---------------- | ------------------------------------------------------------ |
| Skills needed    | Python, ANN Search experience                                |
| Project size     | 175 hours                                                    |
| Difficulty level | Medium                                                       |
| Mentors          | @[Johannes Messner](https://github.com/JohannesMessner), @[Sami Jaghouar](https://github.com/samsja), @[Philip Vollet](https://www.linkedin.com/in/philipvollet) |

#### Project Description

- In DocArray, we have been concentrating on production-ready vector databases for large-scale search. However, many ANN libraries that lack a scalability layer can also be integrated into DocArray. Doing so would make DocArray accessible to academia and to production teams with small-to-medium amounts of data, without the need for external services.

- DocArray v2 will have a concept called Document Index. This is an abstraction that lets users store their Documents (on disk or in a database) and retrieve them using ANN search. There can be multiple Document Indexes backed by different backends (Elastic, Qdrant, Weaviate, ...), all following the same basic API.

- The idea behind this project is to take an ANN library and use it to implement a Document Index. There is already an implementation using HNSWLib, which you can find here: https://github.com/docarray/docarray/pull/1124. But there is room to create similar backends using other libraries: Annoy, Faiss, ... The goal is to give users a choice.

- If there is interest, someone could also implement a backend using a vector database. We already have Qdrant, Weaviate, and Elastic covered, but Milvus, Redis, and some others could also be interesting. You can find a design doc for Document Index [here](https://lightning-scent-57a.notion.site/Document-Stores-v2-design-doc-f11d6fe6ecee43f49ef88e0f1bf80b7f).
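To make the "same basic API, different backends" idea concrete, here is a minimal sketch in plain Python. Every name in it (`ToyDoc`, `ToyDocumentIndex`, `BruteForceIndex`, `index`, `find`) is hypothetical and chosen for illustration; it is not DocArray's actual Document Index API. The backend does exact brute-force cosine search as a stand-in for a real ANN library such as Annoy or Faiss.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a Document: an id, some text, an embedding."""
    id: str
    text: str
    embedding: Sequence[float]


class ToyDocumentIndex(ABC):
    """Hypothetical common interface that every backend would follow."""

    @abstractmethod
    def index(self, docs: Sequence[ToyDoc]) -> None:
        """Store the given documents."""

    @abstractmethod
    def find(self, query: Sequence[float], limit: int = 3) -> List[Tuple[ToyDoc, float]]:
        """Return up to `limit` (document, score) pairs, best match first."""


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BruteForceIndex(ToyDocumentIndex):
    """Exact search over an in-memory list.

    Assumption: a real backend would delegate to an ANN library here;
    brute force keeps the sketch dependency-free.
    """

    def __init__(self) -> None:
        self._docs: List[ToyDoc] = []

    def index(self, docs: Sequence[ToyDoc]) -> None:
        self._docs.extend(docs)

    def find(self, query: Sequence[float], limit: int = 3) -> List[Tuple[ToyDoc, float]]:
        scored = [(doc, cosine(query, doc.embedding)) for doc in self._docs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


store = BruteForceIndex()
store.index([
    ToyDoc("a", "cat", [1.0, 0.0]),
    ToyDoc("b", "dog", [0.9, 0.1]),
    ToyDoc("c", "car", [0.0, 1.0]),
])
results = store.find([1.0, 0.0], limit=2)
top_ids = [doc.id for doc, _ in results]
```

Because calling code only depends on `index` and `find`, a backend built on a different library could be swapped in without changing that code, which is the kind of user choice this project aims for.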

#### Expected outcomes

- We have a set of Document Index (DocStore) implementations in DocArray that support the most popular ANN libraries, such as Faiss, Annoy, and HNSWLib.
