marekgalovic/anndb

AnnDB

AnnDB is a horizontally scalable, distributed approximate nearest neighbors database. It is built from the ground up to scale to millions of high-dimensional vectors while providing low latency and high throughput.

Install

Install from source

git clone git@github.com:marekgalovic/anndb.git
cd anndb
make compile

Install using docker

git clone git@github.com:marekgalovic/anndb.git
cd anndb
docker build -t anndb-server ./

Architecture

[Figure: nodes diagram]

AnnDB uses a custom implementation of HNSW [1] to make search in high-dimensional vector spaces fast. It splits each dataset and its underlying index into partitions. Partitions are distributed and replicated across nodes in the cluster using the Raft protocol [2] to ensure high availability and data durability in case of node failures.

Search is performed in a map-reduce-like fashion. The node that receives a search request from the client (the driver node) samples a node for each partition and sends a partition search request to that node. Each of these nodes then searches the requested partitions and aggregates the results locally before sending them to the driver node, which re-aggregates the responses from all partitions and sends the result to the client.
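The search flow described above can be illustrated with a minimal, single-process Python sketch. This is not AnnDB's actual code or API: the function names (`search_partition`, `node_search`, `driver_search`) and the data layout are hypothetical, the per-partition search is a brute-force scan standing in for HNSW, and network calls between nodes are replaced by plain function calls.

```python
import heapq
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def search_partition(vectors, query, k):
    """Return the k nearest (distance, id) pairs in one partition.

    Assumption: a brute-force scan stands in for the HNSW index.
    """
    return heapq.nsmallest(
        k, ((euclidean(v, query), vid) for vid, v in vectors.items())
    )


def node_search(partitions, partition_ids, query, k):
    """Search the requested partitions on one node and aggregate locally."""
    local = []
    for pid in partition_ids:
        local.extend(search_partition(partitions[pid], query, k))
    return heapq.nsmallest(k, local)


def driver_search(replicas, node_data, query, k, rng=random):
    """Driver side: sample one replica node per partition, then merge.

    replicas:  partition id -> list of node ids holding a replica
    node_data: node id -> {partition id -> {vector id -> vector}}
    """
    # Map step: pick a node for each partition, grouping partitions by node.
    assignment = defaultdict(list)
    for pid, nodes in replicas.items():
        assignment[rng.choice(nodes)].append(pid)

    # Reduce step: re-aggregate the per-node top-k lists into a global top-k.
    merged = []
    for node, pids in assignment.items():
        merged.extend(node_search(node_data[node], pids, query, k))
    return heapq.nsmallest(k, merged)


# Hypothetical cluster: two partitions, each replicated on two of three nodes.
p0 = {1: (0.0, 0.0), 2: (5.0, 5.0)}
p1 = {3: (1.0, 1.0), 4: (9.0, 9.0)}
replicas = {"p0": ["a", "b"], "p1": ["b", "c"]}
node_data = {"a": {"p0": p0}, "b": {"p0": p0, "p1": p1}, "c": {"p1": p1}}

result = driver_search(replicas, node_data, query=(0.0, 0.0), k=2)
nearest_ids = [vid for _, vid in result]
```

Because every partition is searched exactly once regardless of which replica is sampled, the merged result here does not depend on the random choice. With an approximate index such as HNSW, each partition's top-k is itself approximate, so the merged result is approximate as well.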

Benchmark

A scaled-down version of ANN Benchmarks. I used a GCP instance with 16 cores and 32GB RAM.

[Figure: benchmark results]

References