AnnDB is a horizontally scalable and distributed approximate nearest neighbors database. It is build from the ground up to scale to millions of high-dimensional vectors while providing low latency and high throughput.
git clone git@github.com:marekgalovic/anndb.git
cd anndb
make compile
git clone git@github.com:marekgalovic/anndb.git
cd anndb
docker build -t anndb-server ./
AnnDB uses a custom implementation of HNSW [1] to make search in high-dimensional vector spaces fast. It splits each dataset and its underlying index into partitions. Partitions are distributed and replicated across nodes in the cluster using Raft protocol [2] to ensure high availability and data durability in case of node failures. Search is performed in a map-reduce like fashion. Node that receives a search request from the client samples a node for each partition and sends partition search request to that node. Each of these nodes then searches requested partitions and aggregates results locally before sending it to the driver node which re-aggregates responses from all partitions and sends the result to the client.
A scaled down version of ANN Benchmarks. I used a GCP instance with 16 cores and 32GB RAM.