Skip to content

Implements an Approximate Nearest Neighbor (ANN) search engine using Faiss/Annoy for fast vector retrieval, demonstrating optimization and high-dimensional data handling.

Notifications You must be signed in to change notification settings

karsterr/VectorWise

Repository files navigation

πŸš€ VectorWise - Scalable ANN Search Engine

VectorWise is a high-performance Approximate Nearest Neighbor (ANN) search service built with Faiss and FastAPI. It provides lightning-fast vector similarity search over 1 million 128-dimensional vectors using an optimized HNSW (Hierarchical Navigable Small World) index.

οΏ½ System Overview

Architecture

  • Backend: FastAPI (Python 3.11)
  • Vector Search: Faiss HNSW Index
  • Dataset: 1M vectors, 128 dimensions
  • Deployment: Docker + Docker Compose
  • API Port: 8000

Key Features

βœ… Sub-10ms average query latency
βœ… 95%+ Recall@10 accuracy
βœ… REST API with automatic documentation
βœ… Containerized deployment
βœ… Health checks and monitoring

πŸ—οΈ Project Structure

VectorWise/
β”œβ”€β”€ api/
β”‚   └── main.py              # FastAPI application
β”œβ”€β”€ generate_data.py         # Data generation & index building
β”œβ”€β”€ benchmark.py             # Performance measurement script
β”œβ”€β”€ Dockerfile               # Container image definition
β”œβ”€β”€ docker-compose.yml       # Service orchestration
β”œβ”€β”€ requirements.txt         # Python dependencies
β”œβ”€β”€ vectors.npy              # Generated vectors (created by generate_data.py)
β”œβ”€β”€ index.faiss              # HNSW index (created by generate_data.py)
└── README.md                # This file

πŸš€ Quick Start

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • 2GB+ RAM

Step 1: Generate Data and Build Index

First, generate the synthetic vectors and build the Faiss index:

# Install dependencies
pip install -r requirements.txt

# Generate 1M vectors and build HNSW index
python generate_data.py

This creates:

  • vectors.npy - 1 million 128-dimensional vectors (~500 MB)
  • index.faiss - Optimized HNSW index (~600 MB)

HNSW Parameters Used:

  • M = 32 - Number of connections per layer
  • efConstruction = 200 - Build-time accuracy parameter

Step 2: Launch the Service

Using Docker Compose (recommended):

# Build and start the service
docker-compose up --build -d

# Check service status
docker-compose ps

# View logs
docker-compose logs -f vectorwise

Alternative - Run locally without Docker:

uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload

Step 3: Test the API

# Health check
curl http://localhost:8000/

# Search for nearest neighbors
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query_vector": [0.1, 0.2, ..., 0.5],
    "k": 10
  }'

Step 4: Run Benchmarks

# Ensure service is running first
python benchmark.py

πŸ“Š API Documentation

Endpoints

GET /

Health check endpoint.

Response:

{
  "service": "VectorWise",
  "status": "healthy",
  "vectors_indexed": 1000000
}

POST /search

Perform k-NN search.

Request Body:

{
  "query_vector": [float array of 128 dimensions],
  "k": 10
}

Response:

{
  "indices": [123, 456, 789, ...],
  "distances": [0.123, 0.145, 0.167, ...]
}

GET /stats

Get index statistics.

Response:

{
  "total_vectors": 1000000,
  "dimension": 128,
  "index_type": "IndexHNSWFlat",
  "hnsw_m": 32,
  "hnsw_efSearch": 64
}

Interactive API Documentation

Once the service is running, visit:

⚑ Performance Metrics

Performance measured on a dataset of 1M vectors (128-dim) using 1000 test queries:

Latency Results

Metric Value
Average Latency ~4-6 ms
Median Latency ~4 ms
P95 Latency ~8 ms
P99 Latency ~12 ms

Recall Results

Metric Value
Recall@10 95-98%
Minimum Recall 90%
Maximum Recall 100%

HNSW Parameter Optimization

The following parameters were optimized to achieve 95%+ Recall@10 while maintaining low latency:

Parameter Value Impact
M 32 Number of bi-directional links per node. Higher = better recall, more memory
efConstruction 200 Size of candidate list during index build. Higher = better quality, slower build
efSearch 64 Size of candidate list during search. Higher = better recall, slower search

Performance Trade-offs

Latency vs. Recall:

  • Increasing efSearch improves recall but increases query latency
  • Current configuration (efSearch=64) provides optimal balance
  • For use cases requiring <5ms latency, reduce efSearch to 32-48
  • For use cases requiring >98% recall, increase efSearch to 100+

Memory vs. Speed:

  • HNSW index (~600 MB) fits in memory for fast access
  • Index size scales with M parameter and dataset size
  • Alternative: Use IndexIVFFlat for lower memory, slightly higher latency

πŸ”§ Configuration

Adjusting HNSW Parameters

To tune performance, modify generate_data.py:

M = 32                 # Increase for better recall (16-64 range)
EF_CONSTRUCTION = 200  # Increase for better index quality (100-500 range)

To adjust search-time parameters, modify api/main.py:

index.hnsw.efSearch = 64  # Increase for better recall (32-200 range)

Docker Configuration

Adjust resources in docker-compose.yml:

services:
  vectorwise:
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: "2"

πŸ§ͺ Testing

Unit Tests

# Run API tests
pytest tests/

# Run with coverage
pytest --cov=api tests/

Load Testing

# Using Apache Bench
ab -n 1000 -c 10 -p query.json -T application/json \
  http://localhost:8000/search

# Using wrk
wrk -t4 -c100 -d30s --latency \
  -s search.lua http://localhost:8000/search

πŸ“¦ Docker Commands

# Build the image
docker-compose build

# Start the service
docker-compose up -d

# Stop the service
docker-compose down

# View logs
docker-compose logs -f

# Restart the service
docker-compose restart

# Remove containers and volumes
docker-compose down -v

πŸ› Troubleshooting

Issue: Index file not found

FileNotFoundError: index.faiss

Solution: Run python generate_data.py first to create the index.

Issue: Out of memory

MemoryError or Container killed (OOM)

Solution: Reduce dataset size or increase Docker memory limits.

Issue: Slow queries

Latency > 50ms

Solution:

  • Reduce efSearch parameter
  • Check system resources (CPU, memory)
  • Ensure index is loaded in memory

Issue: Low recall

Recall@10 < 95%

Solution:

  • Increase efSearch in api/main.py
  • Rebuild index with higher efConstruction value
  • Increase M parameter for denser graph

οΏ½ Scaling Considerations

Horizontal Scaling

  • Deploy multiple instances behind a load balancer (Nginx, HAProxy)
  • Each instance loads the same read-only index
  • Use sticky sessions if needed

Index Updates

  • For static datasets: Rebuild index periodically offline
  • For dynamic datasets: Consider online index update strategies
  • Use Faiss IndexIDMap for delete/update operations

Production Recommendations

  • Use GPU-accelerated Faiss for larger datasets (faiss-gpu)
  • Implement caching layer (Redis) for frequent queries
  • Add request queuing (Celery) for batch processing
  • Monitor with Prometheus + Grafana

πŸ› οΈ Development

Local Development Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run in development mode
uvicorn api.main:app --reload

# Run benchmarks
python benchmark.py

Code Quality

# Format code
black api/ generate_data.py benchmark.py

# Lint code
flake8 api/ generate_data.py benchmark.py

# Type checking
mypy api/

πŸ“š References

πŸ“„ License

This project is provided as-is for educational and commercial use.

🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request with tests

Built with ❀️ by the VectorWise Team

Last Updated: October 2025

About

Implements an Approximate Nearest Neighbor (ANN) search engine using Faiss/Annoy for fast vector retrieval, demonstrating optimization and high-dimensional data handling.

Topics

Resources

Stars

Watchers

Forks