Mimir is a document processing and embedding pipeline that supports high-quality semantic search and retrieval. It currently uses the nomic-ai/nomic-embed-text-v2-moe embedding model for state-of-the-art embedding quality.
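For reference, generating embeddings with this model via sentence-transformers looks roughly like the sketch below. The `trust_remote_code` flag and the `search_document:`/`search_query:` task prefixes follow the Nomic model family's documented conventions, not anything in this repo; double-check the model card before relying on them.

```python
# Minimal embedding sketch for nomic-embed-text-v2-moe (assumptions noted above).
from sentence_transformers import SentenceTransformer

# Downloads the model from the Hugging Face Hub on first use.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

# Nomic embedding models expect task prefixes on the input text (per the model cards).
docs = model.encode(["search_document: Mimir processes and embeds documents."])
query = model.encode(["search_query: what does Mimir do?"])
print(docs.shape, query.shape)
```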
You can use Nix to get a fully reproducible development environment with all system dependencies (Python, pip, gcc, make, PDF tools, etc.). Python packages are still managed with a local venv for maximum compatibility.
- Enter the Nix dev shell:

  ```bash
  nix develop
  ```
- Activate your Python venv:

  ```bash
  source venv/bin/activate
  ```

  If you don't have a venv yet, it will be auto-created the first time you enter the Nix shell, or you can create it manually:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```
- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Run your usual commands:

  ```bash
  make
  make embedding-server
  bash scripts/test_embedding_pipeline.sh
  # etc.
  ```
If you don't use Nix, you can still set up manually:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

- The Nix shell provides all system tools (compilers, PDF tools, etc.) so you don't need to install them globally.
- Python dependencies are managed with pip/venv for compatibility with most Python tooling.
- The Nix shell prints a welcome message and tool versions on entry.
- Your `venv/` and `models/` directories are ignored by git and Cursor IDE.
Features:
- High-quality embeddings using nomic-ai/nomic-embed-text-v2-moe
- FastAPI-based persistent embedding server: the model is loaded once, requests are batched, and the model is auto-downloaded if needed (see the sketch after this list)
- C++ pipeline with chunking, session management, and HTTP embedding calls
- Test script for end-to-end validation and benchmarking
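To make the server design concrete, here is a minimal sketch of such a persistent embedding server. The endpoint name (`/embed`), the request shape, and the batch size are illustrative assumptions, not the actual API of the server shipped with this repo:

```python
# Minimal persistent-embedding-server sketch (FastAPI); endpoint and request
# shape are hypothetical, not this repo's actual API.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# Loaded once at startup; sentence-transformers auto-downloads the model
# from the Hugging Face Hub if it is not cached locally.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest):
    # encode() batches internally; tune batch_size for your hardware.
    vectors = model.encode(req.texts, batch_size=32)
    return {"embeddings": vectors.tolist()}
```

Run a server like this with `uvicorn server:app`; the C++ pipeline then only needs plain HTTP POSTs (e.g., via cpp-httplib) carrying a JSON list of texts.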
Roadmap:
- Hardware Detection: Automatically detect CPU/GPU and select the best embedding model for the hardware (see the sketch after this list)
- Configurable Embedding Model: Allow users to set the embedding model in config.yaml (already partially supported)
- ONNX Runtime Support: Convert and serve embedding models via ONNX Runtime for faster CPU inference
- Model Quantization: Support INT8/FP16 quantized models for even faster inference on CPU/GPU
- User Model Selection: Let users choose between "Best Quality" and "Fastest" embeddings at runtime
- Efficient Storage: Use binary formats (e.g., Parquet, npy) for large embedding tables
- Profiling and Monitoring: Add profiling hooks to measure and optimize pipeline bottlenecks
- Batching and Streaming: Support streaming and larger batch sizes for high-throughput scenarios
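As a sketch of how the hardware-detection and model-selection items could fit together: the device probe below is standard PyTorch, while the model mapping is a hypothetical placeholder, not a choice this project has committed to.

```python
# Sketch of planned hardware detection plus "Best Quality" vs. "Fastest"
# selection. The MODELS mapping is a hypothetical placeholder.
import torch

def pick_device() -> str:
    # Prefer CUDA, then Apple's Metal backend, then plain CPU.
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

MODELS = {
    "best_quality": "nomic-ai/nomic-embed-text-v2-moe",
    "fastest": "sentence-transformers/all-MiniLM-L6-v2",  # placeholder fallback
}

def pick_model(preference: str = "best_quality") -> str:
    # Hypothetical policy: fall back to the small model when the user asks
    # for speed or when no GPU is available.
    if preference == "fastest" or pick_device() == "cpu":
        return MODELS["fastest"]
    return MODELS["best_quality"]

print(pick_device(), pick_model())
```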
For questions or contributions, please open an issue or pull request!
Note for macOS users:
This project uses cpp-httplib for HTTP. To ensure compatibility, you must use Homebrew’s clang++ and cpp-httplib:
- Install cpp-httplib and LLVM via Homebrew:

  ```bash
  brew install cpp-httplib llvm
  ```
- Build as usual:

  ```bash
  make clean && make
  ```

  The Makefile will auto-detect and use Homebrew's clang++ if available, ensuring ABI compatibility with Homebrew's cpp-httplib.
- If you see linker errors about missing cpp-httplib symbols:
  - Make sure you are not using Nix's g++ to build.
  - The Makefile will fall back to system clang++ if Homebrew's is not found, but you may need to adjust your PATH or install Homebrew's LLVM.
 
On Linux/Nix:
- The build uses g++ or clang++ and system/Nix-provided libraries as usual.
Summary:
- Do not mix Nix GCC and Homebrew cpp-httplib on macOS.
- Use Homebrew’s clang++ and cpp-httplib together for ABI compatibility.
- The Makefile auto-selects the best compiler for your environment.