A simple API server on top of your favorite locally runnable foundation models. We support:
- llama.cpp-compatible models, including k-quant models
- ARM NEON CPUs and BLAS (CUDA coming soon)
- Whisper transcription models (coming soon)
- Visual models for object detection and segmentation (coming soon)
- LLM completion
- Transcription (coming soon)
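Once the server is running, an LLM completion could be requested over HTTP. Below is a minimal sketch in Python of what such a request body might look like, assuming a JSON API; the field names (`prompt`, `max_tokens`) and the endpoint URL in the comment are illustrative assumptions, not documented here — check the server's actual API.

```python
import json

# Hypothetical completion request payload. The field names are
# assumptions for illustration only.
payload = {
    "prompt": "The capital of France is",  # text for the model to complete
    "max_tokens": 16,                      # assumed generation limit field
}
body = json.dumps(payload)

# The request itself could then be sent with any HTTP client, e.g.:
#   curl -X POST http://localhost:8080/completions -d "$body"
print(body)
```

The same request shape would apply to a future transcription endpoint, with an audio payload in place of the text prompt.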