An index of self-hosted AI inference products
- vLLM: production serving system focused on high-throughput continuous batching (PagedAttention)
- SGLang: production serving system focused on structured generation and agentic workloads
- Aphrodite Engine: a fork of vLLM with broader quantization support
- LoRAX: production server by Predibase focused on serving many LoRA adapters dynamically on a single base model
- Ollama: llama.cpp wrapper with model management and a simple CLI/API, designed for developer laptops
- KoboldCpp: fork of llama.cpp designed for roleplay, with a built-in web UI
- LMDeploy: multimodal inference server by the InternLM team
- llama.cpp: lightweight LLM runtime for CPU and GPU (GGUF models)
- ExLlamaV2: lightweight LLM runtime for GPU; fast EXL2 quantization and tensor parallelism across any number of GPUs
- MLC-LLM: compiler-optimized LLM runtime targeting many backends; can run in the browser via WebAssembly/WebGPU
- TensorRT-LLM: NVIDIA's official inference runtime for their GPUs
- CTranslate2: C++ inference engine supporting many model architectures
- HF Transformers: not the fastest, but supports the widest range of models
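Most of the servers above (vLLM, SGLang, LoRAX, Ollama, llama.cpp's server, LMDeploy) expose an OpenAI-compatible chat completions endpoint, so a single client works across them. A minimal stdlib-only sketch, assuming a server on `localhost:8000`; the port and model name are placeholders that vary per product:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # adjust port per server (e.g. Ollama uses 11434)

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (local servers typically need no API key)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("llama-3.1-8b-instruct", "Hello!")
# Sending requires a running server:
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
print(json.loads(req.data)["messages"][0]["content"])
```

Because the request shape is shared, switching between engines is usually just a matter of changing the base URL and model name.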