awesome-local-ai

An index of self-hosted AI products

OpenAI-compatible inference engines

These servers all expose the OpenAI chat completions API, so a single client works across them (see the sketch after this list).

  • vLLM: production server focused on high-throughput continuous batching
  • SGLang: production server focused on structured generation and agentic use
  • Aphrodite Engine: a fork of vLLM with broader quantization support
  • LoRAX: production server from Predibase focused on dynamically loaded LoRAs
  • Ollama: llama.cpp wrapper with some extra conveniences, designed for developer laptops
  • KoboldCpp: llama.cpp fork designed for roleplay
  • LMDeploy: multimodal server from InternLM
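
Because they share the same wire protocol, the official openai Python client can talk to any of these servers by pointing base_url at the local endpoint. A minimal sketch, assuming a server on localhost:8000 (vLLM's default; Ollama uses port 11434) and a hypothetical model name:

```python
from openai import OpenAI

# Local OpenAI-compatible servers usually ignore the API key,
# but the client requires one to be set.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical: use whatever your server has loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```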

LLM inference engines

  • llama.cpp: lightweight LLM runtime for CPU and GPU
  • ExLlamaV2: lightweight LLM runtime for GPUs; fast quantization (EXL2) and tensor parallelism across any number of GPUs
  • MLC LLM: LLM runtime optimized for many backends; can even run in WebAssembly
  • TensorRT-LLM: NVIDIA's official runtime for their GPUs
  • CTranslate2: C++ inference engine supporting many model types
  • Hugging Face Transformers: not the fastest, but supports the most models (see the sketch after this list)
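
As a point of reference for the engines above, a minimal Hugging Face Transformers sketch; the model name is an assumption, and any causal LM from the Hub would work:

```python
from transformers import pipeline

# Load a small instruct model; device_map="auto" places it on a GPU if one is available.
generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # hypothetical small model
    device_map="auto",
)

out = generator("Local LLM inference engines are useful because", max_new_tokens=40)
print(out[0]["generated_text"])
```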

Structured generation

  • Formatron: very fast token-constrained generation built on a Rust engine
  • lm-format-enforcer (LMFE): structured generation that works with batching and beam search
  • Outlines: structured generation for many backends, including JSON Schema output (see the sketch after this list)
  • Guidance: supports interleaving generation with tool calls
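
A minimal sketch of JSON-constrained generation with Outlines, written against the pre-1.0 outlines API (outlines.models / outlines.generate); the library's interface has shifted between versions, and the model name is an assumption:

```python
from pydantic import BaseModel
import outlines

# The schema the output must conform to.
class Engine(BaseModel):
    name: str
    gpu_only: bool

# Hypothetical model; any transformers-compatible causal LM should work.
model = outlines.models.transformers("Qwen/Qwen2.5-0.5B-Instruct")

# Constrain decoding so every generated token keeps the output valid JSON
# matching the Engine schema.
generator = outlines.generate.json(model, Engine)

result = generator("Describe llama.cpp as JSON: ")
print(result)  # a parsed Engine instance, guaranteed to match the schema
```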
