Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
bloom
falcon
moe
gemma
mistral
mixture-of-experts
model-quantization
multi-gpu-inference
m2m100
llamacpp
llm-inference
internlm
llama2
qwen
baichuan2
mixtral
phi-2
deepseek
minicpm
Updated Mar 15, 2024 - C++