diff --git a/README.md b/README.md
index 077c392..0bde027 100644
--- a/README.md
+++ b/README.md
@@ -49,9 +49,11 @@
 | Name | Stars | Release | Contributors | About | Tag |
 | ---- | ---- | ---- | ---- | ---- | ---- |
 | **[DeepSpeed-MII](https://github.com/microsoft/DeepSpeed-MII)** | ![Stars](https://img.shields.io/github/stars/microsoft/deepspeed-mii.svg) | ![Release](https://img.shields.io/github/release/microsoft/deepspeed-mii) | ![Contributors](https://img.shields.io/github/contributors/microsoft/deepspeed-mii) | MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. | |
+| **[Inference](https://github.com/roboflow/inference)** | ![Stars](https://img.shields.io/github/stars/roboflow/inference.svg) | ![Release](https://img.shields.io/github/release/roboflow/inference) | ![Contributors](https://img.shields.io/github/contributors/roboflow/inference) | A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models. | vision |
 | **[ipex-llm](https://github.com/intel-analytics/ipex-llm)** | ![Stars](https://img.shields.io/github/stars/intel-analytics/ipex-llm.svg) | ![Release](https://img.shields.io/github/release/intel-analytics/ipex-llm) | ![Contributors](https://img.shields.io/github/contributors/intel-analytics/ipex-llm) | Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc. | edge |
 | **[llmaz](https://github.com/InftyAI/llmaz)** | ![Stars](https://img.shields.io/github/stars/inftyai/llmaz.svg) | ![Release](https://img.shields.io/github/release/inftyai/llmaz) | ![Contributors](https://img.shields.io/github/contributors/inftyai/llmaz) | ☸️ Effortlessly serve state-of-the-art LLMs on Kubernetes. | |
 | **[LMDeploy](https://github.com/InternLM/lmdeploy)** | ![Stars](https://img.shields.io/github/stars/internlm/lmdeploy.svg) | ![Release](https://img.shields.io/github/release/internlm/lmdeploy) | ![Contributors](https://img.shields.io/github/contributors/internlm/lmdeploy) | LMDeploy is a toolkit for compressing, deploying, and serving LLMs. | |
+| **[MaxText](https://github.com/google/maxtext)** | ![Stars](https://img.shields.io/github/stars/google/maxtext.svg) | ![Release](https://img.shields.io/github/release/google/maxtext) | ![Contributors](https://img.shields.io/github/contributors/google/maxtext) | A simple, performant and scalable Jax LLM! | Jax |
 | **[llama.cpp](https://github.com/ggerganov/llama.cpp)** | ![Stars](https://img.shields.io/github/stars/ggerganov/llama.cpp.svg) | ![Release](https://img.shields.io/github/release/ggerganov/llama.cpp) | ![Contributors](https://img.shields.io/github/contributors/ggerganov/llama.cpp) | LLM inference in C/C++ | edge |
 | **[MInference](https://github.com/microsoft/minference)** | ![Stars](https://img.shields.io/github/stars/microsoft/minference.svg) | ![Release](https://img.shields.io/github/release/microsoft/minference) | ![Contributors](https://img.shields.io/github/contributors/microsoft/minference) | To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy. | |
 | **[MLC LLM](https://github.com/mlc-ai/mlc-llm)** | ![Stars](https://img.shields.io/github/stars/mlc-ai/mlc-llm.svg) | ![Release](https://img.shields.io/github/release/mlc-ai/mlc-llm) | ![Contributors](https://img.shields.io/github/contributors/mlc-ai/mlc-llm) | Universal LLM Deployment Engine with ML Compilation | |