A high-throughput and memory-efficient inference and serving engine for LLMs
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Fully Open Framework for Democratized Multimodal Training
OSS RL environment + evals toolkit
PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
LvLLM is a special NUMA extension of vLLM that makes full use of CPU and memory resources, reduces GPU memory requirements, and features an efficient GPU-parallel and NUMA-parallel architecture, supporting hybrid inference for large MoE models.
Deploy open-source LLMs on AWS in minutes — with OpenAI-compatible APIs and a powerful CLI/SDK toolkit.
A Lightweight LLM Inference Performance Simulator
A notification system that monitors the cryptocurrency trading behavior of large AI models in the nof1.ai Alpha Arena.
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling