🎉 Through dedicated efforts from July to December 2024, the SGLang team has achieved significant milestones with three major releases: v0.2, v0.3, and v0.4. For detailed optimization insights, please refer to our corresponding blog posts.
🚀 We're proud to announce that SGLang has been adopted as:
- The dominant LLM engine by AMD
- The default LLM engine for xAI
For more information, please check out AMD's ROCm 6.3 official announcement and xAI's presentation at the AMD Advancing AI Conference 2024.
[2024-12-04] SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs
[2024-09-04] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision
[2024-07-25] Achieving Faster Open-Source Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM)
[2024-02-05] Fast JSON Decoding for Local LLMs with Compressed Finite State Machine
[2024-01-17] Fast and Expressive LLM Inference with RadixAttention and SGLang
[2024-11-13] SGLang: Fast Serving Framework for Large Language and Vision-Language Models on AMD GPUs
[2024-12-21] SGLang v0.4 Optimization
[2024-11-10] SGLang Performance Optimization
[2024-10-16] SGLang Overview & CPU Overhead Hiding
[2024-10-16] Faster Constrained Decoding
[2024-10-16] SGLang DeepSeek MLA
[2024-10-16] Universal LLM deployment and low-latency serving in MLC LLM
[2024-10-16] XGrammar: Flexible And Efficient Structured Generation Engine for Large Language Models
[2024-10-16] Review of the first LMSYS online meetup: Efficient LLM Deployment and Serving
[2024-10-10] Efficient LLM Inference with SGLang
[2024-11-30] Update Weights From Distributed
[2024-11-16] SGLang Router and Side-Channel KV Cache Attack
[2024-11-02] Quantization on AMD
[2024-10-05] SGLang Double Sparsity
[2024-09-21] SGLang DeepSeek MLA
SGLang v0.2: Faster Interface and Runtime for LLM Inference
You are welcome to follow our YouTube channel.
[2024-11-10] SGLang Performance Optimization
[2024-10-16] The First SGLang Online Meetup
[2024-10-10] Efficient LLM Inference with SGLang
[2024-12-14] SGLang Developer Sync 20241214
[2024-11-30] SGLang Developer Sync 20241130
[2024-11-16] SGLang Developer Sync 20241116
[2024-11-03] SGLang Developer Sync 20241103
[2024-10-19] SGLang Developer Sync 20241019
[2024-10-05] SGLang Developer Sync 20241005
[2024-09-21] SGLang Developer Sync 20240921
[NeurIPS 24] SGLang: Efficient Execution of Structured Language Model Programs