gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
Updated Dec 4, 2025 - Python
"This repository is a proof-of-concept demonstrating how to deploy and manage vLLM for fast LLM inference across a supercluster. It showcases distributed system architecture for high-performance computing (HPC)."