LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.


Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation


Junyi Chen, Shihao Bai, Zaijun Wang, Siyu Wu, Chuheng Du, Hailong Yang, Ruihao Gong📧, Shengzhong Liu📧, Fan Wu, Guihai Chen

(📧 denotes corresponding author.)

This is the official implementation of our paper introducing Pre$^3$, an efficient structured generation method for LLMs that optimizes LR(1) grammar processing. Existing approaches parse LR(1) grammars into pushdown automata (PDA), incurring runtime overhead for context-dependent token processing, which is particularly inefficient under large inference batches. In contrast, Pre$^3$ leverages precomputed prefix-conditioned edges during preprocessing to enable lightweight transitions and parallel processing. Additionally, we introduce a novel algorithm that transforms LR(1) transition graphs into deterministic pushdown automata (DPDA), eliminating runtime path exploration while maintaining minimal overhead. Seamlessly integrable with standard LLM inference frameworks, Pre$^3$ achieves up to 40% faster time per output token (TPOT) and 36% higher throughput in large-batch simulation experiments.

News

  • May 15, 2025: 🌟 Our paper has been accepted by ACL 2025 Main Conference! 🎉 Cheers!

Overview

Structured generation is crucial for LLM applications requiring formatted outputs like JSON or function calls, where constrained decoding ensures syntactic validity. Existing approaches based on LR(1) grammars or pushdown automata (PDA) face inherent inefficiencies: LR(1) methods incur computational overhead from context-dependent token processing, while PDA-based solutions suffer from non-deterministic transitions requiring runtime stack management. To address these limitations, we propose Pre³, a deterministic pushdown automaton (DPDA) framework that transforms LR(1) grammars through prefix-conditioned edges and cyclic-aware conversion. By precomputing all transitions and enabling parallel verification, Pre³ eliminates runtime exploration while maintaining grammatical constraints, providing an efficient solution for structured generation tasks. The framework integrates seamlessly with standard LLM inference pipelines.
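To illustrate the core idea (this is a toy sketch, not the paper's implementation), the following Python snippet shows a deterministic pushdown automaton over a tiny bracket grammar whose transitions are fully precomputed as a lookup table keyed by (state, token, stack top). Each decoding step is then a single dictionary lookup with no runtime path exploration, and the set of grammatically valid next tokens for constrained decoding can be read off the table directly:

```python
# Toy DPDA sketch in the spirit of Pre^3: all transitions precomputed,
# so stepping and token masking require no runtime search.
# Grammar: balanced brackets such as "[]", "[[]]" (a JSON-array-like toy).

BOTTOM = "$"  # stack-bottom marker

# Precomputed edges: (state, token, stack_top) -> (next_state, stack_op)
TABLE = {
    ("q0", "[", BOTTOM): ("q0", "push"),
    ("q0", "[", "["): ("q0", "push"),
    ("q0", "]", "["): ("q0", "pop"),
}

def step(state, stack, token):
    """One deterministic transition; returns None if the token is invalid."""
    edge = TABLE.get((state, token, stack[-1]))
    if edge is None:
        return None
    next_state, op = edge
    if op == "push":
        stack = stack + [token]
    elif op == "pop":
        stack = stack[:-1]
    return next_state, stack

def allowed_tokens(state, stack, vocab=("[", "]")):
    """Token mask for constrained decoding: tokens with a precomputed edge."""
    return [t for t in vocab if (state, t, stack[-1]) in TABLE]

def accepts(s):
    """Accept iff the string is consumed with the stack back at bottom."""
    state, stack = "q0", [BOTTOM]
    for tok in s:
        out = step(state, stack, tok)
        if out is None:
            return False
        state, stack = out
    return stack == [BOTTOM]
```

Because every edge is resolved at preprocessing time, per-token masks for a whole batch can be computed independently and in parallel, which is the property Pre$^3$ exploits at scale.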

Quick Start

After cloning the repository, you can follow these steps to try our JSON structured generation.

Requirements

With Python (3.9) and PyTorch (>=2.0) installed, run the following commands to check out the Pre$^3$ branch and install the required packages.

git checkout pre3-integrated
pip install -r requirements.txt

Launching Server

Use the following script to launch the inference server. More details about our method can be found in our paper and blog.

bash ./launch_lightllm.sh

Inference

With the server running, use the following command to test constrained inference.

python test/format_out/test_pre3_constraint.py
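For reference, a client request against the launched server can be sketched as below. The endpoint path, default port, and payload field names here are assumptions modeled on LightLLM's HTTP API and may differ on this branch; `test/format_out/test_pre3_constraint.py` is the authoritative usage example.

```python
# Hypothetical client sketch (endpoint/port/field names are assumptions).
import json
import urllib.request

def build_request(prompt, url="http://localhost:8000/generate", max_new_tokens=128):
    """Build a POST request carrying the prompt and sampling parameters."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

req = build_request("Generate a JSON object describing a book.")
# urllib.request.urlopen(req) would send it once the server is up.
```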

TODO

  • A more robust and efficient implementation.

  • Adapt to a wider variety of grammars.

Acknowledgments

Our code was developed based on LightLLM, an efficient Python-based LLM inference framework. We thank the following projects for their pioneering work in structured generation that inspired our research:

  • SynCode for its innovative approaches to LR(1)-grammar-constrained decoding.

  • Outlines for its finite state machine-based structured generation techniques.

  • XGrammar for its breakthrough in context-free grammar processing and pushdown automata optimization.
