Junyi Chen, Shihao Bai, Zaijun Wang, Siyu Wu, Chuheng Du, Hailong Yang, Ruihao Gong📧, Shengzhong Liu📧, Fan Wu, Guihai Chen
(📧 denotes corresponding author.)
This is the official implementation of our paper introducing Pre³, an efficient structured generation method for LLMs that optimizes LR(1) grammar processing. Existing approaches parse LR(1) grammars into pushdown automata (PDA), incurring runtime overhead for context-dependent token processing, which is particularly inefficient under large inference batches. In contrast, Pre³ converts LR(1) grammars into deterministic pushdown automata (DPDA), precomputing all transitions ahead of time so that runtime exploration is eliminated while grammatical constraints are preserved.
- May 15, 2025: 🌟 Our paper has been accepted by ACL 2025 Main Conference! 🎉 Cheers!
Structured generation is crucial for LLM applications requiring formatted outputs like JSON or function calls, where constrained decoding ensures syntactic validity. Existing approaches based on LR(1) grammars or pushdown automata (PDA) face inherent inefficiencies: LR(1) methods incur computational overhead from context-dependent token processing, while PDA-based solutions suffer from non-deterministic transitions requiring runtime stack management. To address these limitations, we propose Pre³, a deterministic pushdown automaton (DPDA) framework that transforms LR(1) grammars through prefix-conditioned edges and cyclic-aware conversion. By precomputing all transitions and enabling parallel verification, Pre³ eliminates runtime exploration while maintaining grammatical constraints, providing an efficient solution for structured generation tasks. The framework integrates seamlessly with standard LLM inference pipelines.
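To make the idea concrete, here is a minimal, hypothetical sketch of DPDA-constrained decoding: every transition is precomputed in a table, so at each step the grammar mask is a simple lookup rather than a runtime search. The grammar (balanced braces), state representation, and helper names are illustrative assumptions, not the actual Pre³ implementation.

```python
# Hypothetical sketch of DPDA-constrained decoding (NOT the Pre^3 codebase).
# Toy grammar: balanced braces. All transitions are precomputed, so each
# decoding step is a deterministic table lookup with no runtime exploration.

PUSH, POP = "push", "pop"

# Transition table: (input token, stack top) -> stack action.
TRANSITIONS = {
    ("{", "$"): PUSH,   # open a level at the bottom-of-stack marker
    ("{", "{"): PUSH,   # open a nested level
    ("}", "{"): POP,    # close the current level
}

def allowed_tokens(stack):
    """Return the set of tokens the grammar permits next (the decode mask)."""
    top = stack[-1]
    return {sym for (sym, t) in TRANSITIONS if t == top}

def step(stack, token):
    """Apply one deterministic transition; KeyError on an illegal token."""
    action = TRANSITIONS[(token, stack[-1])]
    if action == PUSH:
        stack.append(token)
    else:
        stack.pop()
    return stack

def constrained_decode(candidates_per_step):
    """Greedy decode: at each step, intersect model candidates with the mask."""
    stack, out = ["$"], []
    for candidates in candidates_per_step:
        mask = allowed_tokens(stack)
        token = next(c for c in candidates if c in mask)
        step(stack, token)
        out.append(token)
    return "".join(out)
```

For example, `constrained_decode([["}", "{"], ["}", "{"]])` rejects the illegal leading `"}"`, emits `"{"`, then accepts `"}"`, yielding `"{}"`. In the real system the mask would be applied to the LLM's token logits; the determinism is what allows all of this to be precomputed and batched.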
After cloning the repository, you can follow these steps to try our JSON structured generation.
With Python (=3.9) and PyTorch (>2.0) installed, execute the following commands to install the necessary packages and pre-trained models.
```shell
git checkout pre3-integrated
pip install -r requirements.txt
```

We provide the following script to launch the inference framework. More details about our method can be found in our paper and blog.

```shell
bash ./launch_lightllm.sh
```

Here is the corresponding command for inference.

```shell
python test/format_out/test_pre3_constraint.py
```

Planned improvements:

- A more robust and efficient implementation.
- Adapt to a wider variety of grammars.
Our code is built on LightLLM, an efficient Python-based LLM inference framework. We thank the following projects for their pioneering work in structured generation that inspired our research:
