LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.


Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation


Junyi Chen, Shihao Bai, Zaijun Wang, Siyu Wu, Chuheng Du, Hailong Yang, Ruihao Gong📧, Shengzhong Liu📧, Fan Wu, Guihai Chen

(📧 denotes corresponding author.)

This is the official implementation of our paper introducing Pre$^3$, an efficient structured generation method for LLMs that optimizes LR(1) grammar processing. Existing approaches parse LR(1) grammars into pushdown automata (PDA), incurring runtime overhead for context-dependent token processing, which is particularly inefficient under large inference batches. In contrast, Pre$^3$ leverages precomputed prefix-conditioned edges during preprocessing to enable lightweight transitions and parallel processing. Additionally, we introduce a novel algorithm that transforms LR(1) transition graphs into deterministic pushdown automata (DPDA), eliminating runtime path exploration while maintaining minimal overhead. Seamlessly integrable with standard LLM inference frameworks, Pre$^3$ achieves up to 40% faster time per output token (TPOT) and 36% higher throughput in large-batch simulation experiments.

News

  • May 15, 2025: 🌟 Our paper has been accepted by ACL 2025 Main Conference! 🎉 Cheers!

Overview

Structured generation is crucial for LLM applications requiring formatted outputs like JSON or function calls, where constrained decoding ensures syntactic validity. Existing approaches based on LR(1) grammars or pushdown automata (PDA) face inherent inefficiencies: LR(1) methods incur computational overhead from context-dependent token processing, while PDA-based solutions suffer from non-deterministic transitions requiring runtime stack management. To address these limitations, we propose Pre³, a deterministic pushdown automaton (DPDA) framework that transforms LR(1) grammars through prefix-conditioned edges and cyclic-aware conversion. By precomputing all transitions and enabling parallel verification, Pre³ eliminates runtime exploration while maintaining grammatical constraints, providing an efficient solution for structured generation tasks. The framework integrates seamlessly with standard LLM inference pipelines.
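To illustrate the core idea (this is a toy sketch, not the paper's implementation), the following Python snippet shows a deterministic pushdown automaton over a tiny bracket grammar whose transitions are fully precomputed as a lookup table keyed by (state, token, stack top). Each decoding step is then a single dictionary lookup with no runtime path exploration, and the set of grammatically valid next tokens for constrained decoding can be read off the table directly:

```python
# Toy DPDA sketch in the spirit of Pre^3: all transitions precomputed,
# so stepping and token masking require no runtime search.
# Grammar: balanced brackets such as "[]", "[[]]" (a JSON-array-like toy).

BOTTOM = "$"  # stack-bottom marker

# Precomputed edges: (state, token, stack_top) -> (next_state, stack_op)
TABLE = {
    ("q0", "[", BOTTOM): ("q0", "push"),
    ("q0", "[", "["): ("q0", "push"),
    ("q0", "]", "["): ("q0", "pop"),
}

def step(state, stack, token):
    """One deterministic transition; returns None if the token is invalid."""
    edge = TABLE.get((state, token, stack[-1]))
    if edge is None:
        return None
    next_state, op = edge
    if op == "push":
        stack = stack + [token]
    elif op == "pop":
        stack = stack[:-1]
    return next_state, stack

def allowed_tokens(state, stack, vocab=("[", "]")):
    """Token mask for constrained decoding: tokens with a precomputed edge."""
    return [t for t in vocab if (state, t, stack[-1]) in TABLE]

def accepts(s):
    """Accept iff the string is consumed with the stack back at bottom."""
    state, stack = "q0", [BOTTOM]
    for tok in s:
        out = step(state, stack, tok)
        if out is None:
            return False
        state, stack = out
    return stack == [BOTTOM]
```

Because every edge is resolved at preprocessing time, per-token masks for a whole batch can be computed independently and in parallel, which is the property Pre$^3$ exploits at scale.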

Quick Start

After cloning the repository, you can follow these steps to try our JSON structured generation.

Requirements

With Python (3.9) and PyTorch (>=2.0) installed, run the following commands to check out the Pre$^3$ branch and install the required packages.

git checkout pre3-integrated
pip install -r requirements.txt

Launching Server

Use the following script to launch the inference server. More details about our method can be found in our paper and blog.

bash ./launch_lightllm.sh

Inference

With the server running, use the following command to test constrained inference.

python test/format_out/test_pre3_constraint.py
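For reference, a client request against the launched server can be sketched as below. The endpoint path, default port, and payload field names here are assumptions modeled on LightLLM's HTTP API and may differ on this branch; `test/format_out/test_pre3_constraint.py` is the authoritative usage example.

```python
# Hypothetical client sketch (endpoint/port/field names are assumptions).
import json
import urllib.request

def build_request(prompt, url="http://localhost:8000/generate", max_new_tokens=128):
    """Build a POST request carrying the prompt and sampling parameters."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

req = build_request("Generate a JSON object describing a book.")
# urllib.request.urlopen(req) would send it once the server is up.
```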

TODO

  • A more robust and efficient implementation.

  • Adapt to a wider variety of grammars.

Acknowledgments

Our code was developed based on LightLLM, an efficient Python-based LLM inference framework. We thank the following projects for their pioneering work in structured generation that inspired our research:

  • SynCode for its innovative approaches to LR(1)-grammar-constrained decoding.

  • Outlines for its finite state machine-based structured generation techniques.

  • XGrammar for its breakthrough in context-free grammar processing and pushdown automata optimization.
