This is the official implementation of the paper: Speeding up Speculative Decoding via Approximate Verification.
We propose SPRINTER, which uses a low-complexity verifier trained to predict whether tokens generated by a draft LLM would be accepted by the target LLM. By performing approximate sequential verification, SPRINTER avoids verifying every draft token with the target LLM; the target LLM is invoked only when a token is deemed unacceptable. This reduces the number of calls to the larger LLM and can yield further speedups.
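For intuition, here is a minimal, illustrative sketch of the approximate sequential verification loop described above. The function and parameter names (`draft_step`, `verifier_score`, `target_step`, threshold `tau`) and the simple fallback of taking a single token from the target LLM are assumptions for exposition, not the repository's API; SPRINTER.py contains the actual implementation.

```python
from typing import Callable, List

def sprinter_generate(
    draft_step: Callable[[List[int]], int],             # draft LLM: context -> next token (assumed interface)
    verifier_score: Callable[[List[int], int], float],  # verifier: estimated P(target accepts token)
    target_step: Callable[[List[int]], int],            # target LLM: context -> next token (assumed interface)
    prompt: List[int],
    max_new_tokens: int = 64,
    tau: float = 0.5,
) -> List[int]:
    """Illustrative SPRINTER-style loop: the draft LLM proposes tokens and a
    low-complexity verifier approximately checks each one; the target LLM is
    called only when the verifier deems a draft token unacceptable."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidate = draft_step(tokens)
        if verifier_score(tokens, candidate) >= tau:
            # Verifier predicts the target LLM would accept this token:
            # keep it without querying the target model at all.
            tokens.append(candidate)
        else:
            # Token deemed unacceptable: fall back to the target LLM for this
            # position (a simplification of the actual fallback), then resume drafting.
            tokens.append(target_step(tokens))
    return tokens
```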
Implementation Steps:
- Use dataset.py to prepare the training data for the verifier (requires logging in to Hugging Face with your user-specific access token).
- Train the verifier with train.py (a minimal training sketch is shown after this list).
- Run SPRINTER.py to perform speculative decoding via approximate verification.
- Evaluate the quality of SPRINTER's outputs with win_rate_eval.py and rouge_eval.py (a ROUGE scoring example is shown after this list).
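As a rough sketch of what training the verifier might look like (train.py defines the actual pipeline), the snippet below fits a small MLP head with a binary cross-entropy objective to predict acceptance from a draft-model hidden state. The `AcceptanceVerifier` class, the choice of hidden states as features, the hidden dimension, and the synthetic tensors standing in for the labeled data produced by dataset.py are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class AcceptanceVerifier(nn.Module):
    """Hypothetical low-complexity verifier: a small MLP head mapping a
    draft-model hidden state to the probability that the target LLM would
    accept the drafted token. The real verifier may use different features."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        return self.net(hidden_state).squeeze(-1)  # acceptance logits

if __name__ == "__main__":
    model = AcceptanceVerifier(hidden_dim=768)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    # Placeholders for the data that dataset.py would produce:
    # draft hidden states and labels (1 = target accepts the draft token, 0 = rejects).
    features = torch.randn(32, 768)
    labels = torch.randint(0, 2, (32,)).float()

    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
    print(f"toy loss: {loss.item():.4f}")
```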
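For the ROUGE-based evaluation, a minimal scoring example is shown below, assuming reference completions come from the target LLM. It uses the `rouge_score` package, which may differ from what rouge_eval.py actually does.

```python
from rouge_score import rouge_scorer

def rouge_l_f1(reference: str, candidate: str) -> float:
    """Compute ROUGE-L F1 between a reference completion (e.g., from the
    target LLM) and a SPRINTER completion."""
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(reference, candidate)["rougeL"].fmeasure

if __name__ == "__main__":
    ref = "Speculative decoding speeds up generation with a draft model."
    cand = "Speculative decoding accelerates generation using a draft model."
    print(f"ROUGE-L F1: {rouge_l_f1(ref, cand):.3f}")
```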