Feature request: prompt lookup decoding #1802

Open
kevinhu opened this issue Nov 27, 2023 · 2 comments

kevinhu commented Nov 27, 2023

Prompt lookup decoding (PLD) is a variant of speculative decoding that replaces the draft model with a prefix lookup in the current sequence, resulting in a 2-4x throughput boost for input-grounded tasks like summarization and code modification.

Because PLD doesn't require a secondary draft model, it might be easier to implement in vLLM than standard speculative decoding.

See https://github.com/apoorvumang/prompt-lookup-decoding for details.
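
The core lookup step is tiny; a minimal sketch in plain Python (function and parameter names here are illustrative, not taken from the linked repo or from vLLM):

```python
def find_candidate_tokens(input_ids, ngram_size=3, num_pred_tokens=10):
    """Propose draft tokens by matching the last `ngram_size` tokens
    against an earlier occurrence in the sequence so far."""
    suffix = input_ids[-ngram_size:]
    # Scan backwards so the most recent earlier match wins; the range
    # starts one position before the suffix itself to avoid a self-match.
    for start in range(len(input_ids) - ngram_size - 1, -1, -1):
        if input_ids[start:start + ngram_size] == suffix:
            match_end = start + ngram_size
            return input_ids[match_end:match_end + num_pred_tokens]
    return []  # No match: fall back to ordinary one-token decoding.
```

The proposed tokens are then verified in a single forward pass, just as draft-model tokens are in standard speculative decoding, so output quality is unchanged.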

cadedaniel (Collaborator) commented

#2188 introduces a framework for verifying proposal tokens. Once it's merged, PLD is not very difficult to add.

trinhdoduyhungss commented

Hello @cadedaniel, thank you and the vLLM team for creating a great library. I saw that vLLM supports speculative decoding, but I couldn't find any documentation on how to use it, nor on prompt lookup decoding. Could you give me a simple example of how to use this feature?

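For reference, later vLLM releases added an n-gram proposer for speculative decoding that implements prompt lookup; a rough configuration sketch (parameter names have shifted across versions, and the model name is only an example):

```python
from vllm import LLM, SamplingParams

# "[ngram]" selects prompt-lookup proposals in place of a draft model.
llm = LLM(
    model="facebook/opt-6.7b",
    speculative_model="[ngram]",
    num_speculative_tokens=5,      # draft tokens proposed per step
    ngram_prompt_lookup_max=4,     # largest n-gram to match in the prompt
)

outputs = llm.generate(
    ["Summarize the following article: ..."],
    SamplingParams(temperature=0.0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```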
