Prompt lookup decoding (PLD) is a variant of speculative decoding that replaces the draft model with a prefix lookup in the current sequence, yielding a 2-4x throughput boost on input-grounded tasks such as summarization and code modification.
Because PLD doesn't require a secondary draft model, it might be easier to implement in vLLM than model-based speculative decoding.
See https://github.com/apoorvumang/prompt-lookup-decoding for details.
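For anyone unfamiliar with the technique, here's a rough Python sketch of the proposal step as I understand it from the linked repo (not vLLM code; the function name and default parameters here are made up for illustration):

```python
# Sketch of the prompt-lookup proposal step: find the longest recent n-gram
# that also occurs earlier in the sequence, and propose the tokens that
# followed that earlier occurrence as the draft.

def prompt_lookup_proposal(token_ids, max_ngram=3, num_draft_tokens=5):
    """Return up to `num_draft_tokens` draft tokens via prefix lookup, or []."""
    for n in range(max_ngram, 0, -1):  # prefer longer n-gram matches
        if len(token_ids) < n + 1:
            continue
        suffix = token_ids[-n:]
        # Scan backwards for an earlier occurrence of the trailing n-gram.
        for start in range(len(token_ids) - n - 1, -1, -1):
            if token_ids[start:start + n] == suffix:
                continuation = token_ids[start + n:start + n + num_draft_tokens]
                if continuation:
                    return continuation
        # No match at this n-gram size; retry with a shorter one.
    return []  # no match found: fall back to ordinary decoding

tokens = [5, 8, 9, 2, 3, 4, 7, 1, 2, 3]
print(prompt_lookup_proposal(tokens))  # -> [4, 7, 1, 2, 3]
```

The proposed tokens are then verified in a single forward pass of the target model, exactly as in standard speculative decoding, so output quality is unchanged.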
Hello @cadedaniel, thank you and the vLLM team for creating a great library. I saw that vLLM supports speculative decoding, but I couldn't find any documentation on how to use it, nor on prompt lookup decoding. Could you give a simple example of how to use this feature?
#2188 introduces a framework for verifying proposal tokens. Once it's merged, PLD will not be difficult to add.
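For intuition, greedy verification of proposal tokens looks roughly like the sketch below (illustrative only; the actual framework in #2188 differs): the target model scores the context plus all draft tokens in one forward pass, the longest matching prefix of the draft is accepted, and the first disagreement is replaced by the target's own token.

```python
import torch

def verify_draft(target_logits, draft_tokens):
    """Accept the longest prefix of `draft_tokens` the target model would
    itself have produced greedily, plus one token from the target.

    target_logits: [len(draft_tokens) + 1, vocab_size] logits from a single
    forward pass of the target model over the context plus the draft tokens.
    """
    greedy = target_logits.argmax(dim=-1).tolist()
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if greedy[i] != tok:
            # First disagreement: take the target model's token and stop.
            accepted.append(greedy[i])
            return accepted
        accepted.append(tok)
    # All draft tokens accepted; the last position yields a free bonus token.
    accepted.append(greedy[-1])
    return accepted
```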