Closed as not planned
Labels
feature request (New feature or request), stale (Over 90 days of inactivity)
Description
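Prompt lookup decoding (PLD) is a variant of speculative decoding that replaces the draft model with an n-gram lookup in the current sequence: the most recent tokens are matched against earlier text (typically the prompt), and the tokens that followed the match are proposed as draft tokens. The linked repository reports a 2-4x speedup on input-grounded tasks such as summarization and code modification.
Because PLD doesn't require a secondary model, could it be easier to implement in vLLM than draft-model speculative decoding?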
See https://github.com/apoorvumang/prompt-lookup-decoding for details.
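To make the request concrete, here is a minimal sketch of the lookup step in plain Python. It is not the code from the linked repository and not a vLLM API: the function name `find_candidate_tokens` and the parameters `max_ngram_size` and `num_pred_tokens` are assumptions chosen for illustration. It only proposes draft tokens; the target model would still have to verify them in a single forward pass, as in ordinary speculative decoding.

```python
from typing import List, Sequence


def find_candidate_tokens(
    token_ids: Sequence[int],
    max_ngram_size: int = 3,   # hypothetical parameter: longest suffix to match
    num_pred_tokens: int = 10,  # hypothetical parameter: max draft tokens to propose
) -> List[int]:
    """Propose draft tokens by matching the sequence's suffix against earlier text."""
    n = len(token_ids)
    # Try the longest suffix first; longer matches are more likely to continue correctly.
    for ngram_size in range(min(max_ngram_size, n - 1), 0, -1):
        suffix = list(token_ids[n - ngram_size:])
        # Scan earlier windows, most recent first, excluding the suffix itself.
        for start in range(n - ngram_size - 1, -1, -1):
            if list(token_ids[start:start + ngram_size]) == suffix:
                cont_start = start + ngram_size
                # Tokens that followed the earlier occurrence become the draft.
                return list(token_ids[cont_start:cont_start + num_pred_tokens])
    # No earlier occurrence: fall back to normal one-token decoding.
    return []


# Usage: the suffix [2, 3] also appears earlier, followed by 4, 5, 6.
draft = find_candidate_tokens([1, 2, 3, 4, 5, 6, 7, 2, 3], num_pred_tokens=3)
print(draft)
```

Returning an empty list when nothing matches keeps the fallback simple: the caller just runs a regular decoding step for that position.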