Prompt lookup decoding (PLD) is a variant of speculative decoding that replaces the draft model with a prefix lookup in the current sequence, yielding a 2-4x throughput boost on input-grounded tasks such as summarization and code modification.
Because PLD doesn't require a secondary draft model, it might be easier to implement in vLLM than model-based speculative decoding.
See https://github.com/apoorvumang/prompt-lookup-decoding for details.
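For anyone unfamiliar with the technique, here's a rough Python sketch of the proposal step as I understand it from the linked repo (not vLLM code; the function name and default parameters here are made up for illustration):

```python
# Sketch of the prompt-lookup proposal step: find the longest recent n-gram
# that also occurs earlier in the sequence, and propose the tokens that
# followed that earlier occurrence as the draft.

def prompt_lookup_proposal(token_ids, max_ngram=3, num_draft_tokens=5):
    """Return up to `num_draft_tokens` draft tokens via prefix lookup, or []."""
    for n in range(max_ngram, 0, -1):  # prefer longer n-gram matches
        if len(token_ids) < n + 1:
            continue
        suffix = token_ids[-n:]
        # Scan backwards for an earlier occurrence of the trailing n-gram.
        for start in range(len(token_ids) - n - 1, -1, -1):
            if token_ids[start:start + n] == suffix:
                continuation = token_ids[start + n:start + n + num_draft_tokens]
                if continuation:
                    return continuation
        # No match at this n-gram size; retry with a shorter one.
    return []  # no match found: fall back to ordinary decoding

tokens = [5, 8, 9, 2, 3, 4, 7, 1, 2, 3]
print(prompt_lookup_proposal(tokens))  # -> [4, 7, 1, 2, 3]
```

The proposed tokens are then verified in a single forward pass of the target model, exactly as in standard speculative decoding, so output quality is unchanged.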
Hello @cadedaniel, thank you and the vLLM team for creating a great library. I saw that vLLM supports speculative decoding, but I couldn't find any documentation on how to use it, nor on prompt lookup decoding. Could you give a simple example of how to use this feature?
#2188 introduces a framework for verifying proposal tokens. Once it's merged, PLD will not be difficult to add.
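For intuition, greedy verification of proposal tokens looks roughly like the sketch below (illustrative only; the actual framework in #2188 differs): the target model scores the context plus all draft tokens in one forward pass, the longest matching prefix of the draft is accepted, and the first disagreement is replaced by the target's own token.

```python
import torch

def verify_draft(target_logits, draft_tokens):
    """Accept the longest prefix of `draft_tokens` the target model would
    itself have produced greedily, plus one token from the target.

    target_logits: [len(draft_tokens) + 1, vocab_size] logits from a single
    forward pass of the target model over the context plus the draft tokens.
    """
    greedy = target_logits.argmax(dim=-1).tolist()
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if greedy[i] != tok:
            # First disagreement: take the target model's token and stop.
            accepted.append(greedy[i])
            return accepted
        accepted.append(tok)
    # All draft tokens accepted; the last position yields a free bonus token.
    accepted.append(greedy[-1])
    return accepted
```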