Integrate speculative decoding to speed up inference #942

Closed as not planned
Description

@AmoghM

@WoosukKwon already mentioned in #249 (comment) that the samplers are not optimized and that improving them is on the vLLM roadmap. It would be great to take this a step further and incorporate speculative decoding into vLLM.

Speculative decoding was recently proposed in a paper by DeepMind researchers, who report a 2-2.5x decoding speedup. There is already an open-source implementation of the technique here: https://github.com/shreyansh26/Speculative-Sampling
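For context, the core idea is easy to sketch: a small draft model proposes k tokens, the large target model scores all of them in a single forward pass, and each draft token is accepted with probability min(1, p/q); on rejection, a corrected token is resampled from the residual distribution max(0, p - q), so the output provably follows the target model's distribution. Below is a minimal toy sketch in NumPy, not vLLM code and not the paper's implementation; `draft_dist`, `target_dist`, and `speculative_step` are hypothetical stand-ins (the "models" are just deterministic random distributions):

```python
# Toy sketch of speculative sampling; all names below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # tiny vocabulary, just for illustration

def _dist(prefix, temperature, salt):
    """Deterministic toy next-token distribution conditioned on the prefix."""
    seed = (hash(tuple(prefix)) + salt) % (2**32)
    logits = np.random.default_rng(seed).standard_normal(VOCAB) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

def draft_dist(prefix):   # stand-in for the small, fast draft model q(.|prefix)
    return _dist(prefix, temperature=1.5, salt=1)

def target_dist(prefix):  # stand-in for the large, slow target model p(.|prefix)
    return _dist(prefix, temperature=1.0, salt=2)

def speculative_step(prefix, k=4):
    """One round: draft k tokens, verify with the target, return accepted tokens."""
    # 1. The draft model proposes k tokens autoregressively.
    drafts, qs, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft_dist(ctx)
        tok = int(rng.choice(VOCAB, p=q))
        drafts.append(tok)
        qs.append(q)
        ctx.append(tok)
    # 2. The target model scores all k+1 positions in one (conceptual) pass.
    ps = [target_dist(list(prefix) + drafts[:i]) for i in range(k + 1)]
    # 3. Accept each draft token with probability min(1, p/q), left to right.
    out = []
    for i, tok in enumerate(drafts):
        if rng.random() < min(1.0, ps[i][tok] / qs[i][tok]):
            out.append(tok)  # accepted: marginally distributed as the target
        else:
            # Rejected: resample from the residual max(0, p - q), renormalized,
            # which restores the exact target distribution at this position.
            resid = np.maximum(ps[i] - qs[i], 0.0)
            out.append(int(rng.choice(VOCAB, p=resid / resid.sum())))
            return out
    # 4. All k drafts accepted: sample one bonus token from the target.
    out.append(int(rng.choice(VOCAB, p=ps[k])))
    return out

# Usage: generate ~20 tokens, getting several tokens per target-model "pass".
tokens = [0]
while len(tokens) < 20:
    tokens.extend(speculative_step(tokens, k=4))
print(tokens)
```

The speedup comes from step 2: when drafts are frequently accepted, the expensive target model runs roughly once per k+1 emitted tokens instead of once per token, while the rejection-sampling correction keeps the output distribution identical to plain sampling from the target.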
