Description
As @WoosukKwon already mentioned here: #249 (comment), the samplers are not yet optimized, and sampler optimization is part of the vLLM roadmap. It would be great to take a leap forward and incorporate speculative decoding into vLLM.
Speculative sampling is a recent technique from DeepMind researchers, who report a 2-2.5x decoding speedup. There is already an open-source implementation of the technique here: https://github.com/shreyansh26/Speculative-Sampling
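
For reference, here is a minimal, self-contained sketch of the rejection-sampling rule at the core of speculative sampling: a cheap draft model proposes `k` tokens, and the target model accepts or rejects them so that the output is still an exact sample from the target distribution. The `toy_dist` function, the vocabulary size, and all names are made-up stand-ins for real draft/target models; this is not vLLM's API or internals.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size (assumption for the demo)

def toy_dist(prefix, seed):
    """Deterministic toy next-token distribution; stands in for a real model."""
    g = np.random.default_rng(hash((tuple(prefix), seed)) % (2**32))
    logits = g.standard_normal(VOCAB)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(prefix, k=4):
    """One round: the draft proposes k tokens, the target verifies them.

    Each draft token x is accepted with probability min(1, p(x)/q(x));
    on the first rejection we resample from the residual max(0, p - q),
    which keeps the output an exact sample from the target distribution.
    """
    # 1. Draft model proposes k tokens autoregressively (cheap).
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = toy_dist(ctx, seed=1)          # draft distribution q(. | ctx)
        x = int(rng.choice(VOCAB, p=q))
        drafted.append(x)
        q_dists.append(q)
        ctx.append(x)

    # 2. Target model scores all k positions. In a real system this is a
    #    single batched forward pass, which is where the speedup comes from.
    accepted = []
    ctx = list(prefix)
    for x, q in zip(drafted, q_dists):
        p = toy_dist(ctx, seed=2)          # target distribution p(. | ctx)
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)             # accept the draft token
            ctx.append(x)
        else:
            # Rejected: resample from the residual distribution and stop.
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            break
    else:
        # All k drafts accepted: the full algorithm also samples one
        # bonus token from the target at the next position.
        p = toy_dist(ctx, seed=2)
        accepted.append(int(rng.choice(VOCAB, p=p)))
    return accepted

print(speculative_step(prefix=[3, 1], k=4))
```

On average each round emits more than one token per target-model forward pass, which is what yields the reported 2-2.5x wall-clock speedup when the draft model's proposals are usually accepted.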