🚀 The feature, motivation and pitch
Recently, we read a paper in which the vLLM team proposed a method called SmartSpec.
We believe this work, which dynamically adjusts the speculation length inside a production LLM serving system, is more practical than existing studies on dynamic speculation length.
The idea could be applied to the current vLLM speculative decoding path with Batch Expansion enabled, and it should also carry over to future versions of vLLM once Batch Expansion is removed.
(I am curious whether the SmartSpec experiments were run on vLLM with Batch Expansion enabled. 🤔)
I wonder whether SmartSpec will be implemented in the main repository in the near future.
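To make the request concrete, here is a minimal, purely illustrative sketch of the core idea as we understand it: pick the speculation length that maximizes expected goodput (accepted tokens per unit of step time) given the observed acceptance rate and the current batch size. The helper names and the cost model below are hypothetical placeholders, not SmartSpec or vLLM code; the paper's actual formulation differs.

```python
# Toy sketch of goodput-based speculation-length selection.
# NOT the SmartSpec implementation; cost model and names are illustrative assumptions.

def expected_accepted_tokens(k: int, alpha: float) -> float:
    """Expected tokens emitted per verification step for a chain draft of length k,
    assuming a per-token acceptance rate alpha (standard speculative decoding result)."""
    if alpha == 1.0:
        return k + 1
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

def estimate_step_time(k: int, batch_size: int) -> float:
    """Hypothetical cost model: draft cost grows with k, target verification cost
    grows with batch_size * (k + 1). A real system would profile these numbers."""
    draft_cost = 0.3 * k
    verify_cost = 1.0 + 0.02 * batch_size * (k + 1)
    return draft_cost + verify_cost

def choose_speculation_length(alpha: float, batch_size: int, max_k: int = 8) -> int:
    """Pick the k (0 = no speculation) that maximizes tokens per unit time."""
    best_k, best_goodput = 0, 0.0
    for k in range(0, max_k + 1):
        goodput = expected_accepted_tokens(k, alpha) / estimate_step_time(k, batch_size)
        if goodput > best_goodput:
            best_k, best_goodput = k, goodput
    return best_k

print(choose_speculation_length(alpha=0.8, batch_size=4))    # small batch -> longer speculation (k=3 here)
print(choose_speculation_length(alpha=0.8, batch_size=256))  # large batch -> verification dominates, k=0 here
```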
Alternatives
No response
Additional context
No response
Yes, we implemented SmartSpec on top of vLLM with batch expansion in a fork. We will integrate SmartSpec into vLLM very soon. The first step is to remove batch expansion (#5691). In the meantime, we also need community effort to improve speculative decoding performance (#4630) and to implement tree-style speculative decoding (#4978).
SmartSpec (#4565) itself is very lightweight and can be implemented quickly. After all of the above steps, we should see performance similar to what is described in the paper.
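For readers following along, here is a small conceptual sketch of why removing batch expansion matters for a dynamic speculation length. This is not vLLM's actual scorer code, and the function and argument names are illustrative assumptions.

```python
# Conceptual sketch (not vLLM code) of "batch expansion": scoring k draft tokens
# with a target model that only returns logits for the last position of each row.

from typing import List

def expand_batch(prefixes: List[List[int]], drafts: List[List[int]]) -> List[List[int]]:
    """For each sequence, emit k+1 rows: the prefix extended by 0..k draft tokens.
    Running the target model over the expanded batch yields the logits needed to
    verify every draft token and sample one bonus token."""
    expanded = []
    for prefix, draft in zip(prefixes, drafts):
        for i in range(len(draft) + 1):
            expanded.append(prefix + draft[:i])
    return expanded

# A batch of 2 sequences with k=3 draft tokens each becomes 2 * (3 + 1) = 8 rows,
# so memory and scheduling overhead grow with k. Scoring draft tokens in place
# (the direction of #5691) avoids this blow-up, which is part of what makes a
# dynamically chosen k cheap to exploit.
rows = expand_batch(prefixes=[[1, 2], [3]], drafts=[[10, 11, 12], [20, 21, 22]])
assert len(rows) == 8
```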