
[Feature]: Request for SmartSpec Method Support #5886

Closed
bong-furiosa opened this issue Jun 27, 2024 · 2 comments

@bong-furiosa
Contributor

🚀 The feature, motivation and pitch

Recently, we read a paper in which the vLLM team proposed a method called SmartSpec.
Because it dynamically adjusts the speculation length inside a production LLM serving system, we believe this research is more practical than existing studies on dynamic speculation lengths.
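
For context, the core idea of SmartSpec, as we understand it, is to pick the speculation length that maximizes estimated goodput given an observed acceptance rate. A minimal sketch of that decision (not vLLM code and not the paper's exact algorithm; the function names, the i.i.d. acceptance model, and the cost constants are illustrative assumptions) might look like this:

```python
# Illustrative sketch only: choose a speculation length k each step by
# maximizing estimated goodput under a running acceptance-rate estimate.
# All names and cost constants below are hypothetical, not vLLM APIs.

def estimate_goodput(k: int, acceptance_rate: float,
                     draft_cost: float, verify_cost: float) -> float:
    """Expected generated tokens per unit time for speculation length k.

    Assuming each drafted token is accepted independently with probability p,
    the expected tokens produced per verification step is
    (1 - p**(k + 1)) / (1 - p); the step cost is k draft-model passes plus
    one target-model verification pass.
    """
    p = acceptance_rate
    expected_tokens = (1 - p ** (k + 1)) / (1 - p) if p < 1.0 else k + 1
    step_time = k * draft_cost + verify_cost
    return expected_tokens / step_time


def choose_speculation_length(acceptance_rate: float,
                              max_k: int = 8,
                              draft_cost: float = 1.0,
                              verify_cost: float = 5.0) -> int:
    """Return the k in [0, max_k] with the highest estimated goodput."""
    return max(range(max_k + 1),
               key=lambda k: estimate_goodput(k, acceptance_rate,
                                              draft_cost, verify_cost))


# A low acceptance rate favors short (or no) speculation, while a high
# acceptance rate favors longer proposals.
for rate in (0.3, 0.6, 0.9):
    print(rate, choose_speculation_length(rate))
```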

This idea could be applied to the current vLLM speculative decoding with Batch Expansion enabled, and it might also be applicable to future versions of vLLM with Batch Expansion disabled.
(I am curious whether the SmartSpec research was conducted on vLLM with Batch Expansion enabled. 🤔)

I wonder if the SmartSpec method will be implemented in the main repository in the near future.

Alternatives

No response

Additional context

No response

@LiuXiaoxuanPKU
Collaborator

Hi @bong-furiosa, thanks for your interest!

Yes, we implemented SmartSpec on top of vLLM with batch expansion in a forked version. We will integrate SmartSpec into vLLM very soon. The first step is to remove batch expansion (#5691). In the meantime, we also need community effort to improve speculative decoding performance (#4630) and to implement tree-style speculative decoding (#4978).
SmartSpec (#4565) itself is very lightweight and can be implemented quickly. After all of the above steps, we should see performance similar to that described in the paper.

@bong-furiosa
Contributor Author

Since we have received a detailed response, we will close this issue. We are very much looking forward to seeing further developments in vLLM!
