[Feature]: Support for a draft model that takes inputs from base model (to support Medusa/EAGLE/Hydra) #4669
Comments
I have implemented Medusa using this approach. If it makes sense and can be accepted as a contribution, I would love to create a PR (including the Medusa implementation). I am also working on implementing the EAGLE approach.
@abhigoyal1997 This is indeed an important feature that people have been looking for. It's also on my exploration radar, and I look forward to its implementation in vLLM. I'm asking because I'm developing tree-style speculation, which would be a perfect match with Medusa/EAGLE/Hydra here. Maybe we can combine our efforts and see how much of a performance boost we get when the two techniques are put together. #4565 (comment)
Hi @KexinFeng. As for the current implementation of Medusa and EAGLE using a single sequence, I'll create a PR as soon as I've tested it a bit more and have company approval.
cc @cadedaniel @LiuXiaoxuanPKU for visibility.
I'm excited that you're working on this. I'm also considering adding tree attention to vLLM to adapt it to EAGLE. How is your work going? I'd like to ask where I should focus my modifications, and do you have an open-source plan?
🚀 The feature, motivation and pitch
In approaches like Medusa/EAGLE/Hydra, the speculative model uses the last hidden states from the base model to propose candidate tokens. This feature would allow any such approach to be implemented with ease. One idea is to store the required base-model outputs along with the sequence and then use them while generating candidates for the next iteration.
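The idea above can be sketched as follows. This is a minimal toy illustration, not vLLM's actual API: `SequenceState`, `base_model_step`, and `draft_candidates` are hypothetical names, and the random arrays stand in for a real transformer and trained Medusa heads. The point it shows is the data flow — the base model's last hidden state is cached on the sequence, and the draft heads read it back to propose one lookahead token each.

```python
import numpy as np

class SequenceState:
    """Holds the tokens generated so far plus the base model's last
    hidden state, so a Medusa/EAGLE-style draft model can reuse it."""
    def __init__(self, token_ids):
        self.token_ids = list(token_ids)
        self.last_hidden = None  # cached after each base-model step

def base_model_step(seq, hidden_size=8, vocab=100):
    # Stand-in for the base model: returns a sampled next token and
    # caches the final-position hidden state on the sequence.
    rng = np.random.default_rng(len(seq.token_ids))
    seq.last_hidden = rng.standard_normal(hidden_size)
    return int(rng.integers(0, vocab))

def draft_candidates(seq, num_heads=3, vocab=100):
    # Medusa-style heads: each head is a linear projection applied to
    # the *stored* hidden state and proposes one lookahead token.
    assert seq.last_hidden is not None, "run the base model first"
    rng = np.random.default_rng(0)
    heads = rng.standard_normal((num_heads, vocab, seq.last_hidden.shape[0]))
    logits = heads @ seq.last_hidden        # shape: (num_heads, vocab)
    return logits.argmax(axis=-1).tolist()  # one candidate per head

seq = SequenceState([1, 2, 3])
seq.token_ids.append(base_model_step(seq))
candidates = draft_candidates(seq)
print(candidates)  # three candidate tokens from the cached hidden state
```

The key design choice mirrored here is that the cached output lives with the sequence rather than with the model, so candidate generation for the next iteration needs no extra forward pass through the base model.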
Alternatives
No response
Additional context
No response