Lookahead decoding | forking + "appending" to child sequences. #1970
Comments
I found the problem. This is fixed. I'll leave the issue open in case someone has thoughts on this approach. Now that the outputs match (with and without the interventions) I'll try to finish a draft and see if it works.
Hi @priyamtejaswin, do you have a draft already? I am also interested in taking this further (especially Lookahead + PagedAttention).
I believe once #2188 is merged you can add Lookahead as the proposer, since verification of tokens is the same. |
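Since verification of the proposed tokens is the same as in speculative decoding, the greedy acceptance rule can be sketched as follows. This is a minimal illustration; `verify_greedy` is a hypothetical helper, not an actual vLLM function:

```python
def verify_greedy(proposed, target_greedy):
    """Greedy speculative verification: accept the longest prefix of
    proposed tokens that matches the target model's greedy predictions.
    On the first mismatch, emit the target's token instead; if every
    proposal matches, also emit the target's extra "bonus" token."""
    accepted = []
    for p, t in zip(proposed, target_greedy):
        if p == t:
            accepted.append(p)   # proposal confirmed by the target model
        else:
            accepted.append(t)   # first mismatch: keep the target's token
            break
    else:
        # All proposals matched; the target's extra prediction is a free token.
        if len(target_greedy) > len(proposed):
            accepted.append(target_greedy[len(proposed)])
    return accepted
```

Every accepted token costs no additional sequential model call, which is where the speedup comes from regardless of whether the proposer is a draft model or Lookahead's n-gram pool.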
Lookahead decoding now supports both flash attention and sampling.
Closing this as a duplicate of #1742. The work @cadedaniel mentioned has been completed and the discussion for this feature is more active in the issue I linked above. |
Hi @WoosukKwon and @zhuohan123,
Fantastic project!
I was taking a stab at implementing a version of greedy lookahead-decoding. Given some candidate completions, I was trying to fork the parent into child sequences, append the candidate tokens to each child, and then `step` in the engine to parallelize the next-token prediction across candidates.

I had a question about the behavior of `Sequence.append_token_id`, and its implications on future engine steps.

`vllm/vllm/sequence.py`, lines 159 to 167 at `24f60a5`

From the looks of it, if I append a token here, it should add the token to the appropriate blocks.
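The "appropriate blocks" behavior can be sketched with fixed-size logical blocks. This is a toy model of the idea only; `BLOCK_SIZE` and `BlockedSequence` are illustrative assumptions, not vLLM's actual implementation:

```python
BLOCK_SIZE = 4  # assumed block size, for illustration only


class BlockedSequence:
    """Minimal sketch of token ids stored in fixed-size logical blocks,
    loosely modeled on vLLM's logical token blocks (details assumed)."""

    def __init__(self):
        self.blocks = []

    def append_token_id(self, token_id):
        # Allocate a fresh block when none exist or the last one is full.
        if not self.blocks or len(self.blocks[-1]) == BLOCK_SIZE:
            self.blocks.append([])
        self.blocks[-1].append(token_id)

    @property
    def token_ids(self):
        # Flatten blocks back into the full token sequence.
        return [t for block in self.blocks for t in block]
```

Under this sketch, an appended token either fills a slot in the last block or triggers allocation of a new one, so the sequence data itself stays consistent after an out-of-band append.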
But when I try this in practice, I get a different output. Suppose the LLM is partway through a generation; I intervene after it has generated `912`, append `442` using `.append_token_id`, and then call `step()`. The continuation I see does not match what the model produces on its own.

Seeding is not the problem -- I have accounted for that.
Tagging some folks who had previously participated in lookahead/speculative decoding discussions.
@simon-mo @LiuXiaoxuanPKU @skrider @beginlner