Speculative Decoding / MTP #12130
Unanswered
davidsyoung asked this question in Q&A
I wanted to ask a question here, as I don't fully understand the implementation details: is something like speculative decoding or multi-token prediction for the DeepSeek MoE models possible with the way llama.cpp does inference?
It would be an incredible feature to have, but I'm not even sure if it's realistic.
Thank you!
Replies: 1 comment
-
llama.cpp already has a form of speculative decoding implemented based on trees, but I believe it is not in the main example program. I wonder if it would be possible for llama.cpp to do multi-token prediction next, as well as more advanced forms of speculative decoding.
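To make the idea concrete, here is a minimal greedy draft-and-verify sketch in Python. It is not llama.cpp's implementation; `target_next` and `draft_next` are hypothetical stand-ins for the large target model and a cheap draft model. Tree-based drafting and MTP mostly differ in where the drafted tokens come from (several candidate branches, or extra prediction heads of the same model); the "accept the longest agreeing prefix" step is the same.

```python
from typing import Callable, List

Token = int


def speculative_decode(
    target_next: Callable[[List[Token]], Token],  # expensive target model (greedy next token)
    draft_next: Callable[[List[Token]], Token],   # cheap draft model (greedy next token)
    prompt: List[Token],
    n_new: int,
    n_draft: int = 4,
) -> List[Token]:
    """Greedy draft-and-verify loop: the draft proposes n_draft tokens,
    the target checks them, and the longest agreeing prefix is kept."""
    ctx = list(prompt)
    produced = 0
    while produced < n_new:
        # 1) Draft phase: the cheap model proposes a short continuation.
        draft: List[Token] = []
        for _ in range(n_draft):
            draft.append(draft_next(ctx + draft))

        # 2) Verify phase: the target model scores the same positions.
        #    (A real engine does this in one batched forward pass; here we
        #    just call the toy function position by position.)
        accepted: List[Token] = []
        for i, d in enumerate(draft):
            t = target_next(ctx + draft[:i])
            accepted.append(t)  # the target's token is always valid output
            if t != d:          # disagreement: discard the rest of the draft
                break

        ctx.extend(accepted)
        produced += len(accepted)

    return ctx[len(prompt):len(prompt) + n_new]


if __name__ == "__main__":
    # Toy deterministic "models": the target counts up by 1,
    # the draft agrees except when the last token is a multiple of 7.
    def target(ctx: List[Token]) -> Token:
        return ctx[-1] + 1

    def draft(ctx: List[Token]) -> Token:
        return ctx[-1] + 1 if ctx[-1] % 7 else ctx[-1] + 2

    print(speculative_decode(target, draft, [1, 2, 3], n_new=10))
    # -> [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

In a real engine the verification is a single batched forward pass over all drafted positions, which is where the speedup over plain token-by-token decoding comes from.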