llama.cpp recently added support for speculative decoding:
ggml-org/llama.cpp#2926
However, when running `llama_cpp.server`, the new parameters are not recognized.
There are two new parameters:
- `-md` (`model_draft`): the path to the draft model.
- `-draft` (`n_draft`): how many tokens to draft each time.
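For reference, an invocation of llama.cpp's speculative example using these flags might look like the sketch below. The model paths, prompt, and draft count are placeholders, and the exact flag spelling may differ by llama.cpp version; this is illustrative only, not taken from the issue.

```shell
# Sketch only: placeholder paths and values, assuming llama.cpp's
# speculative example binary built from the linked PR.
#   -m      target (large) model
#   -md     draft (small) model used to propose tokens
#   -draft  number of tokens to draft per step
./speculative \
  -m  models/target-model.gguf \
  -md models/draft-model.gguf \
  -draft 16 \
  -p "Once upon a time"
```

The request is for `llama_cpp.server` to accept and forward these same two settings to the underlying llama.cpp runtime.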
Could support for this new feature please be added?