Speculative sampling #675

Closed
@andriyanthon

Description

llama.cpp added support for speculative inference:
ggml-org/llama.cpp#2926
but when running llama_cpp.server, it reports that it does not recognize the new parameters.

There are two new parameters:

  1. -md (model_draft) - the path to the draft model.
  2. -draft (n_draft) - how many tokens to draft each time.

Can this new feature please be supported?
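For context, the idea behind the two parameters can be sketched as a toy loop: a cheap draft model proposes n_draft tokens ahead, and the target model verifies them, accepting the longest agreeing prefix. This is only an illustration of the technique, not llama.cpp's implementation; both model functions here are hypothetical stand-ins.

```python
def draft_model(ctx):
    # Hypothetical cheap model: predicts the next token as last + 1 (mod 10).
    return (ctx[-1] + 1) % 10

def target_model(ctx):
    # Hypothetical expensive model: same rule, except it never emits token 7.
    t = (ctx[-1] + 1) % 10
    return t if t != 7 else 8

def speculative_step(ctx, n_draft):
    """Draft n_draft tokens, then keep the prefix the target model agrees with."""
    # Draft phase: the cheap model proposes n_draft tokens in a row.
    proposed = []
    tmp = list(ctx)
    for _ in range(n_draft):
        tok = draft_model(tmp)
        proposed.append(tok)
        tmp.append(tok)
    # Verify phase: the target model checks each proposal; on the first
    # disagreement it substitutes its own token and the step ends.
    accepted = []
    tmp = list(ctx)
    for tok in proposed:
        expected = target_model(tmp)
        if expected != tok:
            accepted.append(expected)
            break
        accepted.append(tok)
        tmp.append(tok)
    return ctx + accepted

# All 4 drafted tokens agree, so one step yields 4 new tokens:
print(speculative_step([0, 1], n_draft=4))  # [0, 1, 2, 3, 4, 5]
# The target rejects the draft's "7" and substitutes "8":
print(speculative_step([0, 5], n_draft=4))  # [0, 5, 6, 8]
```

The payoff is that when the draft model's guesses match, the target model validates several tokens per step instead of generating one at a time, which is why exposing model_draft and n_draft through llama_cpp.server would be useful.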

Metadata

    Labels

    enhancement (New feature or request)
