SanftMonster
Contributor

Support speculative decoding, currently only for v5.2 models on CUDA.

I tested the example and it runs well, but only a 1.5B v5.2 model is available so far, so we need to test further once models of other scales are released. The modifications don't break the current CUDA functionality (I'm not sure about ncnn). I'd suggest reviewing this now rather than waiting for the model release.

TODO:

  • Python support
  • verification with models of different scales
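
For reviewers unfamiliar with the technique, here is a minimal sketch of one round of greedy speculative decoding. It is not the code in this PR: `Token`, `NextTokenFn`, and `speculative_step` are hypothetical names, and both models are reduced to callbacks that return the greedy next token for a given context.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Token = int;
// Hypothetical model interface: returns the greedy next token for a context.
using NextTokenFn = std::function<Token(const std::vector<Token>&)>;

// Runs one round of greedy speculative decoding and returns how many tokens
// were appended to ctx (between 1 and k + 1).
std::size_t speculative_step(std::vector<Token>& ctx,
                             const NextTokenFn& draft,
                             const NextTokenFn& target,
                             std::size_t k) {
    // 1. The small draft model proposes k tokens autoregressively.
    std::vector<Token> proposal;
    std::vector<Token> draft_ctx = ctx;
    for (std::size_t i = 0; i < k; ++i) {
        Token t = draft(draft_ctx);
        proposal.push_back(t);
        draft_ctx.push_back(t);
    }

    // 2. The large target model verifies the proposal. A real implementation
    //    would score all positions in one batched forward pass; this sketch
    //    calls the target once per position for clarity.
    std::size_t appended = 0;
    for (Token proposed : proposal) {
        Token expected = target(ctx);
        ctx.push_back(expected);  // always keep the target's own token
        ++appended;
        if (expected != proposed) {
            return appended;      // first mismatch: discard the rest
        }
    }

    // 3. Every proposed token was accepted, so the target adds one more.
    ctx.push_back(target(ctx));
    return appended + 1;
}
```

Because the target's token is kept at every position, the output matches plain greedy decoding with the target model. The speedup in practice comes from verifying the k proposed tokens in a single pass.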

  std::vector<Tensor> _embd_weights;

- private:
+ public:
Contributor Author

I'm confused: compilation fails if I keep this private. Could you please take a look at it? @daquexian
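
The compiler error isn't shown here, so the following is only a guess at the cause. If the new code reads `_embd_weights` from outside the class, the member could stay private with either a read-only accessor or a `friend` declaration. A minimal sketch, where `Model`, `embd_weights()`, `count_embd_weights`, and the stand-in `Tensor` are all hypothetical:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Tensor { int id; };  // hypothetical stand-in for the project's Tensor

class Model {
public:
    explicit Model(std::vector<Tensor> weights)
        : _embd_weights(std::move(weights)) {}

    // Option 1: a read-only accessor keeps the member private.
    const std::vector<Tensor>& embd_weights() const { return _embd_weights; }

private:
    std::vector<Tensor> _embd_weights;

    // Option 2: grant access to one specific external function.
    friend std::size_t count_embd_weights(const Model& model);
};

// Allowed to read the private member only because of the friend declaration.
std::size_t count_embd_weights(const Model& model) {
    return model._embd_weights.size();
}
```

Either option avoids making every member after the access specifier public.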
