[New Model]: Add Cohere2 Model #11357
Comments
This has already been added in #11203 and is included in the latest released version of vLLM. Thanks for the offer though!
@DarkLight1337 I think there are some issues with the implementation:
So I think Gemma2 may have the same implementation issue.
Thanks for the report! I'll ask @simon-mo to take a look since he added this.
@DarkLight1337 A follow-up thought: for long-context implementation correctness, we could add a needle-in-a-haystack test to gate correctness. I would be able to help with that as well.
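A minimal sketch of what such a needle test could look like. The helper names and the stub generator are hypothetical, standing in for a real vLLM `llm.generate()` call; the point is only the structure: bury a known fact deep in filler text, ask the model to retrieve it, and assert the answer appears in the output.

```python
import re

def build_needle_prompt(needle: str, filler: str, total_chunks: int, needle_pos: int) -> str:
    """Bury `needle` at chunk index `needle_pos` inside repeated filler text."""
    chunks = [filler] * total_chunks
    chunks.insert(needle_pos, needle)
    return "\n".join(chunks) + "\nWhat is the secret number mentioned above?"

def needle_retrieved(model_output: str, secret: str) -> bool:
    """Pass if the model's answer contains the buried secret."""
    return secret in model_output

def stub_generate(prompt: str) -> str:
    """Stand-in for a real model call (e.g. vLLM's llm.generate), so the
    harness itself can be exercised without a GPU."""
    m = re.search(r"secret number is (\d+)", prompt)
    return m.group(1) if m else "unknown"

prompt = build_needle_prompt(
    needle="The secret number is 7042.",
    filler="Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    total_chunks=100,
    needle_pos=57,
)
assert needle_retrieved(stub_generate(prompt), "7042")
```

In a real gate, the harness would sweep the needle position across the full context length (and ideally several context lengths), since interleaved-attention bugs tend to show up only past the sliding-window boundary.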
Actually, I think @youkaichao added this.
🚀 The feature, motivation and pitch
Recently, Cohere released the CommandR7B model on Hugging Face, and I would like to contribute a vLLM implementation of it. @simon-mo
PR: #11358
The model also uses interleaved attention like Gemma2 and Mistral, so KV cache optimization is needed. I saw it is also on the roadmap: #9464
Alternatives
No response
Additional context
I have integrated it and verified that it works with all the benchmark scripts, and I would like to open a feature branch for review.