🚀 The feature, motivation and pitch
Cohere recently released the CommandR7B model on Hugging Face, and I would like to contribute a vLLM implementation of it. @simon-mo
PR: #11358
The model also uses interleaved attention like Gemma2 and Mistral, so KV cache optimization is needed. I saw this is also on the roadmap: #9464
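To illustrate why interleaved attention matters for the KV cache, here is a minimal sketch of a Gemma2-style layout in which most layers use sliding-window attention and a full-attention layer is inserted at a fixed interval. The interval (every 4th layer) and window size (4096) are illustrative assumptions, not CommandR7B's actual configuration, and the helper names are hypothetical:

```python
# Hypothetical sketch of an interleaved attention layout: sliding-window
# layers only need to cache the last `window` tokens, while full-attention
# layers cache the entire sequence. All numbers below are assumptions for
# illustration, not CommandR7B's real config.

def layer_attention_types(num_layers: int, full_attn_interval: int = 4) -> list[str]:
    """Per-layer attention type for an interleaved pattern."""
    return [
        "full" if (i + 1) % full_attn_interval == 0 else "sliding_window"
        for i in range(num_layers)
    ]

def kv_cache_tokens(num_layers: int, seq_len: int,
                    window: int = 4096, full_attn_interval: int = 4) -> int:
    """KV cache entries needed per sequence under the interleaved layout."""
    total = 0
    for kind in layer_attention_types(num_layers, full_attn_interval):
        total += seq_len if kind == "full" else min(seq_len, window)
    return total

# With 8 layers and an 8192-token sequence: 6 sliding-window layers cache
# 4096 tokens each, 2 full layers cache 8192 each -> 40960 entries,
# versus 65536 if every layer used full attention.
print(kv_cache_tokens(8, 8192))
```

This is why a KV cache manager that treats every layer as full attention over-allocates for such models; per-layer window awareness recovers the savings.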
Alternatives
No response
Additional context
I have integrated the model and verified it works with all the benchmark scripts, and would like to open a feature branch for review.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.