Closed
Description
The model to consider.
https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024
The closest model vllm already supports.
Likely either the original Cohere (for obvious reasons) or Gemma2 (as it also has a funky SWA architecture)
What's your difficulty of supporting the model you want?
It uses SWA (sliding window attention), but this can likely be ditched to get MVP inference working, as was done for Gemma 2
For some reason, every 4th layer uses global attention without positional embeddings. Not sure how or why that one works tbh
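To make the layer layout described above concrete, here is a rough pure-Python sketch. It is not taken from the model's code or from vLLM: the function names, the window size, and the assumption that "every 4th layer" means layers 4, 8, 12, ... (1-based) are all hypothetical and should be checked against the model's config.

```python
from typing import List, Optional


def layer_attention_types(num_layers: int, global_every: int = 4) -> List[str]:
    """Label each layer 'sliding' or 'global'.

    Assumption: the global layers are the 4th, 8th, ... (1-based).
    The real model may index this differently.
    """
    return [
        "global" if (i + 1) % global_every == 0 else "sliding"
        for i in range(num_layers)
    ]


def uses_positional_embeddings(layer_type: str) -> bool:
    """Per the description above, global layers skip positional embeddings."""
    return layer_type == "sliding"


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives plain causal (global) attention; otherwise each
    query only sees the most recent `window` positions, itself included.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    for idx, kind in enumerate(layer_attention_types(8)):
        print(idx, kind, "pos-emb" if uses_positional_embeddings(kind) else "no pos-emb")
```

The reason dropping SWA is a plausible MVP shortcut: as long as the sequence is no longer than the window, `causal_mask(n, window=w)` is identical to `causal_mask(n)`, so treating every layer as full causal attention gives the same result for short prompts. Longer prompts would diverge from the reference behaviour.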
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.