The model to consider.
Announcement blog: https://www.zyphra.com/post/zamba2-7b
Base model: https://huggingface.co/Zyphra/Zamba2-7B
Instruct tuned: https://huggingface.co/Zyphra/Zamba2-7B-Instruct
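
For reference, a minimal generation sketch with the standard Hugging Face API, assuming Zyphra's HF-compatible transformers fork (linked in the modeling-code section below) is installed, since the architecture is not in mainline transformers; model id and settings are just what the model card suggests:

```python
# Minimal sanity-check of the reference implementation (assumes the Zyphra
# transformers fork is installed so the "zamba2" architecture is registered).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is a hybrid Mamba2/transformer model?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```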
The closest model vllm already supports.
Jamba, as it is also a mixture of state-space and transformer blocks.
Zamba2-7B-Instruct is a hybrid model composed of state-space (Mamba2) and transformer blocks.
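
To make the layout concrete, here is a structural sketch (not the Zyphra code): most layers are state-space blocks, and a single shared transformer block (one set of weights) is re-applied every few layers. The block internals, layer count, and the `shared_every` ratio below are placeholders, not the real configuration:

```python
# Structural sketch of a hybrid Mamba2 + shared-attention stack.
# Block internals are stand-ins, not Zyphra's implementation.
import torch
import torch.nn as nn


class Mamba2BlockStub(nn.Module):
    """Placeholder for a real Mamba2 (SSM) block."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Linear(d_model, d_model)  # stands in for the SSM scan

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mixer(self.norm(x))


class SharedTransformerBlock(nn.Module):
    """One attention + MLP block whose weights are reused at several depths."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + self.mlp(attn_out)


class HybridStack(nn.Module):
    def __init__(self, d_model: int = 256, n_layers: int = 12, shared_every: int = 4):
        super().__init__()
        self.mamba_layers = nn.ModuleList([Mamba2BlockStub(d_model) for _ in range(n_layers)])
        self.shared_block = SharedTransformerBlock(d_model)  # single set of weights
        self.shared_every = shared_every

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.mamba_layers):
            if i % self.shared_every == 0:
                x = self.shared_block(x)  # same weights reused at each of these depths
            x = layer(x)
        return x


if __name__ == "__main__":
    x = torch.randn(1, 16, 256)
    print(HybridStack()(x).shape)  # torch.Size([1, 16, 256])
```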
What's your difficulty of supporting the model you want?
Should be easy once Mamba2 support lands in #9292; however, the `use_shared_attention_lora` case seems potentially more complex (see the sketch below).
All of the HF-compatible modeling code can be found here: https://github.com/Zyphra/transformers_zamba2/tree/main/src/transformers/models/zamba2
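
On `use_shared_attention_lora`: my understanding (an assumption from skimming the modeling code, not confirmed) is that because one attention block's weights are shared across several depths, the checkpoint can carry a separate low-rank (LoRA) delta per invocation so each depth behaves slightly differently. A minimal sketch of that idea, with illustrative names only:

```python
# Sketch of per-invocation LoRA on a shared projection: the base weight is
# shared across depths, but each depth applies its own low-rank delta.
# Names and structure are illustrative, not taken from the Zamba2 code.
import torch
import torch.nn as nn


class SharedLinearWithPerDepthLoRA(nn.Module):
    def __init__(self, d_model: int, n_invocations: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_model, d_model, bias=False)  # shared weights
        self.lora_a = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d_model) * 0.01) for _ in range(n_invocations)])
        self.lora_b = nn.ParameterList(
            [nn.Parameter(torch.zeros(d_model, rank)) for _ in range(n_invocations)])

    def forward(self, x: torch.Tensor, invocation: int) -> torch.Tensor:
        # The base projection is identical at every depth...
        out = self.base(x)
        # ...but the low-rank correction is selected by where in the stack we are.
        a, b = self.lora_a[invocation], self.lora_b[invocation]
        return out + (x @ a.t()) @ b.t()


if __name__ == "__main__":
    proj = SharedLinearWithPerDepthLoRA(d_model=64, n_invocations=3)
    x = torch.randn(2, 10, 64)
    print(proj(x, invocation=0).shape, proj(x, invocation=2).shape)
```

If that reading is right, the complication for vLLM is presumably in weight loading and module mapping: the shared block's tensors plus several LoRA deltas all map onto the same module rather than one tensor per layer.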
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.