### The model to consider.
Mamba Codestral: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1
Highlights:
- SOTA 7B code model
- theoretically unlimited context length; tested up to 256k
- inference cost is linear in sequence length, whereas transformer attention is quadratic
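To make the last highlight concrete, here is a toy sketch of how per-sequence compute scales for attention versus a Mamba-style SSM. The cost functions below are illustrative operation counts under simplified assumptions (ignoring hidden dimension and constant factors), not vLLM or Mamba code:

```python
def attention_cost(seq_len: int) -> int:
    # Self-attention compares every token with every other token: O(n^2).
    return seq_len * seq_len

def mamba_cost(seq_len: int) -> int:
    # A Mamba-style SSM updates a fixed-size recurrent state once per
    # token, so the work grows linearly: O(n).
    return seq_len

# The gap widens with context length, which is why long contexts
# (e.g. 256k tokens) are far cheaper for a linear-time architecture.
for n in (1_000, 10_000, 100_000):
    ratio = attention_cost(n) // mamba_cost(n)
    print(f"seq_len={n}: attention/ssm cost ratio = {ratio}")
```

At 100k tokens the quadratic term dominates by a factor of 100,000x in this simplified model, which is the practical motivation for supporting Mamba-based models at long context.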
### The closest model vllm already supports.
Jamba seems to be the closest model, since it is Mamba-based: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/jamba.py
### What's your difficulty of supporting the model you want?
Mamba is a non-transformer architecture, but vLLM already supports a Mamba-based model (Jamba), so it's unclear how much additional work supporting this one would take.