The model to consider.
https://huggingface.co/facebook/chameleon
(as of now, the weights can be downloaded after requesting access via the model form)
Chameleon is an interesting multimodal model architecture based on Llama 2. It adds image inputs and outputs to Llama 2 by tokenizing images with a VQ-VAE and adding the codebook entries to Llama's tokenizer vocabulary as new tokens.
In principle, it supports text and images as input and output in arbitrary combinations. However, the released models were finetuned to prevent image generation.
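For illustration, here is a minimal sketch of how a VQ-VAE codebook can be merged into a text vocabulary. The sizes and helper names are assumptions for this example, not Chameleon's actual constants (the paper uses an 8192-entry codebook producing 1024 tokens per 512x512 image):

```python
# Illustrative sketch only; sizes and names are assumptions.
TEXT_VOCAB_SIZE = 32_000   # placeholder; Chameleon trains its own BPE tokenizer
CODEBOOK_SIZE = 8_192      # codebook size reported in the paper

# Image code k maps to token id TEXT_VOCAB_SIZE + k, so text and image
# tokens live in one flat vocabulary and the LM head can predict either
# modality.
def image_code_to_token_id(code: int) -> int:
    assert 0 <= code < CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def token_id_to_image_code(token_id: int) -> int:
    assert token_id >= TEXT_VOCAB_SIZE
    return token_id - TEXT_VOCAB_SIZE

# An interleaved input is then just a flat sequence of token ids, e.g.:
# [text..., <image_start>, img_tok_0, ..., img_tok_1023, <image_end>, text...]
```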
The closest model vllm already supports.
LlamaForCausalLM
What's your difficulty of supporting the model you want?
For text->text support, the implementation should be fairly straightforward. The model is based on Llama-2 with the following differences:
- QK norm
- norm reordering similar to the Swin Transformer (normalizing the outputs of the attention and FFN blocks instead of their inputs); see the sketch below
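As a rough illustration of those two changes, here is a PyTorch sketch of a decoder block with QK norm and Swin-style output normalization. Everything in it (module names, use of LayerNorm, omission of RoPE and KV caching) is a simplification for clarity, not Chameleon's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChameleonStyleBlock(nn.Module):
    """Simplified decoder block showing QK norm and Swin-style norm
    placement. RoPE, KV caching, and RMSNorm details are omitted."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
        self.o_proj = nn.Linear(hidden_size, hidden_size, bias=False)
        # QK norm: normalize queries and keys per head before attention.
        self.q_norm = nn.LayerNorm(self.head_dim)
        self.k_norm = nn.LayerNorm(self.head_dim)
        # Swin-style reordering: norms act on the sublayer *outputs*,
        # instead of Llama-2's pre-norm on the sublayer inputs.
        self.attn_norm = nn.LayerNorm(hidden_size)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size, bias=False),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size, bias=False),
        )
        self.ffn_norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = self.q_norm(q.view(b, t, self.num_heads, self.head_dim))
        k = self.k_norm(k.view(b, t, self.num_heads, self.head_dim))
        v = v.view(b, t, self.num_heads, self.head_dim)
        out = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
            is_causal=True,
        ).transpose(1, 2).reshape(b, t, d)
        # Residual adds the *normalized* sublayer output:
        # x = x + norm(attn(x)) rather than x = x + attn(norm(x)).
        x = x + self.attn_norm(self.o_proj(out))
        x = x + self.ffn_norm(self.ffn(x))
        return x
```

In practice this could likely reuse most of vllm's existing Llama attention and MLP code, with the norm placement and QK norm swapped in.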
To enable image inputs, image tokenization using the provided VQ-VAE needs to be added.
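A hypothetical sketch of where that step would sit in the input pipeline; `vq_vae` and its `encode_to_codes` method are placeholders, the actual encoder lives in the reference implementation:

```python
import torch

def tokenize_image(image: torch.Tensor, vq_vae, text_vocab_size: int) -> torch.Tensor:
    """image: (3, H, W) float tensor, preprocessed to the VQ-VAE's expected
    resolution (512x512 in the paper, yielding 1024 discrete codes)."""
    with torch.no_grad():
        # Encode to a grid of latents, quantize each latent to the index
        # of its nearest codebook vector. `encode_to_codes` is an assumed
        # API, not the reference implementation's actual method name.
        codes = vq_vae.encode_to_codes(image.unsqueeze(0))
    # Offset into the merged vocabulary so image tokens don't collide
    # with text tokens (see the vocabulary sketch above).
    return codes.flatten() + text_vocab_size
```

The resulting token ids could then be spliced into the text token sequence and fed to the otherwise unchanged decoder.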
Further info:
- Original paper: https://arxiv.org/abs/2405.09818
- Reference implementation: https://github.com/facebookresearch/chameleon