Chameleon can perform inference with multiple images as input, where the images belong either to the same prompt or to different prompts (in batched inference). Here is how you can do it:
```diff
- from transformers import ChameleonProcessor, ChameleonForCausalLM
+ from transformers import ChameleonProcessor, ChameleonForConditionalGeneration
```
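The diff above only touches the import. As a minimal sketch of the multi-image usage the surrounding docs describe, the snippet below feeds two images into a single prompt with two `<image>` tokens; the checkpoint id and image URLs are illustrative assumptions, not part of the diff.

```python
import requests
import torch
from PIL import Image
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

model_id = "facebook/chameleon-7b"  # assumed checkpoint id, for illustration only
processor = ChameleonProcessor.from_pretrained(model_id)
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Two images for one prompt; each <image> token consumes one image, in order.
image_1 = Image.open(requests.get("https://example.com/one.png", stream=True).raw)  # placeholder URL
image_2 = Image.open(requests.get("https://example.com/two.png", stream=True).raw)  # placeholder URL
prompt = "What do these two images have in common?<image><image>"

inputs = processor(text=prompt, images=[image_1, image_2], return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))
```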
The model can be loaded in 8-bit or 4-bit precision, greatly reducing the memory requirements while maintaining the performance of the original model. First make sure to install bitsandbytes (`pip install bitsandbytes`) and to have access to a CUDA-compatible GPU device. Simply change the snippet above to:
```diff
- from transformers import ChameleonForCausalLM, BitsAndBytesConfig
+ from transformers import ChameleonForConditionalGeneration, BitsAndBytesConfig

- model = ChameleonForCausalLM.from_pretrained("meta-chameleon", quantization_config=quantization_config, device_map="auto")
+ model = ChameleonForConditionalGeneration.from_pretrained("meta-chameleon", quantization_config=quantization_config, device_map="auto")
```
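The snippet references a `quantization_config` defined earlier in the docs. As a minimal sketch (the exact settings here are an assumption, not taken from the diff), a 4-bit configuration could look like:

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of a 4-bit NF4 quantization config; adjust to your memory/quality trade-off.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```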
### Use Flash-Attention 2 and SDPA to further speed up generation
The model supports both Flash-Attention 2 and PyTorch's [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html), which can be enabled for optimization. SDPA is the default option when you load the model. If you want to switch to Flash Attention 2, first make sure to install flash-attn; refer to the [original repository](https://github.com/Dao-AILab/flash-attention) for installation instructions. Then simply change the snippet above to:
```diff
- from transformers import ChameleonForCausalLM
+ from transformers import ChameleonForConditionalGeneration

- model = ChameleonForCausalLM.from_pretrained(
+ model = ChameleonForConditionalGeneration.from_pretrained(
      model_id,
      torch_dtype=torch.float16,
      low_cpu_mem_usage=True,
      attn_implementation="flash_attention_2"
  )
```
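Since SDPA is the default backend, no extra flag is needed to use it, but it can also be requested explicitly. A minimal sketch (the checkpoint id is an assumption for illustration):

```python
import torch
from transformers import ChameleonForConditionalGeneration

model_id = "facebook/chameleon-7b"  # assumed checkpoint id, for illustration only

# SDPA is already the default; passing it explicitly just makes the choice visible.
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    attn_implementation="sdpa",
)
```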
A later hunk in the diff shows the multi-image prompt from the docs example as unchanged context:

```diff
@@ -183,7 +183,7 @@ model = ChameleonForCausalLM.from_pretrained(
 >>> prompt = "I used to know a lot about constellations when I was younger, but as I grew older, I forgot most of what I knew. These are the only two constellations that I really remember now.<image><image>I would like for you to tell me about 3 more constellations and give me a little bit of history about the constellation."
```