🚀 The feature, motivation and pitch
Hi,
I tried to use `unsloth/QVQ-72B-Preview-bnb-4bit` but received the following error:

`Model Qwen2VLForConditionalGeneration does not support BitsAndBytes quantization yet.`
```python
import torch
from vllm import LLM

model = LLM(
    model="unsloth/QVQ-72B-Preview-bnb-4bit",
    gpu_memory_utilization=0.95,
    max_num_seqs=32,
    tensor_parallel_size=torch.cuda.device_count(),
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    dtype=torch.bfloat16,
    enforce_eager=True,
    max_model_len=4096,
)
```
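A possible interim workaround, not mentioned in the original report: since the checkpoint is a pre-quantized bnb-4bit model, it can typically be loaded directly with Hugging Face Transformers while vLLM support is pending. A minimal sketch, assuming `transformers >= 4.45`, `bitsandbytes`, and `accelerate` are installed:

```python
# Hedged workaround sketch (not part of the original report): load the
# pre-quantized bnb-4bit checkpoint with Hugging Face Transformers
# instead of vLLM.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "unsloth/QVQ-72B-Preview-bnb-4bit",  # quantization config ships with the checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard across all visible GPUs
)
processor = AutoProcessor.from_pretrained("unsloth/QVQ-72B-Preview-bnb-4bit")
```

This path loses vLLM's serving throughput (continuous batching, paged attention), so it is only a stopgap until the model architecture gains BitsAndBytes support in vLLM.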
Alternatives
No response
Additional context
No response