System Info
Linux
transformers==4.52.4
bitsandbytes==0.46.1
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

m = "microsoft/phi-4"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True
)
tokenizer = AutoTokenizer.from_pretrained(m)
model = AutoModelForCausalLM.from_pretrained(m, quantization_config=bnb_config, device_map='auto')
tokenizer.pad_token_id = tokenizer.eos_token_id
# `prompt` is a chat-format list of {"role": ..., "content": ...} messages
inputs = tokenizer.apply_chat_template(prompt, return_tensors="pt").to('cuda')
out = model.generate(inputs, max_new_tokens=50, synced_gpus=True)
Run with:
torchrun --nproc-per-node=2 script.py
This works perfectly fine on a single GPU, but it raises an assertion error when run on multiple GPUs.
The error can be traced back to the model.generate() call:
AssertionError in python3.10/site-packages/bitsandbytes/nn/modules.py, in fix_4bit_weight_quant_state_from_module:
assert module.weight.shape[1] == 1
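For context on what that assertion is checking: as far as I understand, bitsandbytes stores a 4-bit-quantized weight as a flat uint8 buffer with two 4-bit codes packed per byte, viewed as a column vector of shape (N, 1); the original (rows, cols) shape lives in the quant_state, not in the tensor itself. The sketch below mimics that packing with plain Python lists to show where the shape[1] == 1 expectation comes from (the helper names here are illustrative, not the library's real API):

```python
def pack_4bit(codes):
    """Pack a list of 4-bit codes (0..15) two-per-byte into an (N, 1) column."""
    assert len(codes) % 2 == 0, "pad to an even count before packing"
    packed = []
    for hi, lo in zip(codes[0::2], codes[1::2]):
        packed.append([(hi << 4) | lo])  # one-element rows -> a column vector
    return packed

def shape(col):
    """(rows, cols) of a list-of-rows 'tensor'."""
    return (len(col), 1)

# A 2x4 "weight" quantized to 4-bit codes flattens to 8 codes -> 4 packed bytes.
codes = [1, 15, 0, 7, 3, 3, 12, 8]
packed = pack_4bit(codes)
print(shape(packed))  # (4, 1) -- shape[1] == 1, as the assertion expects
```

So the assertion failing under torchrun suggests the multi-GPU path is seeing the weight in its unpacked/original 2-D shape (or an already-dequantized tensor) rather than the packed (N, 1) column.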
Expected behavior
The model should run generation without error under multi-GPU torchrun, just as it does on a single GPU.