The current system does not support 4-bit training and inference, but this could be added with relatively little effort, and I am willing to help integrate the feature. Loading the base model in 4-bit via a `BitsAndBytesConfig` would look something like this:
```python
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

load_in_4bit = True
load_in_8bit = not load_in_4bit  # 4-bit and 8-bit loading are mutually exclusive

# NF4 quantization with double quantization and fp16 compute, per the QLoRA recipe
bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map=device_map,
)
```
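For the training side, the 4-bit model would additionally need to be prepared for k-bit training before attaching LoRA adapters. A minimal sketch using `peft` follows; the `r`, `lora_alpha`, and `target_modules` values are illustrative assumptions, not settings taken from this repo:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepares the quantized model for training (casts norms, enables input grads, etc.)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters; actual values would depend on the project config
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```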