The current system does not support 4-bit training and inference, but this could be added with relatively little effort, and I am willing to help integrate the feature. Loading the base model in 4-bit via a `BitsAndBytesConfig` would look something like this:
```python
import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

load_in_4bit = True
load_in_8bit = not load_in_4bit  # 4-bit and 8-bit loading are mutually exclusive

# NF4 quantization with double quantization and fp16 compute, per the QLoRA recipe
bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map=device_map,
)
```
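For the training side, the 4-bit model would additionally need to be prepared for k-bit training before attaching LoRA adapters. A minimal sketch using `peft` follows; the `r`, `lora_alpha`, and `target_modules` values are illustrative assumptions, not settings taken from this repo:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepares the quantized model for training (casts norms, enables input grads, etc.)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA hyperparameters; actual values would depend on the project config
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```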