
4-bit quantization and QLoRA #486

Open
@jeff52415

Description

The current system does not support 4-bit training or inference. Since this could be implemented with relatively little effort, I am willing to help integrate the feature.

import torch
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# Load the base model in 4-bit NF4 with double quantization (the QLoRA setup).
load_in_4bit = True
load_in_8bit = not load_in_4bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map=device_map,
)
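
For the QLoRA half of this request, here is a minimal sketch of how LoRA adapters could be attached on top of the 4-bit model above, assuming the Hugging Face peft library is available; the rank, alpha, and target_modules values are illustrative, not a proposal for project defaults:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized model for training (casts norms/embeddings to fp32,
# sets up hooks so gradient checkpointing works with frozen quantized weights).
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections; only these are trained.
lora_config = LoraConfig(
    r=8,                                 # illustrative rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # illustrative choice for LLaMA
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report only a small fraction trainable

With this in place, the 4-bit base weights stay frozen and only the adapter weights receive gradients, which is what makes 4-bit training feasible.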
