
Add quantization config to HFTextGenerator #28

Merged · 1 commit into victordibia:main on Aug 8, 2024

Conversation

@dlustre (Contributor) commented on Jul 13, 2024

This PR adds a quantization_config option to HFTextGenerator to allow 8-bit or 4-bit quantization; the latter was not previously possible.

In addition, the previous argument setup is already deprecated and will be removed in a future version of transformers:

The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
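With this PR, the same BitsAndBytesConfig object can be passed straight through the new option. A hypothetical usage sketch follows (the import path and the model name are assumptions for illustration; only the quantization_config keyword comes from this PR):

from transformers import BitsAndBytesConfig

# Import path is assumed; adjust to wherever HFTextGenerator lives in this repo.
from llmx.generators.text.hf_textgen import HFTextGenerator

# 4-bit loading, which this PR makes possible for the first time.
generator = HFTextGenerator(
    model="bigscience/bloom-1b7",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)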


The changes are in accordance with the Hugging Face docs: https://github.com/huggingface/transformers/blob/main/docs/source/en/quantization/bitsandbytes.md

Now you can quantize a model by passing a BitsAndBytesConfig to the PreTrainedModel.from_pretrained method. This works for any model in any modality, as long as it supports loading with Accelerate and contains torch.nn.Linear layers.

Quantizing a model in 8-bit halves the memory usage, and for large models, set device_map="auto" to efficiently use the available GPUs:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Enable 8-bit quantization through bitsandbytes.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# device_map="auto" spreads the quantized weights across the available GPUs.
model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    quantization_config=quantization_config,
    device_map="auto",
)
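Since 4-bit loading is what this PR newly enables, here is a minimal 4-bit sketch along the same lines. The nf4 quant type and bfloat16 compute dtype are common choices from the bitsandbytes docs, not settings prescribed by this PR:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization; bfloat16 compute keeps the dequantized matmuls fast.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    quantization_config=nf4_config,
    device_map="auto",
)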

@victordibia (Owner) left a comment

Thanks for the addition

@victordibia merged commit a1d0891 into victordibia:main on Aug 8, 2024