
Add quantization config to HFTextGenerator #28

Merged · 1 commit into victordibia:main on Aug 8, 2024

Conversation

@dlustre (Contributor) commented on Jul 13, 2024

This PR adds a quantization_config option to HFTextGenerator to allow 8-bit or 4-bit quantization; the latter was not previously possible.

In addition, the previous argument setup is already deprecated and will be removed in a future version of transformers:

The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
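With this PR, the same BitsAndBytesConfig object can be passed straight through the new option. A hypothetical usage sketch follows (the import path and the model name are assumptions for illustration; only the quantization_config keyword comes from this PR):

from transformers import BitsAndBytesConfig

# Import path is assumed; adjust to wherever HFTextGenerator lives in this repo.
from llmx.generators.text.hf_textgen import HFTextGenerator

# 4-bit loading, which this PR makes possible for the first time.
generator = HFTextGenerator(
    model="bigscience/bloom-1b7",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)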


The changes are in accordance with the Hugging Face docs: https://github.com/huggingface/transformers/blob/main/docs/source/en/quantization/bitsandbytes.md

Now you can quantize a model by passing a BitsAndBytesConfig to the PreTrainedModel.from_pretrained method. This works for any model in any modality, as long as it supports loading with Accelerate and contains torch.nn.Linear layers.

Quantizing a model in 8-bit halves the memory usage, and for large models, set device_map="auto" to efficiently use the available GPUs:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Enable 8-bit quantization through bitsandbytes.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# device_map="auto" spreads the quantized weights across the available GPUs.
model_8bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    quantization_config=quantization_config,
    device_map="auto",
)
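Since 4-bit loading is what this PR newly enables, here is a minimal 4-bit sketch along the same lines. The nf4 quant type and bfloat16 compute dtype are common choices from the bitsandbytes docs, not settings prescribed by this PR:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization; bfloat16 compute keeps the dequantized matmuls fast.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    quantization_config=nf4_config,
    device_map="auto",
)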

@victordibia (Owner) left a comment

Thanks for the addition

@victordibia merged commit a1d0891 into victordibia:main on Aug 8, 2024