Add quantization parameter for lmi_dist rolling batch backend for HF #888
Conversation
Can you also add this bitsandbytes flag to the FasterTransformer container?

@lanking520 Done
```python
sharded=sharded,
quantize=None,
trust_remote_code=kwargs.get("trust_remote_code"))
quantize = self.properties.get("quantize", None)
```
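Based on the diff and the commit notes ("Set quantize if dtype==int8"), the resolution logic might look like the following sketch. The function name, the `dtype` fallback, and the `"bitsandbytes"` default are assumptions for illustration, not the exact code in this PR:

```python
# Hypothetical sketch: resolve the quantization backend from the
# properties file, falling back to bitsandbytes when dtype is int8
# (mirroring the "Set quantize if dtype==int8" commit in this PR).
def resolve_quantize(properties: dict, dtype: str = "fp16"):
    """Return the quantization algo name, or None when quantization is off."""
    quantize = properties.get("quantize")  # e.g. "bitsandbytes"
    if quantize is None and dtype == "int8":
        quantize = "bitsandbytes"
    return quantize
```

The value is then passed through to the lmi_dist rolling batch backend in place of the hard-coded `quantize=None`.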
In the original properties we already have `option.load_in_8bit`; can we reuse this param?
`load_in_8bit` is a boolean. Assuming we add gptq support in the next release, we need a parameter that can take the quantization algo name instead of just a boolean. @lanking520 thoughts?
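The argument above can be sketched in a few lines: a string-valued `quantize` parameter extends naturally to new algorithms, where a boolean cannot. The supported-algo set below is an assumption for illustration only:

```python
# Hedged sketch: why a string-valued quantize param generalizes better than
# a load_in_8bit boolean. The set of algo names here is hypothetical.
SUPPORTED_ALGOS = {"bitsandbytes", "gptq"}

def parse_quantize(value):
    """Validate a quantization algo name; None means quantization is off."""
    if value is None:
        return None
    if value not in SUPPORTED_ALGOS:
        raise ValueError(f"unsupported quantization algo: {value}")
    return value
```

Adding a future algorithm then only requires extending the set, with no change to the property's type or the call sites.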
(deepjavalibrary#888)
* Set quantization param from properties file
* Format python
* Set quantize if dtype==int8
* Address review comments
* Adding BITSANDBYTES_NOWELCOME flag to fastertransformer
* Add back
Description

Enables setting `quantize` from properties for the lmi_dist rolling batch backend.
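For context, a configuration file enabling this option might look like the snippet below. This is a hedged example: the exact property names and values accepted by the server are assumptions, not confirmed by this PR.

```
# Hypothetical serving.properties snippet
option.rolling_batch=lmi-dist
option.quantize=bitsandbytes
```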