Add quantization parameter for lmi_dist rolling batch backend for HF #888

Merged (7 commits) on Jul 6, 2023

Conversation

Contributor

@maaquib maaquib commented Jun 30, 2023

Description

Enables setting quantize from properties for the lmi_dist rolling batch backend


@maaquib maaquib requested a review from sindhuvahinis June 30, 2023 20:51
@maaquib maaquib marked this pull request as ready for review June 30, 2023 20:51
@maaquib maaquib requested review from zachgk, frankfliu and a team as code owners June 30, 2023 20:51
@maaquib maaquib requested a review from xyang16 June 30, 2023 20:53
@lanking520
Contributor

can you also add to FasterTransformer container for this bitsandbytes flag?

@maaquib
Contributor Author

maaquib commented Jul 3, 2023

> can you also add to FasterTransformer container for this bitsandbytes flag?

@lanking520 Done

lanking520 previously approved these changes Jul 3, 2023
sharded=sharded,
quantize=None,
trust_remote_code=kwargs.get("trust_remote_code"))
quantize = self.properties.get("quantize", None)
Contributor

The original properties already include option.load_in_8bit. Can we reuse that parameter?

Contributor Author

load_in_8bit is a boolean. Assuming we add GPTQ support in the next release, we need a parameter that takes the quantization algorithm name rather than just a boolean. @lanking520, thoughts?
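
To make the trade-off in this thread concrete, here is a minimal sketch of how a string-valued `quantize` property could coexist with the older boolean flag. It is not the code merged in this PR. The helper name `resolve_quantize`, the `SUPPORTED_QUANTIZE` set, and the choice to map both `load_in_8bit` and `dtype == int8` to `"bitsandbytes"` are assumptions made for illustration only.

```python
from typing import Mapping, Optional

# Hypothetical list of accepted algorithm names. The thread only mentions
# bitsandbytes and a possible future gptq; this set is an assumption.
SUPPORTED_QUANTIZE = {"bitsandbytes", "gptq"}


def resolve_quantize(properties: Mapping[str, str]) -> Optional[str]:
    """Return the quantization algorithm name to use, or None for no quantization.

    Hypothetical helper: an explicit `quantize` value wins; otherwise the
    legacy boolean `load_in_8bit` or `dtype == int8` selects an 8-bit default.
    """
    quantize = properties.get("quantize")
    if quantize is not None:
        name = quantize.strip().lower()
        if name not in SUPPORTED_QUANTIZE:
            raise ValueError(f"Unsupported quantize value: {quantize!r}")
        return name

    # Assumption: a boolean flag can only express "8-bit on/off", so it is
    # mapped to a single default algorithm.
    if str(properties.get("load_in_8bit", "false")).strip().lower() == "true":
        return "bitsandbytes"

    # Assumption: an int8 dtype also implies the 8-bit default.
    if properties.get("dtype") == "int8":
        return "bitsandbytes"

    return None
```

A string parameter lets a new algorithm be added by extending the accepted names, whereas a boolean would need a separate flag for each algorithm.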

@lanking520 lanking520 dismissed their stale review July 3, 2023 18:06

pending discussion

@lanking520 lanking520 merged commit c737451 into deepjavalibrary:master Jul 6, 2023
@maaquib maaquib deleted the quantize branch July 7, 2023 16:11
KexinFeng pushed a commit to KexinFeng/djl-serving-forked that referenced this pull request Aug 16, 2023
deepjavalibrary#888)

* Set quantization param from properties file

* Format python

* Set quantize if dtype==int8

* Address review comments

* Adding BITSANDBYTES_NOWELCOME flag to fastertransformer

* Add  back