Huggingface configurations refactoring #1283
Compare: b59f114 to 953aee7
# device map is not required for lmi dist and vllm
if properties['rolling_batch'] == RollingBatchEnum.lmidist or \
        properties['rolling_batch'] == RollingBatchEnum.vllm:
    return properties
@xyang16 @KexinFeng @lanking520
I am not validating whether device_map exists for lmi-dist and vLLM, and I am also not inserting load_in_8bit into kwargs for them. Kindly check whether this is okay.
device_map is not important here. It only applies to the scheduler and HF Accelerate paths.
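For readers following the thread, the early return discussed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `apply_device_map` is a hypothetical helper name, the `scheduler` and `disable` enum members are assumed, and the fallback behaviour (defaulting `device_map` to `"auto"` and setting `load_in_8bit` for `bitsandbytes8`) is an assumption about the non-rolling-batch path.

```python
from enum import Enum


class RollingBatchEnum(str, Enum):
    # lmidist and vllm appear in the diff; the other members are assumed.
    lmidist = "lmidist"
    vllm = "vllm"
    scheduler = "scheduler"
    disable = "disable"


def apply_device_map(properties: dict) -> dict:
    """Hypothetical helper mirroring the early return in the reviewed diff."""
    if properties["rolling_batch"] in (RollingBatchEnum.lmidist,
                                       RollingBatchEnum.vllm):
        # No device_map validation and no load_in_8bit kwarg for these engines.
        return properties

    # Assumed behaviour for the scheduler / HF Accelerate path.
    kwargs = properties.setdefault("kwargs", {})
    kwargs.setdefault("device_map", "auto")
    if properties.get("quantize") == "bitsandbytes8":
        kwargs["load_in_8bit"] = True
    return properties
```

With this shape, a vLLM or lmi-dist configuration passes through untouched, while the other paths still receive a `device_map` default.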
engines/python/setup/djl_python/rolling_batch/lmi_dist_rolling_batch.py
bitsandbytes8 = 'bitsandbytes8'

# supported by vllm
awq = 'awq'
Kindly verify whether this covers all supported quantization methods. lmi-dist has tests for gptq, but vLLM has no test case for awq in our pipeline. Let me add one.
There is no awq test for vLLM; feel free to add it.
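As a starting point for the missing coverage, a unit-level check could look something like the sketch below. It only exercises enum parsing, not an actual vLLM model load. The enum name `QuantizeMethod` and the helper `parse_quantize` are hypothetical, and the member list is limited to the methods named in this thread.

```python
from enum import Enum


class QuantizeMethod(str, Enum):  # hypothetical name for the enum in the diff
    bitsandbytes8 = "bitsandbytes8"
    gptq = "gptq"  # mentioned in the thread as tested with lmi-dist
    awq = "awq"    # marked "supported by vllm" in the diff


def parse_quantize(value: str) -> QuantizeMethod:
    """Return the matching member; raises ValueError for unknown methods."""
    return QuantizeMethod(value)


def test_awq_is_accepted():
    assert parse_quantize("awq") is QuantizeMethod.awq


def test_unknown_method_is_rejected():
    try:
        parse_quantize("int3")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unsupported method")
```

An integration test that serves an AWQ-quantized model through vLLM would still be needed in the pipeline; this sketch covers only the configuration parsing.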
Tests:
Added unit test cases.
Tested the CI test cases manually on EC2 and also ran them in GitHub Actions:
HF https://github.com/deepjavalibrary/djl-serving/actions/runs/6790943108/job/18461535915
HF lora https://github.com/deepjavalibrary/djl-serving/actions/runs/6791093901/job/18462088800
Rolling batch changes https://github.com/deepjavalibrary/djl-serving/actions/runs/6790720395/job/18460807680