Support 5.2 bloom #7846
Conversation
Thanks for your contribution!
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##           develop    #7846      +/-   ##
===========================================
- Coverage    56.95%   56.90%    -0.05%
===========================================
  Files          587      587
  Lines        88628    88724       +96
===========================================
+ Hits         50480    50492       +12
- Misses       38148    38232       +84

☔ View full report in Codecov by Sentry.
Force-pushed from 99d897b to b60ea16.
The PR quality is very high. Apart from one small comment, I have just two points:
- Add the unit tests, following the llama inference 5.2 cases in tests/llm/test_predictor.py.
- Consider whether the scripts above should also be added to llm/docs/inference.md.
llm/predictor.py (outdated)

    )
    if predictor_args.block_attn:
        from paddlenlp.experimental.transformers import (
            BlommForCausalBlockLMInferenceModel as Model,
Could you just follow the llama block_attn style here and write it directly as `as BloomInferenceModel`?
Done, thanks for the review.
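For reference, a minimal sketch of what the renamed import might look like, following the llama block_attn pattern the reviewer points to. This is a fragment of predictor.py, not standalone code, and the final class name is an assumption inferred from the outdated diff above; the merged code may differ.

    )
    if predictor_args.block_attn:
        # Assumed final class name, aliased as BloomInferenceModel per the
        # review suggestion; sketch only, may differ from the merged code.
        from paddlenlp.experimental.transformers import (
            BloomForCausalLMBlockInferenceModel as BloomInferenceModel,
        )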
llm/predictor.py (outdated)

        config.max_seq_len = predictor_args.total_max_length
    else:
        from paddlenlp.experimental.transformers import (
            BloomForCausalLMInferenceModel as Model,
Same as above.
Done, thanks for the review.
The unit tests have been added; a sketch of the shape they take is below. I think this command does not need to go into the docs, since it is similar to the commands already there. Thanks for the review.
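A hedged sketch of what the added test might look like, modeled on the llama inference 5.2 cases in tests/llm/test_predictor.py. The class name, model id, and test body here are assumptions for illustration, not the actual test merged in this PR.

    # Hypothetical sketch only, not the test added in this PR. Modeled on the
    # llama inference 5.2 cases in tests/llm/test_predictor.py; the model id
    # and class name are assumptions.
    import unittest


    class BloomBlockAttnInferenceTest(unittest.TestCase):
        model_name_or_path = "bigscience/bloom-560m"  # assumed small checkpoint

        @unittest.skip("sketch only")
        def test_block_attn_matches_baseline(self):
            # Run the predictor twice, with and without --block_attn, and check
            # that the generated texts match: block attention should change the
            # KV-cache layout, not the outputs.
            pass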
LGTM
PR types
New features: support 5.2 inference for Bloom-family models.
PR changes
Description
# Dynamic-graph inference with block attention
python3.8 predictor.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --dtype float16 --src_length 102 --max_length 1024 --block_attn --batch_size 2 --inference_model > dynamic_2.txt

# Dynamic-graph inference with weight-only int8 quantization
python3.8 predictor.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --dtype float16 --src_length 102 --max_length 1024 --block_attn --batch_size 2 --inference_model --quant_type weight_only_int8

# Export a static-graph model, then run static-graph inference
python3.8 export_model.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --inference_model --output_path ./inference --dtype float16 --block_attn --quant_type weight_only_int8
python3.8 predictor.py --model_name_or_path ./inference --inference_model --dtype "float16" --mode "static" --batch_size 2 --block_attn

# Export and run a weight-only int8 static-graph model
python3.8 export_model.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --inference_model --output_path ./inference_wint8 --dtype float16 --block_attn --quant_type weight_only_int8
python3.8 predictor.py --model_name_or_path ./inference_wint8 --inference_model --dtype "float16" --batch_size 2 --mode "static" --quant_type weight_only_int8 --block_attn