
Support 5.2 bloom #7846

Merged · 8 commits · Jan 17, 2024
Conversation

@zhoutianzi666 (Contributor) commented Jan 15, 2024

PR types

New features: support 5.2 inference for BLOOM-family models

PR changes

Description

  • Dynamic-graph inference command:

python3.8 predictor.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --dtype float16 --src_length 102 --max_length 1024 --block_attn --batch_size 2 --inference_model > dynamic_2.txt

  • Dynamic-graph weight-only int8 inference command:

python3.8 predictor.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --dtype float16 --src_length 102 --max_length 1024 --block_attn --batch_size 2 --inference_model --quant_type weight_only_int8

  • Dynamic-to-static export and static-graph inference commands:

python3.8 export_model.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --inference_model --output_path ./inference --dtype float16 --block_attn

python3.8 predictor.py --model_name_or_path ./inference --inference_model --dtype "float16" --mode "static" --batch_size 2 --block_attn

  • wint8 dynamic-to-static export and static-graph inference commands:

python3.8 export_model.py --model_name_or_path /root/.paddlenlp/models/bigscience/bloom-7b1/ --inference_model --output_path ./inference_wint8 --dtype float16 --block_attn --quant_type weight_only_int8

python3.8 predictor.py --model_name_or_path ./inference_wint8 --inference_model --dtype "float16" --batch_size 2 --mode "static" --quant_type weight_only_int8 --block_attn
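The `--quant_type weight_only_int8` flag used above enables weight-only quantization: weights are stored as int8 alongside a per-row floating-point scale and dequantized back to float at matmul time. A minimal pure-Python sketch of that round-trip follows; the function names are illustrative and are not PaddleNLP's actual API.

```python
# Illustrative weight-only int8 round-trip (symmetric, per-row scale).
# This is a sketch of the idea, not the PaddleNLP kernel.

def quantize_weight_only_int8(row):
    """Quantize one weight row to int8 with a symmetric per-row scale."""
    scale = max(abs(w) for w in row) / 127.0 or 1.0  # fall back to 1.0 for an all-zero row
    q = [max(-128, min(127, round(w / scale))) for w in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

row = [0.5, -1.27, 0.03]
q, scale = quantize_weight_only_int8(row)
approx = dequantize(q, scale)
# Each recovered weight is within one quantization step (scale) of the original.
assert all(abs(a - b) <= scale for a, b in zip(row, approx))
```

Because only the weights are quantized, activations stay in float16, which is why the commands above still pass `--dtype float16`.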


paddle-bot bot commented Jan 15, 2024

Thanks for your contribution!


codecov bot commented Jan 15, 2024

Codecov Report

Attention: 83 lines in your changes are missing coverage. Please review.

Comparison: base (4069f22) at 56.95% vs head (1dafb45) at 56.90%.
Report is 9 commits behind head on develop.

Files Patch % Lines
...dlenlp/experimental/transformers/bloom/modeling.py 0.00% 78 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py 0.00% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7846      +/-   ##
===========================================
- Coverage    56.95%   56.90%   -0.05%     
===========================================
  Files          587      587              
  Lines        88628    88724      +96     
===========================================
+ Hits         50480    50492      +12     
- Misses       38148    38232      +84     


@wj-Mcat (Contributor) left a comment

The PR quality is high. Apart from one small comment, I have just two small points:

  • Add unit tests in tests/llm/test_predictor.py, modeled on llama inference 5.2.
  • Consider whether the commands above should also go into llm/docs/inference.md.

llm/predictor.py (outdated)

)
if predictor_args.block_attn:
    from paddlenlp.experimental.transformers import (
        BlommForCausalBlockLMInferenceModel as Model,
Contributor

Just follow the llama block_attn pattern here: import it directly as BloomInferenceModel.

Contributor Author

> Just follow the llama block_attn pattern here: import it directly as BloomInferenceModel.

Done, thanks for the review.
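The suggestion in this thread is to mirror the llama block_attn style: choose between the block-attention and standard inference classes, binding whichever applies to a single local name. A minimal sketch of that dispatch pattern, with stub classes standing in for the real `paddlenlp.experimental.transformers` models (class and attribute names here are illustrative):

```python
# Sketch of the import-alias dispatch discussed above: pick the
# block-attention or standard inference class and bind it to one name.
# The stub classes are placeholders for the real PaddleNLP models.

class BloomForCausalLMBlockInferenceModel:
    """Stub: block-attention BLOOM inference model."""
    kind = "block_attn"

class BloomForCausalLMInferenceModel:
    """Stub: standard BLOOM inference model."""
    kind = "standard"

def select_model(block_attn: bool):
    # Mirrors `if predictor_args.block_attn: ... as Model / else: ... as Model`
    if block_attn:
        Model = BloomForCausalLMBlockInferenceModel
    else:
        Model = BloomForCausalLMInferenceModel
    return Model

print(select_model(True).kind)   # block_attn
print(select_model(False).kind)  # standard
```

Binding both branches to one alias keeps the downstream construction code identical regardless of which attention implementation is in use.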

llm/predictor.py (outdated)

    config.max_seq_len = predictor_args.total_max_length
else:
    from paddlenlp.experimental.transformers import (
        BloomForCausalLMInferenceModel as Model,
Contributor

Same as above.

Contributor Author

> Same as above.

Done, thanks for the review.

@zhoutianzi666 (Contributor Author)

> The PR quality is high. Apart from one small comment, I have just two small points:
>   • Add unit tests in tests/llm/test_predictor.py, modeled on llama inference 5.2.
>   • Consider whether the commands above should also go into llm/docs/inference.md.

The unit tests have been added. I don't think the commands need to go into the docs, since they are similar to the ones already there. Thanks for the review.

@wj-Mcat (Contributor) left a comment

LGTM

@wawltor wawltor merged commit cf907fc into PaddlePaddle:develop Jan 17, 2024
8 of 10 checks passed

3 participants