add fuse_attention_ffn support for qwen #8526
Thanks for your contribution!
Force-pushed from 5139462 to 18b5946.
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #8526      +/-   ##
===========================================
- Coverage    53.87%   53.86%   -0.01%
===========================================
  Files          620      620
  Lines        97081    97101      +20
===========================================
+ Hits         52299    52308       +9
- Misses       44782    44793      +11
```
```python
self.w1 = nn.Linear(config.hidden_size, ff_dim_in, bias_attr=not config.no_bias)
self.w2 = nn.Linear(config.hidden_size, ff_dim_in, bias_attr=not config.no_bias)
if self.fuse_attention_ffn:
    self.gate_up_fused_proj = nn.Linear(config.hidden_size, ff_dim_in * 2, bias_attr=not config.no_bias)
```
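For intuition, here is a minimal NumPy sketch (not the PR's code) of why a single `gate_up_fused_proj` matmul can stand in for the separate `w1` and `w2` projections. It assumes the MLP computes `w1(x) * silu(w2(x))`, as the original Qwen implementation does, ignores biases, and assumes the fused weight stores `w1`'s columns first; the ordering the PR actually uses is not visible in this thread. Paddle's `nn.Linear` stores its weight as `[in_features, out_features]`, so fusing amounts to concatenating the two weights along the last axis.

```python
import numpy as np


def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def mlp_separate(x, w1, w2):
    # Two independent projections, combined as w1(x) * silu(w2(x)).
    a1 = x @ w1
    a2 = x @ w2
    return a1 * silu(a2)


def mlp_fused(x, w_fused):
    # One projection to 2 * ff_dim_in columns, then split the output in half.
    # Assumption: the fused weight holds w1's columns first, then w2's.
    out = x @ w_fused
    a1, a2 = np.split(out, 2, axis=-1)
    return a1 * silu(a2)


rng = np.random.default_rng(0)
hidden_size, ff_dim_in = 8, 16
x = rng.standard_normal((4, hidden_size))
w1 = rng.standard_normal((hidden_size, ff_dim_in))
w2 = rng.standard_normal((hidden_size, ff_dim_in))
# Fused weight of shape (hidden_size, 2 * ff_dim_in), matching nn.Linear's layout.
w_fused = np.concatenate([w1, w2], axis=-1)

print(np.allclose(mlp_separate(x, w1, w2), mlp_fused(x, w_fused)))
```

The math is unchanged; the expected gain is launching one larger GEMM instead of two smaller ones.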
`PaddleNLP/paddlenlp/transformers/qwen/modeling.py`, lines 474 to 496 in 95c1780:
```python
# Excerpt: `partial` is functools.partial; `fn` (the split function) and
# `config` are defined outside these lines.
def get_tensor_parallel_split_mappings(num_hidden_layers):
    final_actions = {}
    base_actions = {
        # Column Linear
        "lm_head.weight": partial(fn, is_column=True),
        "qwen.h.0.mlp.w2.weight": partial(fn, is_column=True),
        "qwen.h.0.mlp.w1.weight": partial(fn, is_column=True),
        "qwen.h.0.attn.c_attn.weight": partial(fn, is_column=True, is_naive_3fuse=True),
        "qwen.h.0.attn.c_attn.bias": partial(fn, is_column=True, is_naive_3fuse=True),
        # Row Linear
        "qwen.wte.weight": partial(fn, is_column=False),
        "qwen.h.0.mlp.c_proj.weight": partial(fn, is_column=False),
        "qwen.h.0.attn.c_proj.weight": partial(fn, is_column=False),
    }
    # Copy every layer-0 rule to all layers h.0 ... h.{num_hidden_layers - 1}.
    for key, action in base_actions.items():
        if "h.0." in key:
            for i in range(num_hidden_layers):
                final_actions[key.replace("h.0.", f"h.{i}.")] = action
        final_actions[key] = action
    return final_actions

mappings = get_tensor_parallel_split_mappings(config.num_hidden_layers)
```
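The mapping registers each rule once under layer 0 and copies it to every `h.{i}.` key. The sketch below shows one way a fused entry could be registered when `fuse_attention_ffn` is enabled, in place of the `w1`/`w2` entries. The stub `fn` and its `fused_parts` argument are hypothetical placeholders for illustration only; they are not PaddleNLP's split function or its flags.

```python
from functools import partial


def fn(weight, is_column=True, is_naive_3fuse=False, fused_parts=1):
    # Hypothetical stand-in for the real split function: it only reports
    # which rule would be applied to `weight`.
    return {"is_column": is_column, "is_naive_3fuse": is_naive_3fuse, "fused_parts": fused_parts}


def get_tensor_parallel_split_mappings(num_hidden_layers, fuse_attention_ffn):
    base_actions = {
        # Column Linear
        "lm_head.weight": partial(fn, is_column=True),
        "qwen.h.0.attn.c_attn.weight": partial(fn, is_column=True, is_naive_3fuse=True),
        "qwen.h.0.attn.c_attn.bias": partial(fn, is_column=True, is_naive_3fuse=True),
        # Row Linear
        "qwen.wte.weight": partial(fn, is_column=False),
        "qwen.h.0.mlp.c_proj.weight": partial(fn, is_column=False),
        "qwen.h.0.attn.c_proj.weight": partial(fn, is_column=False),
    }
    if fuse_attention_ffn:
        # A single fused [w1 | w2] weight; its two halves must be split separately.
        base_actions["qwen.h.0.mlp.gate_up_fused_proj.weight"] = partial(fn, is_column=True, fused_parts=2)
    else:
        base_actions["qwen.h.0.mlp.w1.weight"] = partial(fn, is_column=True)
        base_actions["qwen.h.0.mlp.w2.weight"] = partial(fn, is_column=True)

    final_actions = {}
    for key, action in base_actions.items():
        if "h.0." in key:
            # Copy the layer-0 rule to every layer.
            for i in range(num_hidden_layers):
                final_actions[key.replace("h.0.", f"h.{i}.")] = action
        final_actions[key] = action
    return final_actions


mappings = get_tensor_parallel_split_mappings(num_hidden_layers=2, fuse_attention_ffn=True)
print(sorted(mappings))
```

With two layers and fusion on, this yields 12 keys: `lm_head.weight`, `qwen.wte.weight`, and five per-layer rules for each of `h.0` and `h.1`.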
You need to update the split rules: tensor parallel needs a split rule for `gate_up_fused_proj`.
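The reason a fused weight needs its own rule: a plain column split of `[w1 | w2]` with tensor-parallel degree 2 gives rank 0 all of `w1` and rank 1 all of `w2`, so neither rank can compute `w1(x) * silu(w2(x))` for its slice of the intermediate. The NumPy sketch below assumes that MLP form, a fused weight that stores `w1` first, and no biases; `naive_column_split` and `fused_column_split` are hypothetical helper names. The excerpt already tags the fused QKV weight `c_attn` with `is_naive_3fuse=True`, and a two-way analogue seems the natural fit, but which flag PaddleNLP actually uses for this is not shown in this thread.

```python
import numpy as np


def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def intermediate(x, w_fused):
    # Fused gate/up projection followed by w1(x) * silu(w2(x)).
    a1, a2 = np.split(x @ w_fused, 2, axis=-1)
    return a1 * silu(a2)


def naive_column_split(weight, tp_degree, rank):
    # Plain column split: one contiguous slice of the output dimension.
    return np.split(weight, tp_degree, axis=-1)[rank]


def fused_column_split(weight, tp_degree, rank, num_fused=2):
    # Undo the fusion ([w1 | w2]), column-split each block, then re-fuse
    # this rank's pieces so the rank holds [w1_rank | w2_rank].
    blocks = np.split(weight, num_fused, axis=-1)
    return np.concatenate([np.split(b, tp_degree, axis=-1)[rank] for b in blocks], axis=-1)


rng = np.random.default_rng(0)
hidden_size, ff_dim_in, tp_degree = 8, 16, 2
x = rng.standard_normal((4, hidden_size))
w_fused = rng.standard_normal((hidden_size, 2 * ff_dim_in))
full = intermediate(x, w_fused)

# Each rank computes its own slice of the intermediate activations.
fused_shards = [intermediate(x, fused_column_split(w_fused, tp_degree, r)) for r in range(tp_degree)]
naive_shards = [intermediate(x, naive_column_split(w_fused, tp_degree, r)) for r in range(tp_degree)]

print(np.allclose(np.concatenate(fused_shards, axis=-1), full))  # fused-aware split
print(np.allclose(np.concatenate(naive_shards, axis=-1), full))  # naive split
```

Concatenating the per-rank shards reproduces the full intermediate only with the fused-aware split; each shard then feeds the row-parallel `c_proj`.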
Sure. There is a follow-up PR that adds sp (sequence parallel) support; I'll add the split rule in that PR.
PR types
Performance optimization
PR changes
Models
Description
Add fuse_attention_ffn support to the Qwen model.