[DCU] Llama a8w8 inference performance optimization #8800

Deleter-D · 2024-07-24T02:36:14Z

PR types

Performance optimization

PR changes

Models

Description

Optimize inference performance for Llama in the a8w8 case.

On the DCU platform, rocBLAS GEMM performance across transposition modes ranks NT > NN > TN. Because of how Paddle's matmul works, the NT path cannot be triggered in this scenario, so the next-best option, NN, is used instead.
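As a rough illustration of why the transposition mode can be changed without changing the result, the sketch below uses NumPy as a stand-in for `paddle.matmul`. The function names and shapes are hypothetical and are not taken from this PR; the sketch only shows that a weight transposed at call time and a weight stored pre-transposed give the same output, and it makes no claim about performance.

```python
import numpy as np


def out_linear_transposed_at_call(x, weight):
    """Weight stored as [out_features, in_features]; transposed when multiplying."""
    return np.matmul(x, weight.T)


def out_linear_pretransposed(x, weight_t):
    """Weight stored as [in_features, out_features]; no transpose at call time."""
    return np.matmul(x, weight_t)


# Hypothetical shapes: 4 tokens, 8 input features, 16 output features.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
weight = rng.standard_normal((16, 8)).astype(np.float32)

# Transpose once, ahead of time, and keep a contiguous copy.
weight_t = np.ascontiguousarray(weight.T)

out_a = out_linear_transposed_at_call(x, weight)
out_b = out_linear_pretransposed(x, weight_t)
```

Both calls compute the same product, so which layout to use can be decided per platform based on which GEMM variant the backend runs fastest.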

paddle-bot · 2024-07-24T02:36:19Z

Thanks for your contribution!

codecov · 2024-07-24T03:12:13Z

Codecov Report

Attention: Patch coverage is 0% with 31 lines in your changes missing coverage. Please review.

Project coverage is 55.58%. Comparing base (da1eb9c) to head (7d4fdc6).
Report is 219 commits behind head on develop.

Files with missing lines	Patch %	Lines
...erimental/transformers/fused_transformer_layers.py	0.00%	17 Missing ⚠️
...dlenlp/experimental/transformers/llama/modeling.py	0.00%	14 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8800      +/-   ##
===========================================
+ Coverage    55.37%   55.58%   +0.21%     
===========================================
  Files          631      630       -1     
  Lines        99707    98382    -1325     
===========================================
- Hits         55211    54687     -524     
+ Misses       44496    43695     -801


YanhuiDua · 2024-07-24T06:05:13Z

paddlenlp/experimental/transformers/fused_transformer_layers.py

@@ -1384,7 +1392,10 @@ def compute_mmha(self, qkv_out, caches, attn_mask, seq_lens, rotary_embs, rotary
        )[0]

    def compute_out_linear(self, fmha_out, i):
-        out_linear_out = paddle.matmul(fmha_out, self.linear_weights[i], False, True)
+        if paddle.is_compiled_with_rocm():


Please explain in the PR description why ROCm needs the non-transposed version.

Explanation added.

DesmonDay

LGTM

[DCU] Llama a8w8 inference performance optimization

7ae994d

YanhuiDua reviewed Jul 24, 2024


YanhuiDua approved these changes Jul 24, 2024


Deleter-D added 2 commits July 24, 2024 15:56

keep two cases

9fa876e

change back

7d4fdc6

DesmonDay self-requested a review July 24, 2024 08:13

DesmonDay approved these changes Jul 24, 2024


wawltor merged commit 105e0da into PaddlePaddle:develop Jul 24, 2024
9 of 12 checks passed

Deleter-D deleted the llama_dcu branch July 25, 2024 11:15

Deleter-D mentioned this pull request Jul 26, 2024

[DCU] fix llama inference bug on DCU #8815

Merged

6 tasks
