Conversation

@csy0225 (Contributor) commented Nov 30, 2023

PR types

Bug fixes

PR changes

Others

Description

Fix incorrect multi_encoder results when the variable-length (adaptive seqlen) optimization is enabled.

  • Fixed issues:
    • Issue 1: With variable-length enabled, the embedding kernel changes its output tensor shape to {1, total_index_len, embedding_dim}, where total_index_len is the total number of tokens, across all batches, that still need computation after masking under LoD variable-length mode. However, the multi-encoder kernel did not restore this shape back to {batch_size, max_seq_len_value, embedding_dim}.
    • Issue 2: In the fp32 scenario, multi-encoder casts to fp16 for computation. The kernel computes on an fp16 tensor internally, and the actual shape of that fp16 output tensor is {1, total_index_len, embedding_dim}; however, the buffer must be allocated at the full {batch_size, max_seq_len_value, embedding_dim} size. Otherwise, after multi-encoder finishes, the castv2 kernel invoked for the fp16 tensor -> fp32 tensor conversion reads and writes out of bounds, corrupting memory and producing incorrect kernel results.
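To make the two fixes concrete, here is a minimal NumPy sketch (not the actual Paddle-Lite XPU code; `unpad_to_batch` and its parameter names are hypothetical) of the shape restoration described in Issue 1: scattering a packed {1, total_index_len, embedding_dim} tensor back to the padded {batch_size, max_seq_len_value, embedding_dim} layout. The size comment at the end reflects Issue 2: the fp16 scratch buffer must be allocated for the padded shape, not the packed one.

```python
import numpy as np

def unpad_to_batch(packed, seq_lens, max_seq_len):
    """Hypothetical helper: scatter a packed {1, total_index_len, dim}
    tensor back to {batch_size, max_seq_len, dim}, zero-padding the
    positions that were masked out under variable-length mode."""
    total_index_len, dim = packed.shape[1], packed.shape[2]
    assert total_index_len == sum(seq_lens)

    # Issue 2 in miniature: the output buffer is sized for the padded
    # shape (batch_size * max_seq_len * dim), which is larger than the
    # packed tensor; allocating only the packed size would make the
    # later fp16 -> fp32 cast run out of bounds.
    batch = np.zeros((len(seq_lens), max_seq_len, dim), dtype=packed.dtype)

    offset = 0
    for i, n in enumerate(seq_lens):
        # Copy this sequence's valid tokens into the padded row.
        batch[i, :n, :] = packed[0, offset:offset + n, :]
        offset += n
    return batch
```

For example, with seq_lens = [2, 1] and max_seq_len = 3, a packed tensor of 3 tokens is restored to a (2, 3, dim) tensor whose trailing positions are zero-padded.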

@csy0225 csy0225 force-pushed the fix_multi_encoder_adaptive_seqlen branch from 15ba712 to 78d6106 Compare November 30, 2023 11:36
@csy0225 csy0225 force-pushed the fix_multi_encoder_adaptive_seqlen branch from 78d6106 to 760209f Compare December 1, 2023 02:32
@zhupengyang (Contributor) left a comment


LGTM

@zhupengyang zhupengyang merged commit e3caa7c into PaddlePaddle:develop Dec 1, 2023
