Transformer decoding support fuse qkv #1455
Conversation
If fuse_qkv introduces extra constraints on size, please also add branches to support more sizes.
Done. Thanks.
@@ -65,7 +65,8 @@ std::vector<paddle::Tensor> DecodingForward(
     const int64_t& max_len,
     const float& beam_search_diversity_rate,
     const bool& rel_len,
-    const float& alpha) {
+    const float& alpha,
+    const bool& fuse_qkv) {
Let's not add a separate fuse_qkv attribute; determine it directly from the size instead, which also keeps compatibility with previously exported models.
Done. Thanks.
param_type = item.split(".")[-1]

model_dict[
    "decoding.slf_q_" + param_type + "_" +
Should load and export also check self._fuse_qkv?
Currently no warning is raised and usage is unaffected. The code has been improved. Thanks.
float alpha);
const std::string& decoding_strategy,
const int& beam_size,
const int& topk,
In theory, int-valued parameters don't need const or a reference, since neither brings any speedup.
For a string, passing by reference reduces the copy from a full string to a pointer-sized (32-bit) copy.
For an int, however, copying the reference's pointer address costs the same as copying the int value itself, whether or not you use a reference.
And precisely because of that, adding const makes no difference here either.
True. The const here is simply a habit of marking all parameters that won't be modified as const, and the references were added to make the parameter list look uniform. My new change removes the references, but for parameters that should not be modified I'd still suggest keeping const.
LGTM
PR types
Performance optimization
PR changes
Models
Description
Transformer decoding now supports fused QKV. Performance numbers will come later.