Fused softmax kernel #3496

Merged: 2 commits merged from dev_fused_softmax_kernel into master on Aug 20, 2020
Conversation

liujuncheng (Collaborator)

No description provided.

@yuanms2 (Contributor) commented Aug 19, 2020

Juncheng, how much does this optimization help, and in which scenarios does it apply? It seems that in some cases softmax and cross entropy loss are computed together, so the backward pass gets simplified.
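(Editor's note: for reference, the simplification mentioned here is the standard softmax-plus-cross-entropy identity. When the two are fused, the gradient with respect to the logits collapses to the predicted probabilities minus the one-hot labels, so no separate softmax backward pass is needed. A minimal sketch of that derivation:)

```latex
% Softmax over logits z with one-hot label y:
%   p_i = \exp(z_i) / \sum_j \exp(z_j),   L = -\sum_k y_k \log p_k
% Using \partial p_k / \partial z_i = p_k (\delta_{ki} - p_i) and \sum_k y_k = 1,
% the gradient with respect to the logits collapses to p - y:
\[
\frac{\partial L}{\partial z_i}
  = \sum_k \left(-\frac{y_k}{p_k}\right) p_k \,(\delta_{ki} - p_i)
  = p_i \sum_k y_k - y_i
  = p_i - y_i .
\]
```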

@liujuncheng (Collaborator, Author)

Juncheng, how much does this optimization help, and in which scenarios does it apply? It seems that in some cases softmax and cross entropy loss are computed together, so the backward pass gets simplified.

https://github.com/Oneflow-Inc/OneFlow-Benchmark/blob/master/LanguageModeling/BERT/bert.py#L230
This optimization targets BERT; every BERT layer has a softmax. Kernel execution times were measured on a 2080 Ti, in microseconds (us):

      Fw FP32   Bw FP32   Fw FP16   Bw FP16
old   1325      1094      1110      789
new   676       412       590       355

On a 2080 Ti, BERT-base shows roughly a 1-2% throughput improvement. However, @ShawnXuan found the effect on V100 less pronounced than on the 2080 Ti; further testing is needed, and it is not yet clear whether that is simply because the V100 is faster.
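(Editor's note: for readers unfamiliar with the idea, the sketch below shows what a "fused" softmax forward looks like on the GPU: a single kernel launch does the row max, exponentiation, sum, and normalization, instead of a separate kernel for each step. This is only an illustrative warp-per-row sketch, not the kernel introduced by this PR; the kernel name and launch configuration are assumptions.)

```cuda
// Hypothetical fused softmax forward: one warp handles one row and performs the
// max reduction, exp, sum reduction, and normalization in a single launch.
#include <cuda_runtime.h>
#include <cfloat>

__global__ void fused_softmax_rowwise(const float* in, float* out,
                                      int rows, int cols) {
  // One warp (32 threads) per row; lane index within the warp.
  int row  = blockIdx.x * (blockDim.x / 32) + threadIdx.x / 32;
  int lane = threadIdx.x % 32;
  if (row >= rows) return;
  const float* row_in = in + (size_t)row * cols;
  float* row_out = out + (size_t)row * cols;

  // 1) Row max (for numerical stability), reduced across the warp.
  float max_val = -FLT_MAX;
  for (int c = lane; c < cols; c += 32) max_val = fmaxf(max_val, row_in[c]);
  for (int off = 16; off > 0; off >>= 1)
    max_val = fmaxf(max_val, __shfl_xor_sync(0xffffffff, max_val, off));

  // 2) Sum of exp(x - max), reduced across the warp.
  float sum = 0.f;
  for (int c = lane; c < cols; c += 32) sum += expf(row_in[c] - max_val);
  for (int off = 16; off > 0; off >>= 1)
    sum += __shfl_xor_sync(0xffffffff, sum, off);

  // 3) Normalize.
  for (int c = lane; c < cols; c += 32)
    row_out[c] = expf(row_in[c] - max_val) / sum;
}

// Example launch: 128 threads per block = 4 warps, i.e. 4 rows per block.
// dim3 block(128);
// dim3 grid((rows + 3) / 4);
// fused_softmax_rowwise<<<grid, block>>>(d_in, d_out, rows, cols);
```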

@jackalcooper jackalcooper added this to the 0.1.9 milestone Aug 20, 2020
@liujuncheng liujuncheng merged commit 2ec8fc6 into master Aug 20, 2020
@liujuncheng liujuncheng deleted the dev_fused_softmax_kernel branch August 20, 2020 04:57