Optimize softmax cuda kernel #4058

Merged: 3 commits from `dev_optimize_softmax_cuda_kernel` into `master` on Dec 30, 2020.

@liujuncheng (Collaborator) commented Dec 29, 2020:

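Test data: rows = 49152, dtype = half.

The tables compare four softmax implementations: `warp` (one warp per row), `smem` (one block per row, with the row cached in shared memory), `uncached` (one block per row, re-reading the row from global memory) and `cudnn` (the cuDNN baseline). `t` is the kernel time in µs, `size` is the total amount of data read and written by one call, and `bw = size / t` is the effective memory bandwidth. A `-` cell means that variant was not measured for that column count.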
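The `size` and `bw` columns can be reproduced from the test setup. A minimal sketch, assuming `size` counts every rows x cols half-precision tensor that is read or written once (2 for Softmax: read x, write y; 3 for SoftmaxGrad: read y and dy, write dx) and is expressed in GiB; the helper names are hypothetical:

```python
ROWS = 49152      # rows in the benchmark
HALF_BYTES = 2    # dtype = half
GIB = 1 << 30     # sizes are in GiB (2**30 bytes)


def traffic_gib(cols, num_tensors, rows=ROWS, elem_bytes=HALF_BYTES):
    """Total bytes moved by one kernel call, in GiB.

    num_tensors counts each rows x cols tensor read or written once:
    2 for Softmax, 3 for SoftmaxGrad.
    """
    return rows * cols * elem_bytes * num_tensors / GIB


def bandwidth_gib_s(cols, num_tensors, time_us):
    """Effective bandwidth in GiB/s: data moved divided by kernel time."""
    return traffic_gib(cols, num_tensors) / (time_us * 1e-6)


if __name__ == "__main__":
    print(traffic_gib(1024, 2))                     # 0.1875 (Softmax)
    print(traffic_gib(1024, 3))                     # 0.28125 (SoftmaxGrad)
    print(round(bandwidth_gib_s(1024, 2, 244), 4))  # 768.4426
```

For example, the Softmax warp kernel at cols = 1024 takes 244 µs to move 0.1875 GiB, giving about 768.44 GiB/s, the value reported in the table.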

Softmax

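| cols | size (GiB) | t warp (µs) | t smem (µs) | t uncached (µs) | t cudnn (µs) | bw warp (GiB/s) | bw smem (GiB/s) | bw uncached (GiB/s) | bw cudnn (GiB/s) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 32 | 0.00585938 | 18 | 103 | 1230 | 406 | 325.5208 | 56.88714 | 4.76372 | 14.43196 |
| 64 | 0.01171875 | 22 | 103 | 1210 | 413 | 532.6705 | 113.7743 | 9.684917 | 28.3747 |
| 128 | 0.0234375 | 32 | 112 | 1210 | 427 | 732.4219 | 209.2634 | 19.36983 | 54.88876 |
| 256 | 0.046875 | 62 | 133 | 1240 | 454 | 756.0484 | 352.4436 | 37.80242 | 103.2489 |
| 512 | 0.09375 | 122 | 201 | 1260 | 506 | 768.4426 | 466.4179 | 74.40476 | 185.2767 |
| 1024 | 0.1875 | 244 | 269 | 1360 | 613 | 768.4426 | 697.026 | 137.8676 | 305.8728 |
| 2048 | 0.375 | - | 484 | 1660 | 828 | - | 774.7934 | 225.9036 | 452.8986 |
| 4096 | 0.75 | - | 964 | 2520 | 1350 | - | 778.0083 | 297.619 | 555.5556 |
| 8192 | 1.5 | - | 2230 | 2920 | 4210 | - | 672.6457 | 513.6986 | 356.2945 |
| 16384 | 3 | - | - | 4720 | 10690 | - | - | 635.5932 | 280.6361 |
| 32768 | 6 | - | - | 11950 | 20940 | - | - | 502.0921 | 286.533 |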
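The pattern of `-` cells in this table and the SoftmaxGrad table is consistent with a simple size-based choice between variants. The sketch below is a hypothetical dispatch heuristic, not the PR's code. It assumes a 1024-column cap for the warp kernel, a 48 KiB shared-memory budget per block, and half inputs cached as 4-byte floats in shared memory. Softmax caches one row (x), while SoftmaxGrad caches two (y and dy).

```python
# All three constants are assumptions chosen to match the measured tables.
WARP_MAX_COLS = 1024           # assumed cap for the one-warp-per-row kernel
SMEM_LIMIT_BYTES = 48 * 1024   # assumed shared-memory budget per block
COMPUTE_BYTES = 4              # assumed: half values cached as float


def pick_softmax_impl(cols, cached_rows):
    """Return the variant a size-based dispatch would use for a row of `cols`.

    cached_rows is the number of row-sized buffers the shared-memory kernel
    keeps per block: 1 for Softmax (x), 2 for SoftmaxGrad (y and dy).
    """
    if cols <= WARP_MAX_COLS:
        return "warp"
    if cols * COMPUTE_BYTES * cached_rows <= SMEM_LIMIT_BYTES:
        return "smem"
    return "uncached"
```

Under these assumptions, Softmax uses `smem` up to 8192 columns (32 KiB) and `uncached` from 16384 (64 KiB). SoftmaxGrad, which caches two rows, switches to `uncached` one size earlier, at 8192 columns.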

SoftmaxGrad

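| cols | size (GiB) | t warp (µs) | t smem (µs) | t uncached (µs) | t cudnn (µs) | bw warp (GiB/s) | bw smem (GiB/s) | bw uncached (GiB/s) | bw cudnn (GiB/s) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 32 | 0.00878906 | 16 | 62 | 737 | 215 | 549.3164 | 141.7591 | 11.92546 | 40.87936 |
| 64 | 0.01757813 | 22 | 62 | 729 | 223 | 799.0057 | 283.5181 | 24.11265 | 78.82567 |
| 128 | 0.03515625 | 44 | 71 | 737 | 241 | 799.0057 | 495.1585 | 47.70183 | 145.8766 |
| 256 | 0.0703125 | 88 | 94 | 753 | 279 | 799.0057 | 748.0053 | 93.37649 | 252.0161 |
| 512 | 0.140625 | 176 | 177 | 784 | 309 | 799.0057 | 794.4915 | 179.3686 | 455.0971 |
| 1024 | 0.28125 | 353 | 355 | 832 | 422 | 796.7422 | 792.2535 | 338.0409 | 666.4692 |
| 2048 | 0.5625 | - | 709 | 965 | 776 | - | 793.3709 | 582.9016 | 724.8711 |
| 4096 | 1.125 | - | 1420 | 1610 | 1880 | - | 792.2535 | 698.7578 | 598.4043 |
| 8192 | 2.25 | - | - | 2870 | 4530 | - | - | 783.9721 | 496.6887 |
| 16384 | 4.5 | - | - | 7620 | 9280 | - | - | 590.5512 | 484.9138 |
| 32768 | 9 | - | - | 18500 | 18600 | - | - | 486.4865 | 483.871 |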
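For reference, here is the per-row math computed by the Softmax and SoftmaxGrad kernels, written as a plain-Python sketch. The function names are hypothetical and are not OneFlow's API. The forward pass reads x and writes y, with two row reductions (max, then sum). The backward pass reads y and dy and writes dx, with one row reduction (the dot product of y and dy). This is why the grad `size` column counts three tensors instead of two.

```python
import math


def softmax_row(x):
    """Numerically stable softmax of one row: subtract the row max before exp."""
    m = max(x)                          # reduction 1: row max
    e = [math.exp(v - m) for v in x]
    s = sum(e)                          # reduction 2: row sum
    return [v / s for v in e]


def softmax_grad_row(y, dy):
    """Softmax gradient for one row, from the forward output y and upstream dy.

    dx_i = y_i * (dy_i - sum_j y_j * dy_j)
    """
    dot = sum(a * b for a, b in zip(y, dy))  # single reduction: y . dy
    return [a * (b - dot) for a, b in zip(y, dy)]
```

Subtracting the row max keeps `exp` from overflowing for large inputs. For example, `softmax_row([1000.0, 1000.0])` returns `[0.5, 0.5]` instead of overflowing.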

@liujuncheng liujuncheng merged commit df43e70 into master Dec 30, 2020
@liujuncheng liujuncheng deleted the dev_optimize_softmax_cuda_kernel branch December 30, 2020 12:49
liujuncheng added a commit that referenced this pull request Jun 3, 2021
* Optimize softmax cuda kernel

* refine

Former-commit-id: df43e70
@VioletEvergardenYYH commented:

It's too *juan* (卷, Chinese internet slang for going all-out and out-competing everyone).
