
Add cuDNN implementation for bias_add #3489

Merged (2 commits into master, Aug 18, 2020)
Conversation

liujuncheng (Collaborator)

Benchmarks of BERT-Base on an RTX 2080 Ti, batch size 64.

Old + FP32

                    2.41%  1.28582s      7600  169.19us  1.9520us  816.07us  void oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::InplaceBiasAddGpu<float, int>(int, oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::InplaceBiasAddGpu<float, int>, oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::InplaceBiasAddGpu<float, int>, float const *, int*)

Old + FP16

                    4.84%  946.84ms      7300  129.70us  1.9840us  398.59us  void oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::BiasAddGpuHalf<int>(int, oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::BiasAddGpuHalf<int>, oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::BiasAddGpuHalf<int>, __half const *, __half const , int*)
                    0.47%  91.486ms       300  304.95us  1.8240us  847.07us  void oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::InplaceBiasAddGpu<float, int>(int, oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::InplaceBiasAddGpu<float, int>, oneflow::_GLOBAL__N__61_tmpxft_00004736_00000000_9_bias_add_kernel_compute_75_cpp1_ii_ba840438::InplaceBiasAddGpu<float, int>, float const *, int*)

New + FP32

                    2.04%  1.07918s      7600  142.00us  2.3360us  581.38us  void op_generic_tensor_kernel<int=2, float, float, float, int=256, cudnnGenericOp_t=0, cudnnNanPropagation_t=0, int=0>(cudnnTensorStruct, float*, cudnnTensorStruct, float const *, cudnnTensorStruct, float const *, float, float, float, float, reducedDivisorArray, int)

New + FP16

                    3.61%  672.64ms      7300  92.142us  2.2400us  290.21us  void op_generic_tensor_kernel<int=2, __half, float, __half, int=256, cudnnGenericOp_t=0, cudnnNanPropagation_t=0, int=0>(cudnnTensorStruct, __half*, cudnnTensorStruct, __half const *, cudnnTensorStruct, __half const *, float, float, float, float, reducedDivisorArray, int)
                    0.36%  68.010ms       300  226.70us  2.2400us  582.69us  void op_generic_tensor_kernel<int=2, float, float, float, int=256, cudnnGenericOp_t=0, cudnnNanPropagation_t=0, int=0>(cudnnTensorStruct, float*, cudnnTensorStruct, float const *, cudnnTensorStruct, float const *, float, float, float, float, reducedDivisorArray, int)

yuanms2 (Contributor)

yuanms2 commented Aug 16, 2020

Will this change affect BERT and CNNs? Will it make both faster?

liujuncheng (Collaborator, Author)

> Will this change affect BERT and CNNs? Will it make both faster?

This optimization targets BERT, especially the FP16 path.

liujuncheng (Collaborator, Author)

> Will this change affect BERT and CNNs? Will it make both faster?

It has essentially no effect on CNNs.

@jackalcooper jackalcooper added this to the 0.1.9 milestone Aug 17, 2020
@jackalcooper jackalcooper merged commit 0fb58c4 into master Aug 18, 2020
@jackalcooper jackalcooper deleted the dev_cudnn_bias_add branch August 18, 2020 04:04
4 participants