Optimize conv performance #3477

liujuncheng · 2020-08-13T06:56:39Z

Optimize Conv FP16 performance

Use pseudo_half instead of true_half as the default.
Decide whether to enable CUDNN_TENSOR_OP_MATH based on the input/output dtype rather than on compute_type.

ResNet-50 tested on a 2080 Ti, single GPU with batch_size 64. Throughput is as follows:

Configuration	Before	After
FP32+NCHW	284	280
FP16+NCHW	443	505
FP16+NHWC	336	615
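
To make the table easier to read, the relative change for each row can be computed as (after - before) / before. The helper below is only an illustrative sketch; `SpeedupPercent` is a hypothetical name and is not part of the PR.

```cpp
#include <cassert>

// Relative throughput change in percent between two measurements.
// Hypothetical helper for reading the table above; not OneFlow code.
double SpeedupPercent(double before, double after) {
  assert(before > 0.0);  // a throughput measurement must be positive
  return (after - before) / before * 100.0;
}
```

Applied to the rows above, this gives roughly -1.4% for FP32+NCHW (284 to 280), +14.0% for FP16+NCHW (443 to 505), and +83.0% for FP16+NHWC (336 to 615).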

leaves-zwx · 2020-08-13T08:08:14Z

Does the improvement here come from the conv2d computation switching from true_half to pseudo_half?

leaves-zwx · 2020-08-13T08:10:50Z

oneflow/core/device/cudnn_conv_util.h

-  CudnnConvDesc(const DataType& data_type, const ShapeView& in_blob_shape,
-                const user_op::UserOpConfWrapper& conv_conf);
+  CudnnConvDesc(const DataType compute_type, const DataType data_type,
+                const ShapeView& in_blob_shape, const PbMessage& conv_conf);


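Now that the old conv op code has been deleted, this constructor taking a const PbMessage& conv_conf parameter is no longer used anywhere. Can it be removed?
It could also be removed in a separate PR.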

leaves-zwx · 2020-08-13T08:12:43Z

oneflow/core/device/cudnn_conv_util.cpp

@@ -200,7 +201,7 @@ CudnnConvArgs::CudnnConvArgs(const PbMessage& conv_conf, DataType x_data_type,
    : xdesc(x_data_type, x_shape, data_format),
      ydesc(y_data_type, y_shape, data_format),
      wdesc(w_data_type, w_shape, data_format),
-      cdesc(GetConvDescDataType(x_data_type, enable_pseudo_half), x_shape, conv_conf),
+      cdesc(GetConvDescDataType(x_data_type, enable_pseudo_half), x_data_type, x_shape, conv_conf),


As with CudnnConvDesc, the CudnnConvArgs constructor here that takes const PbMessage& conv_conf as a parameter can probably be removed as well.

This PR is a fix, so let's not mix it with refactoring.

OK. Once this PR is merged, I will open a new PR to delete the obsolete code.

liujuncheng · 2020-08-13T08:37:17Z

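> Does the improvement here come from the conv2d computation switching from true_half to pseudo_half?

Both of the changes listed in the description contribute.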

leaves-zwx · 2020-08-13T08:57:22Z

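I think I understand now: the previous usage was wrong. The cuDNN conv math type should be set according to the data_type of the inputs and outputs, not according to the data_type configured inside the cudnnConvolutionDescriptor_t structure.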

  if (GetCudnnDataType(data_type) == CUDNN_DATA_HALF) {
    OF_CUDNN_CHECK(cudnnSetConvolutionMathType(val_, CUDNN_TENSOR_OP_MATH));
  }

So in both the fp16 + nchw and fp16 + nhwc scenarios, is pseudo_half faster than true_half? In that case, what is the intended use case for true_half?
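
The distinction discussed in this thread can be sketched without cuDNN as two separate decisions: which compute type goes into the convolution descriptor, and which math type is requested. The enums and function names below are stand-ins invented for illustration (the real code uses OneFlow's DataType and cuDNN's CUDNN_DEFAULT_MATH / CUDNN_TENSOR_OP_MATH), and the body of `ConvComputeType` is an assumption about what GetConvDescDataType does, so treat this as a rough model rather than the actual implementation.

```cpp
// Stand-in types for illustration only; not the OneFlow or cuDNN definitions.
enum class DataType { kFloat, kFloat16 };
enum class MathType { kDefaultMath, kTensorOpMath };

// Compute type for the convolution descriptor (assumed behavior):
// with pseudo_half, FP16 tensors are computed in FP32;
// with true_half, the computation stays in FP16.
DataType ConvComputeType(DataType tensor_type, bool enable_pseudo_half) {
  if (tensor_type == DataType::kFloat16 && enable_pseudo_half) {
    return DataType::kFloat;
  }
  return tensor_type;
}

// After the fix: the math type follows the input/output tensor dtype,
// so FP16 tensors request tensor-op math even when the compute type is FP32.
MathType ConvMathType(DataType tensor_type) {
  return tensor_type == DataType::kFloat16 ? MathType::kTensorOpMath
                                           : MathType::kDefaultMath;
}

// Before the fix, as described in the thread: the math type followed the
// compute type, so pseudo_half (FP32 compute) never requested tensor-op math.
MathType ConvMathTypeBeforeFix(DataType compute_type) {
  return compute_type == DataType::kFloat16 ? MathType::kTensorOpMath
                                            : MathType::kDefaultMath;
}
```

Under this model, FP16 tensors with pseudo_half enabled get an FP32 compute type. The pre-fix rule then selects the default math type, while the post-fix rule selects tensor-op math, which matches the explanation given in the comment above.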

Optimize conv performance

cdde172

guo-ran approved these changes Aug 13, 2020

leaves-zwx reviewed Aug 13, 2020

leaves-zwx approved these changes Aug 13, 2020

jackalcooper added this to the 0.1.9 milestone Aug 13, 2020

Merge branch 'master' into dev_optimize_half_conv

6678989

liujuncheng merged commit 5fc44b2 into master Aug 13, 2020

liujuncheng deleted the dev_optimize_half_conv branch August 13, 2020 10:13

jackalcooper added the enhancement label Aug 20, 2020
