Conversation
src/operator/nn/pool.cuh
```cpp
                 const TShape& stride, const int pool_type, OpReqType req_type,
                 DType* out_data) {
  using namespace mxnet_op;
  if (kernel.ndim() == 1) {
```
This is a little too much copy-pasting. How about changing unpool_sum_?d_gpu_kernel to unpool_sum_gpu_kernel(..., Shape<?>) and adding an int ndim to pool's template arguments?
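For context, a rough sketch of what that suggestion might look like (a hypothetical signature, not code from this PR): one kernel templated on the number of spatial dimensions, taking shapes as `Shape<ndim>`.

```cpp
#include <mshadow/tensor.h>  // mshadow::Shape

// Hypothetical sketch of the suggested refactor: collapse the per-dimension
// unpool_sum_1d/2d/3d_gpu_kernel variants into a single kernel templated on
// ndim, with the input shape, kernel, and stride passed as Shape<ndim>.
template <typename DType, int ndim>
__global__ void unpool_sum_gpu_kernel(const int nthreads,
                                      const DType* out_grad,
                                      DType* in_grad,
                                      const mshadow::Shape<ndim> ishape,
                                      const mshadow::Shape<ndim> kernel,
                                      const mshadow::Shape<ndim> stride) {
  // ... the indexing logic, written once, looping over the ndim spatial
  // dimensions via ishape[i], kernel[i], stride[i] ...
}
```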
Actually, ignore the previous comment. This is fine.
src/operator/nn/pool.cuh
```cpp
        sum += out_slice[h * width + w];
      }
    }
    KERNEL_ASSIGN(out_data[index], req_type, sum / pool_size);
```
Forward doesn't need req_type support for now; it's almost always kWriteTo.
Add a check in pool and remove this.
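A minimal sketch of that change, assuming the names from the snippet above and dmlc's CHECK_EQ macro as used elsewhere in MXNet (the check would live in the host-side pool() launcher, not in the kernel itself):

```cpp
// In the host-side pool() launcher: forward only supports write-to for now.
CHECK_EQ(req_type, kWriteTo)
    << "pool() forward only supports req == kWriteTo for now";

// In the kernel, the KERNEL_ASSIGN dispatch then reduces to a plain store:
out_data[index] = sum / pool_size;
```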
Done!
Please submit another PR to move conv and pooling to src/operator/nn/ after this.
Force-pushed from c86014d to baaaad4.
Any idea why 2-D is that much slower than 1-D?
For the GPU 1-D/2-D kernels, one possible reason might be this line and the following atomicAdd call. The if condition causes thread divergence in the 2-D case and, consequently, more runtime than in 1-D. unpool_sum has no such condition check in its 2-D/3-D kernels, and there the relative difference between 1-D and 2-D is not that big. But if we could save a mask as an invisible output, unpool_max should be faster and need no atomicAdd calls.
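To illustrate the point (a generic sketch with made-up names, not this PR's actual kernels): the first kernel shows the divergent-branch-plus-atomicAdd pattern; the second shows how a saved argmax mask would make the backward pass branch-free and atomic-free.

```cpp
#include <cuda_runtime.h>

// Generic sketch of the pattern under discussion. Each thread handles one
// input element and checks whether it was its window's max; only the
// "winning" lanes take the branch -- a data-dependent branch that diverges
// within a warp -- and then issue an atomicAdd, which serializes collisions.
// (Window indexing is simplified to contiguous non-overlapping windows.)
__global__ void unpool_max_sketch(const float* out_grad, const float* in_data,
                                  const float* out_data, float* in_grad,
                                  int nthreads, int pool_size) {
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  if (index >= nthreads) return;
  int out_index = index / pool_size;            // owning pooling window
  if (in_data[index] == out_data[out_index]) {  // divergent branch
    atomicAdd(&in_grad[index], out_grad[out_index]);  // serialized on collision
  }
}

// With an argmax mask recorded by the forward pass, each thread scatters one
// output gradient to its recorded max position: no branch, no atomics
// (assuming non-overlapping windows, so the recorded positions are unique).
__global__ void unpool_max_masked_sketch(const float* out_grad,
                                         const int* argmax, float* in_grad,
                                         int nthreads) {
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  if (index >= nthreads) return;
  in_grad[argmax[index]] += out_grad[index];
}
```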
Force-pushed from e53eac9 to 2431700.
Still failing. Not sure if it's an MKL bug or the new pooling op.
I'm debugging to find the root cause.
@glingyan @zhenlinluo Looks like there is a bug in the mkldnn pooling op when pad > 0.
Will look into it today.
We will disable MKL for pad > 0 and merge this first. Please re-enable it when you submit a fix.
@piiswrong How do you run the test? A unit test?
Consistency tests against cuDNN.
@glingyan You can see the test log here. Just run test_pooling_versions() in test_operator_gpu.py.
@piiswrong @reminisce Do we have any ideas for improving the deconvolution as well?
@piiswrong @reminisce @sxjscience Deconvolution could be implemented with the MKL convolution API.
@reminisce @piiswrong Will try to fix pooling when your patch is merged.
I find that the current deconvolution does not support 3-D deconv (https://github.com/torch/nn/blob/master/doc/convolution.md#nn.VolumetricFullConvolution), which has been used for video generation.
@sxjscience I think we can do something similar to what we did for convolution by adopting Caffe's algorithm. The key functions, im2col and col2im, have already been implemented for convolution; we can use them for deconv as well.
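To make the duality concrete, here is a minimal plain-C++ sketch (not MXNet's mshadow/im2col code): a 1-D transposed convolution written as the adjoint of the convolution loop, i.e. the scatter-accumulate that col2im performs. Single channel, unit stride, no padding, for illustration only.

```cpp
#include <cstdio>
#include <vector>

// 1-D transposed convolution: each input element scatters into K outputs,
// the transpose of convolution's gather over a K-wide window.
std::vector<float> deconv1d(const std::vector<float>& data,
                            const std::vector<float>& weight) {
  const size_t K = weight.size();
  std::vector<float> out(data.size() + K - 1, 0.0f);
  for (size_t i = 0; i < data.size(); ++i)
    for (size_t k = 0; k < K; ++k)
      out[i + k] += data[i] * weight[k];  // col2im-style scatter-accumulate
  return out;
}

int main() {
  std::vector<float> out = deconv1d({1.f, 2.f, 3.f}, {1.f, 1.f});
  for (float v : out) std::printf("%g ", v);  // prints: 1 3 5 3
  std::printf("\n");
}
```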
@reminisce Yes. If that's not in the plan, I can ask Leo @leezu to help do the job.
@piiswrong What do you think about @sxjscience's proposal? If it has higher priority, I can do it.
Force-pushed from 6f5d037 to c65cbdf.
@reminisce Someone should do it... Shouldn't be too hard.
@sxjscience If you have resources to work on deconvolution, you can lead it. Otherwise, I will spare time for this in the next couple of weeks. BTW: the deconvolution implemented here is actually a transposed convolution; TensorFlow has renamed it to conv2d_transpose. Should we take the chance to correct the name?
I think we can add an alias to the operator.
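If/once the operator is registered through the NNVM op registry, the alias could be a one-liner via add_alias (whether Deconvolution is on that registry at this point is an assumption):

```cpp
// Hypothetical: add a TensorFlow-style alias to the existing registration.
NNVM_REGISTER_OP(Deconvolution)
.add_alias("conv2d_transpose");
```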
This problem arises because, when param.pooling_convention == pool_enum::kFull, MKL's algorithm is the same as CUDNN_POOLING_AVERAGE_COUNT_EXCLUDE_PADDING, so the correct way to bypass it is
@reminisce We decided to first add the cuDNN version of 3-D deconv.
@glingyan Thanks for finding the root cause. Could you submit a PR to fix it?
@glingyan To be clear, you mean it's a feature unsupported by mkldnn and it needs to be bypassed?
@piiswrong The current MKL2017 algorithm for pool_enum::kFull is designed for EXCLUDE_PADDING; changing it to INCLUDE_PADDING causes a segmentation fault.
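For readers following along, a generic sketch (not MKL's code) of the distinction being discussed: the two conventions use different divisors for average-pooling windows that overlap the padded border.

```cpp
#include <algorithm>

// Divisor for one average-pooling window starting at (hstart, wstart), where
// hstart/wstart may be negative because of padding. EXCLUDE_PADDING divides
// by the number of in-bounds elements; INCLUDE_PADDING divides by the full
// kernel size, counting the padded zeros toward the mean.
int avg_pool_divisor(int hstart, int wstart, int kernel_h, int kernel_w,
                     int height, int width, bool include_padding) {
  if (include_padding) return kernel_h * kernel_w;
  int h0 = std::max(hstart, 0), w0 = std::max(wstart, 0);
  int h1 = std::min(hstart + kernel_h, height);
  int w1 = std::min(wstart + kernel_w, width);
  return (h1 - h0) * (w1 - w0);  // in-bounds elements only
}
```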
@sxjscience I have some time in the following weeks. I plan to port Caffe's deconv algorithm to MXNet as a low-priority task, which means no deadline promise yet. I can merge it with your cuDNN 3-D deconv work after I finish the implementation. Let me know if there are any conflicts. Thanks.
@reminisce @sxjscience Please try to reuse code instead of copy-pasting from conv for deconv.
```diff
@@ -85,6 +85,10 @@ DMLC_REGISTER_PARAMETER(PoolingParam);
 MXNET_REGISTER_OP_PROPERTY(Pooling, PoolingProp)
 .describe(R"code(Perform pooling on the input.

 The shapes for 1-D pooling are
 - **data**: *(batch_size, channel, width)*,
```
Add a blank line before the `-`; otherwise Sphinx fails to recognize it as a list. See http://mxnet.io/api/python/symbol.html#mxnet.symbol.Pooling
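Concretely, the fix applied to the diff above would look like this (only the blank line before the list is new):

```cpp
MXNET_REGISTER_OP_PROPERTY(Pooling, PoolingProp)
.describe(R"code(Perform pooling on the input.

The shapes for 1-D pooling are

- **data**: *(batch_size, channel, width)*,
)code");
```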
Also remove the content: "1-D pooling is special case of 2-D pooling with width=1 and kernel[1]=1."
I will change it in another PR. Thanks for pointing it out.
@piiswrong @reminisce An initial cuDNN 3-D deconv implementation is in #5615. Currently only the algoreg is reused, but if you have further suggestions we can refactor the code.
This pull request re-implements the pooling operator to support 1-D/2-D/3-D pooling without using expression templates, in order to increase performance.
- Pooling: this implementation.
- PoolingV1: the currently existing operator in MXNet.
- CuDNNPooling: the cuDNN pooling operator.
Table 1: max pooling benchmark results. The new implementation's 1-D max pooling is compared with the equivalent 2-D case (2-D: data=(10, 3, 100000, 1), kernel=(64, 1)).
Table 2: max pooling 2-D benchmark results.
Table 3: max pooling 3-D benchmark results.
Table 4: avg pooling benchmark results. The new implementation's 1-D avg pooling is compared with the equivalent 2-D case (2-D: data=(10, 3, 100000, 1), kernel=(64, 1)).
Table 5: avg pooling 2-D benchmark results.
Table 6: avg pooling 3-D benchmark results.