Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-359] fix checks on convolution parameters in MKLDNN. #10666

Merged
merged 4 commits into from
May 2, 2018

Conversation

zheng-da
Copy link
Contributor

@zheng-da zheng-da commented Apr 24, 2018

Description

As I explained in #10663, there is mismatch between MXNet and ONNX. This is a temp fix from the MKLDNN side for the problem: MKLDNN conv follows the behavior of MXNet conv (always uses the first two elements in the tuple as padding).

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@zheng-da zheng-da changed the title fix checks on convolution parameters in MKLDNN. [MXNET-359] fix checks on convolution parameters in MKLDNN. Apr 24, 2018
Copy link
Member

@anirudh2290 anirudh2290 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case for symmetric padding with length of tuple as 4 ?

@@ -88,26 +91,23 @@ static mkldnn::convolution_backward_data::primitive_desc GetConvBwdData(
auto weight_md = GetWeightDesc(weights, param.num_group);
auto out_md = GetMemDesc(output);
auto engine = CpuEngine::Get()->get_engine();
CHECK_GE(param.stride.ndim(), 2U);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be CHECK_EQ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after ONNX is fixed, this should be CHECK_EQ. I didn't know if ONNIX would be fixed when I submitted the PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please disregard my comment. I think this change shouldnt depend on whether onnx is fixed or not. CHECK_GE looks good to make it consistent with existing behavior.

@@ -123,16 +123,15 @@ static mkldnn::convolution_backward_weights::primitive_desc GetConvBwdWeights(
auto weight_md = GetWeightDesc(weights, param.num_group);
auto out_md = GetMemDesc(output);
auto engine = CpuEngine::Get()->get_engine();
CHECK_GE(param.stride.ndim(), 2U);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be CHECK_EQ ?

strides[0] = param.stride[0];
strides[1] = param.stride[1];
} else if (param.stride.ndim() == 1) {
strides[0] = param.stride[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

param.pad.ndim() == 1 will not use mkldnn anymore ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mxnet always assume 2 elements in the tuple. in the python, if the input is one element, it'll convert it to 2-element tuple, so in practice, we don't get stride with one element.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python will extend one element to two-element tuple. What about other frontend languages or what about someone calling c APIs to build their model?

@@ -32,6 +32,12 @@
namespace mxnet {
namespace op {

bool SupportMKLDNNConv(const DeconvolutionParam& params, const NDArray &input) {
if (params.kernel.ndim() != 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add check for strides and dilate too ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a check in the parameter parser of mxnet conv, so we don't need to check it in the MKLDNN code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are checking strides and dilates ndim to be greater than equal to 2, can we fallback to default implementation and return false here when ndim of stride, pad or dilates is less than 2 ?

@zheng-da
Copy link
Contributor Author

@piiswrong @eric-haibin-lin @ashokei @pengzhao-intel @TaoLv
Could you help review the code and merge it, so it goes to the release?

@eric-haibin-lin
Copy link
Member

Do we still want this change if onnx correctly handles padding?

@anirudh2290
Copy link
Member

@eric-haibin-lin yes we still need this change to make the behavior consistent (with or without MKLDNN) enabled. We can add this back later on, when we add support to raise error for MXNet conv (without MKLDNN enabled).

@zheng-da
Copy link
Contributor Author

zheng-da commented Apr 25, 2018 via email

@anirudh2290
Copy link
Member

anirudh2290 commented Apr 25, 2018

I ran the following simple script with the code pulled from your branch:

import mxnet as mx
arr = mx.nd.random.uniform(shape=(10, 10, 32, 32))
weight1 = mx.nd.random.uniform(shape=(10, 10, 3, 3))
arr = mx.nd.Convolution(data=arr, weight=weight1, no_bias=True, kernel=(3, 3), stride=(1), num_filter=10)
print(arr.asnumpy())

MXNet conv seems to be reading the stride with ndim = 1 correctly here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/convolution-inl.h#L400.
There is an inconsistency here between MXNet conv (with or without mkldnn enabled). With MKLDNN enabled the script throws an exception, without MKLDNN enabled the script doesn't throw an exception.

To avoid this inconsistency for now, we can fallback to default compute if any of pad , stride or dilate have ndim < 2. Let me know what you think.

@zheng-da
Copy link
Contributor Author

import mxnet as mx
arr = mx.nd.random.uniform(shape=(10, 10, 32, 32))
weight1 = mx.nd.random.uniform(shape=(10, 10, 3, 3))
arr1 = mx.nd.Convolution(data=arr, weight=weight1, no_bias=True, kernel=(3, 3), stride=(1), num_filter=10)
arr2 = mx.nd.Convolution(data=arr, weight=weight1, no_bias=True, kernel=(3, 3), stride=(1, 1), num_filter=10)
print((arr1 == arr2).asnumpy().sum())

This outputs 2616.0, while we expect 3000 because the output shape is (10L, 10L, 30L, 1L).

@@ -363,6 +365,9 @@ static void ConvolutionParamParser(nnvm::NodeAttrs* attrs) {
if (param_.dilate.ndim() == 0) param_.dilate = Shape3(1, 1, 1);
if (param_.pad.ndim() == 0) param_.pad = Shape3(0, 0, 0);
}
CHECK_EQ(param_.kernel.ndim(), param_.stride.ndim());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks needs to have error messages.
"stride must have the same number of dimensions with kernel_size, but kernel_size is set to (x,x,x) while stride is (x,x)"

@piiswrong piiswrong merged commit 1420697 into apache:master May 2, 2018
marcoabreu added a commit that referenced this pull request May 3, 2018
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 7, 2018
…0666)

* fix check on tuples of conv.

* check params in (de)conv.

* rename.

* add messages.
jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request May 29, 2018
…0666)

* fix check on tuples of conv.

* check params in (de)conv.

* rename.

* add messages.
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
…0666)

* fix check on tuples of conv.

* check params in (de)conv.

* rename.

* add messages.
zheng-da added a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
…0666)

* fix check on tuples of conv.

* check params in (de)conv.

* rename.

* add messages.
@zheng-da zheng-da deleted the fix_conv branch July 5, 2018 06:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants