This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Batch Norm rewrite without mshadow, 1D, 2D, 3D, float16, float32, float64 as well as operator gtest framework #5936

Merged
151 commits merged into apache:master on May 16, 2017

Conversation

cjolivier01
Member

@cjolivier01 cjolivier01 commented Apr 21, 2017

Note that batch_norm.cu and batch_norm-inl.h are almost entirely new code.
However, GitHub is not rendering the diff by default; click "View" to see them.
test_op.h (all new code, along with test_util.h and test_perf.h) is also not shown for the same reason.
@piiswrong

Performance (bs=128, c=3, h=28, w=28):

OLD CPU
BatchNormV1Prop 2D: Timing [Forward] 2828.16 ms, avg: 5.65631 ms X 500 passes
BatchNormV1Prop 2D: Timing [Backward] 20908.4 ms, avg: 41.8169 ms X 500 passes

NEW CPU
BatchNormProp 2D: Timing [Forward] 788.777 ms, avg: 1.57755 ms X 500 passes
BatchNormProp 2D: Timing [Backward] 322.013 ms, avg: 0.644026 ms X 500 passes

OLD GPU
BatchNormV1Prop 2D: Timing [Forward] 5.365 ms, avg: 0.01073 ms X 500 passes
BatchNormV1Prop 2D: Timing [Backward] 15.483 ms, avg: 0.030966 ms X 500 passes

NEW GPU
BatchNormProp 2D: Timing [Forward] 3.514 ms, avg: 0.007028 ms X 500 passes
BatchNormProp 2D: Timing [Backward] 4.787 ms, avg: 0.009574 ms X 500 passes

@@ -184,7 +184,7 @@ class TBlob {
mshadow::Shape1(shape_.Size()), stream);
}
/*! \brief return number of dimension of the tensor inside */
inline int ndim(void) const {
inline index_t ndim(void) const {
return shape_.ndim();
Contributor

Cast instead of changing the return type.

Member Author
@cjolivier01 cjolivier01 Apr 21, 2017

Is there a reason it is int? It can never be negative.

Member Author

The size() calls return index_t, btw.

Member

Other code may rely on the return type of ndim(), and we should check more carefully before making the change.

Member Author

ok

@@ -309,6 +309,34 @@ inline void ParamParser(nnvm::NodeAttrs* attrs) {
attrs->parsed = std::move(param);
}

/*! \brief Callback class to allow for convenient development and testing */
template<typename Type>
class Callbacker {
Contributor

?

Member Author

It provides a callback into the unit testing framework for unit tests. I can move it to just batchnorm if you like.

Contributor

is a callback necessary? Can you do it in the test code instead?

Member Author
@cjolivier01 cjolivier01 Apr 21, 2017

The reason for the callback is to inspect intermediate values or complex matrices (print them, for instance) without having to include test_util.h (for example) in the main build tree. When not being tested, it has no overhead (production code doesn't make callbacks).

Contributor

OK, let's move this to batchnorm, or better, remove it and test only public methods so that no callback is needed.

Member Author

ok

@@ -0,0 +1,366 @@
/*!
Member Author
@cjolivier01 cjolivier01 Apr 21, 2017

batch_norm_v1-inl.h is basically the old batch_norm-inl.h

@@ -0,0 +1,89 @@
/*!
Member Author

batch_norm_v1.cc is basically the old batch_norm.cc

@piiswrong
Contributor

piiswrong commented Apr 21, 2017

Please also add Python tests similar to the conv and pool refactors.
@reminisce Could you review?

@@ -0,0 +1,19 @@
/*!
Member Author

batch_norm_v1.cu is basically the old batch_norm.cu

@@ -19,7 +19,7 @@ fi
cp make/config.mk .
echo "USE_CUDA=1" >> config.mk
echo "USE_CUDA_PATH=/usr/local/cuda" >> config.mk
echo "USE_CUDNN=1" >> config.mk
echo "USE_CUDNN=0" >> config.mk
Contributor

?

Member Author

One of the Jenkins scripts should build the native (non-CUDNN) GPU operators rather than always using CUDNN.

Contributor

The MKL test is already doing that.

Member Author

ok

Contributor
@reminisce reminisce left a comment

Unit tests in Python are required:

  1. Verify the new implementation's results are the same as the old one's.
  2. Verify the new implementation's CPU and GPU results are the same.
  3. Verify the new implementation produces the expected results for cases that were not supported before (see the sketch below).
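
A minimal sketch of what such checks could look like (hedged: this assumes the pre-Gluon Python symbol API, that the old operator is registered as BatchNorm_v1, and that a GPU context is available; it is illustrative, not the PR's actual test code):

```python
import numpy as np
import mxnet as mx

def check_batchnorm_consistency(shape=(128, 3, 28, 28), eps=1e-3):
    data = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
    gamma = np.random.uniform(0.5, 1.5, shape[1]).astype(np.float32)
    beta = np.random.uniform(-0.5, 0.5, shape[1]).astype(np.float32)

    def run(op_name, ctx):
        # Build a single-op symbol and run one training-mode forward pass.
        sym = getattr(mx.sym, op_name)(name='bn', eps=eps, fix_gamma=False)
        exe = sym.simple_bind(ctx, bn_data=shape)
        exe.arg_dict['bn_data'][:] = data
        exe.arg_dict['bn_gamma'][:] = gamma
        exe.arg_dict['bn_beta'][:] = beta
        return exe.forward(is_train=True)[0].asnumpy()

    # 1. New implementation vs. the old one (both on CPU).
    np.testing.assert_allclose(run('BatchNorm', mx.cpu()),
                               run('BatchNorm_v1', mx.cpu()), rtol=1e-2, atol=1e-3)
    # 2. New implementation: CPU vs. GPU.
    np.testing.assert_allclose(run('BatchNorm', mx.cpu()),
                               run('BatchNorm', mx.gpu(0)), rtol=1e-2, atol=1e-3)
```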

* \sa OpReqType, OpContext
*/
#ifdef MXNET_USE_CUDA
void doForward(mshadow::Stream<gpu> *stream,
Contributor

Function names should use CamelCase style.

Member Author

ok

const size_t itemCount = inputData.Size() / channelCount;

// Avoid multiple dptr() call within the channel loop
Dtype *inputDataPtr = inputData.dptr<Dtype>();
Contributor
@reminisce reminisce Apr 21, 2017

Use DType instead of Dtype to conform to the naming convention in MXNet.

#define DeviceTensor3 DeviceTensor<Dtype, 3>

template <typename Dtype, typename accreal>
static void BatchNormalization_updateOutput(mshadow::Stream<gpu> *s,
Contributor

Avoid mixing naming styles in function names.

Member Author

ok

/*! \brief inverse standard deviation <-> variance
* Note that these aren't entirely reversible due to eps
**/
#define VARIANCE_TO_INVSTD(__var$, __eps$) (1.0/sqrt((__var$) + Dtype(__eps$)))
Contributor

Just curious: why is there a '$' sign at the end of every variable?

#define INVSTD_TO_VARIANCE(__invstd$, __eps$) ((1.0 / ((__invstd$) * (__invstd$))) - (__eps$))

/*! \brief Compute the variance of each input channel, as well as update moving mean/variants */
template<typename DType, typename Shape>
Contributor

Why is Shape needed as a template argument?

Member Author

This works equally with both mshadow::TShape and nnvm::TShape without causing extra overhead.

/*! \brief Batch normalization operator */
template<typename xpu, typename Dtype, typename AccType>
class BatchNormOp : public Operator
, public Callbacker<Operator> {
Contributor

Can we avoid multiple inheritance?

Member Author

Why?

Member Author

It is convenient because I can use dynamic_cast in the unit tests to check for the existence of the interface. Otherwise, for other operators, a change would have to be made to add a function returning a pointer to the callback interface, etc.

class BatchNormOp : public Operator
, public Callbacker<Operator> {
typedef ::nnvm::TShape TShape;
typedef ::mxnet::TBlob TBlob;
Contributor

I think TShape and TBlob are visible in this scope, right? Why are these typedefs needed?

Member Author

They sometimes get mixed up with mshadow::TShape/TBlob. They are there because of compile issues.


/*! \brief Compute the mean of each input channel */
template<typename Shape>
static inline void computeMean(const Dtype *in_data,
Contributor

It seems you implemented the mean and variance computation without using gemm, which has been optimized internally. According to your benchmark results, the new CPU is slower than the old CPU. Any good reasons?

Contributor

How can we apply gemm here?
For the mean: do we create an array [1/size .... ]^T and multiply it with the original in_data?
Will gemm consume more space?
Does gemm do better than OpenMP?

I'm new to MXNet; thank you for your help.

Contributor

@yuruofeifei Yes. Caffe uses gemm/gemv to compute the mean and variance. I guess it's faster than putting omp there manually. In addition, CPU and GPU can share the same piece of code when using gemm. Currently, this code is CPU-only. It requires a little more memory to store the vector, but gives faster speed (my guess) and easier code maintenance in return.

Member Author

Eric instructed me not to use BLAS for this as Caffe does, but to base it on Torch.
In addition, gemm does superfluous multiplies when calculating the mean.
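
For concreteness, the gemv formulation discussed above amounts to multiplying the data, regrouped per channel, by a vector of 1/N values. A hedged numpy sketch (illustrative only, not the PR's implementation, and assuming NCHW layout):

```python
import numpy as np

def channel_means_gemv(x):
    """Per-channel mean of an NCHW array via a matrix-vector product (gemv)."""
    n, c, h, w = x.shape
    # Gather all values belonging to each channel into one row: shape (C, N*H*W).
    per_channel = x.transpose(1, 0, 2, 3).reshape(c, -1)
    ones_over_n = np.full(per_channel.shape[1], 1.0 / per_channel.shape[1])
    return per_channel @ ones_over_n          # (C, N*H*W) x (N*H*W,) -> (C,)

x = np.random.rand(128, 3, 28, 28).astype(np.float32)
np.testing.assert_allclose(channel_means_gemv(x), x.mean(axis=(0, 2, 3)), rtol=1e-4)
```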

const std::vector<TBlob> &aux_states);
#endif // MXNET_USE_CUDA

void doBackward(mshadow::Stream<cpu> *stream,
Contributor

When MXNET_USE_CUDA=1, this function is compiled. But it shouldn't be, right?

Member Author

It is required. One can always choose not to use a GPU even when the code is compiled with GPU support enabled.


/*! \brief Fast-foreach when you don't care about the position other than channel */
template<typename Shape, typename OnData>
static inline void forEachFast(Dtype *in_data, const Shape& shape,
Contributor

Is there a need for this function?
I think we can just pass the in_data array with the correct channel offset.

Member Author

It is my understanding that the data is arranged like this:

[batch item 1][channel 1][spatial data (d,r,w)][channel 2][spatial data (d,r,w)]
[batch item 2][channel 1][spatial data (d,r,w)][channel 2][spatial data (d,r,w)]

In which case you still need to loop across the batch items, correct?
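
In index terms, that layout means visiting every value of one channel still requires walking the batch (and spatial) dimensions. A small numpy illustration of the per-channel traversal being discussed (illustrative only, not the forEachFast implementation):

```python
import numpy as np

x = np.random.rand(2, 3, 4, 4)                # NCHW: batch, channel, height, width
sums = np.zeros(x.shape[1])

# For a fixed channel the values are not contiguous across the whole array:
# each batch item contributes its own contiguous H*W block for that channel.
for c in range(x.shape[1]):                   # channel loop (the omp loop in the discussion)
    for n in range(x.shape[0]):               # still need to walk every batch item
        sums[c] += x[n, c].sum()

np.testing.assert_allclose(sums, x.sum(axis=(0, 2, 3)))
```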

const size_t channelCount = ishape[1];
CHECK(oshape.Size() == channelCount);

forEachFast(in_data, ishape,
Contributor

How about using std::accumulate for the mean and variance?
And std::transform for converting the data back?

Member Author

  1. Same question as above.
  2. Once the omp loop is inserted across channels, there's not much left to put into the binary operation. In addition, this adds a lot of overhead in debug mode, with function calls for what's a pretty simple calculation.
     wdyt?

.gitignore Outdated
ps-lite
nnvm
!src/nnvm
#dmlc-core
Contributor

Member Author
@cjolivier01 cjolivier01 May 1, 2017

I put those there accidentally in the caffe data iter commit. I am removing them.

Contributor

are you removing this?

Member Author

yes, removed

@@ -11,6 +11,23 @@

#if MXNET_USE_CUDA

/*! \brief Macros/inlines to assist CLion to parse Cuda files (*.cu, *.cuh) */
#ifdef __JETBRAINS_IDE__
Contributor

Member Author

This allows the CLion IDE to parse CUDA code. It has no net effect on anything else.

@piiswrong
Contributor

One last change in the tests: initialize gamma and beta to random values (instead of ones and zeros) to make sure that fix_gamma works the same way as before.

@cjolivier01
Member Author

Already checking varying gamma/beta per our offline discussion.
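
As an illustration, a check along those lines could look like the following hedged sketch against the Python symbol API (not the PR's actual test): with fix_gamma=True, a randomly initialized gamma should be ignored, so the output should match the output obtained with a gamma of ones.

```python
import numpy as np
import mxnet as mx

def check_fix_gamma(shape=(4, 3, 8, 8), ctx=mx.cpu()):
    data = np.random.uniform(-1, 1, shape).astype(np.float32)
    random_gamma = np.random.uniform(0.5, 2.0, shape[1]).astype(np.float32)
    random_beta = np.random.uniform(-1, 1, shape[1]).astype(np.float32)

    def forward(gamma):
        sym = mx.sym.BatchNorm(name='bn', fix_gamma=True)
        exe = sym.simple_bind(ctx, bn_data=shape)
        exe.arg_dict['bn_data'][:] = data
        exe.arg_dict['bn_gamma'][:] = gamma
        exe.arg_dict['bn_beta'][:] = random_beta
        return exe.forward(is_train=True)[0].asnumpy()

    # With fix_gamma=True, a random gamma must behave exactly like gamma == 1.
    np.testing.assert_allclose(forward(random_gamma),
                               forward(np.ones(shape[1], dtype=np.float32)),
                               rtol=1e-5, atol=1e-6)
```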

}
}

DO_BIND_DISPATCH(CreateOp, param, (*in_type)[0]);
Contributor

Pass in_shape to CreateOp here and remove mkl_off.

Member Author

ok

@piiswrong piiswrong merged commit 8300fbe into apache:master May 16, 2017
saurabh3949 pushed a commit to saurabh3949/mxnet that referenced this pull request May 23, 2017
…at64 as well as operator gtest framework (apache#5936)

* Batch Norm rewrite without mshadow as well as operator gtest framework

* performance testing

* lint fixes

* use CUDNN for this test

* remove superfluous omp define

* Fix file names in comments

* build, run, clean gtest works (although a test is failing)

* CR comments

* Adjust timing tests for more strenuous sample

* Remove temp resource allocation

* DeviceTensor3 added, forEachFast not yet converted

* DeviceTensor3 version working

* DeviceTensor3 working

* .

* Fix for use_global_stats

* fixed bug with testing suite for double (Float64)

* python unit tests working for batchnorm

* python unit tests

* Update documentation for mxnet.initializer.Mixed (apache#5937)

* Update documentation for SVMOutput. (apache#5931)

* Update documentation for SVMOutput.

* Update doc for SVMOutput - fix formatting.

* Adding install instruction for Ubuntu-CPU-Python (apache#5885)

* edit ndarray API docs (apache#5806)

* edit docs in broadcast_reduce_op

* edit docs in broadcast_reduce_op

* minor change

* lint fix

* fix

* mx.nd.ones

* mx.nd.repeat

* mx.nd.reverse

* add example in repeat

* optimizer update

* fix nanprod

* fix optimizer_op api doc

* fix reduce_op api doc

* fix nd.ones api doc

* mx.nd.repeat doc change

* Update broadcast_reduce_op.h

* Symbol docs fixes (apache#5930)

* symbol docs minor formatting changes

* deepcopy, infer_shape, infer_shape_partial docs modified

* Few more small fixes

* arithmetic functions fixes

* some more modifications

* changes after review

* small change

* grad function note added

* More API Doc Edits (apache#5886)

* edit activation doc

* doc l2_normalization

* edit MakeLoss doc

* edit blockgrad doc

* blockgrad fileline fix

* edit MakeLoss doc cont.

* doc change 'tensor' to 'multidimensional array'

* l2normalization doc improve

* makeloss doc improve, blockgrad doc improve

* fix doc in activation, l2_normalization, make_loss

* fix minor grammar

* use .describe to avoid build failure.

* Update documentation for mxnet.image.imdecode (apache#5957)

* Update documentation for mxnet.image.imdecode

* Update documentation for mxnet.image.imdecode (clarify that we need OpenCV and not the CV2 Python library)

* Fix script by adding path to Dockerfile (apache#5958)

* Clean install script

* Add test for pip installations

* Remove debug statements & comments

* Make test runnable as script and from framework

* Fix path to Dockerfiles

* Putting failing cases at the end

* Update doc for Custom operator. (apache#5875)

* Update doc for Custom operator.

* Update doc for Custom operator.

* Fix formating in doc for Custom operator.

* Fix formating in doc for Custom operator.

* Minor change to ndarray.Custom documentation.

* Minor edit in doc for Custom operator.

* Minor change to doc for Custom operator. Data is 'NDArray-or-Symbol'.

* Minor formatting change for Custom operator documentation.

* For Custom operator doc, move example into ndarray_doc.py.

* Minor change in Custom operator documentation

* Improve the doc of pick + Update dmlc-core (apache#5946)

* Add PickParam to fix the docstring and the initial value for axis

* Update dmlc-core

* Update dmlc-core

* Image docs modified (apache#5973)

* imageIter doc modified

* edited imageiter

* ADD missing Libri_sample.json, FIX minor bugs in speech_recognition example (apache#5962)

* [KVStore] Add support for other data types (apache#5818)

* Fix kvstore type

* Fix lint

* Parse inputs to DataDesc

* Make module support dtype

* Fix lint

* Add default dtype in Comm

* Fix lint

* Revert rename

* [cpp-package] Add C++ basic tutorial and build instruction (apache#5971)

* Add C++ basic tutorial and build instruction

* Remove binaries

* Fix lint

* Avoid sign-compare

* Update documentation for mxnet.metric.np (apache#5977)

* Getting rid of identity (apache#5935)

* Activation ops (apache#5938)

* [Ops] Add op: 'relu'

* Add op: 'sigmoid'

* Introduce 'kernel_launch_op'

* Add tests and describe; move it to elemwise_unary_op

* Fix GPU version

* Convert caffe AbsVal to mx.symbol.abs in caffe converter (apache#5984)

* Correction to LSTMCell docstring (apache#5986)

* [Module] fix input_grads order (apache#5980)

* fix input_grads order + update dmlc-core

* set label to be optional

* update env_var doc (apache#5964)

* Adjusting make, Callback removed

* batch norm gpu testing

* Batch Norm rewrite without mshadow as well as operator gtest framework

* performance testing

* lint fixes

* use CUDNN for this test

* remove superfluous omp define

* Fix file names in comments

* build, run, clean gtest works (although a test is failing)

* CR comments

* Adjust timing tests for more strenuous sample

* Remove temp resource allocation

* rearrange source into cc and cu files

* lint fixes

* Trigger build

* Use latest mshadow

* temporarily revert channel position parameter field

* Add more tests for batchnorm

* Add more tests for batchnorm

* test_operator_gpu working for all types

* Compiles after AccReal

* Compiles after AccReal

* All tests working

* All tests working

* build, run, clean gtest works (although a test is failing)

* vc++ requires explicit int type for omp for loop

* Repair cpp-package

* signed/unsigned fixed in cuda file

* lint fixes in tests and cpp-package directories

* more lint

* use IsWriting() helper

* Fall-through for unsupported MKL shapes/types

* Fall-through for unsupported MKL shapes/types

* cleaner mkl_off approach

* Warning only whem MKL is requested

* Warning only whem MKL is requested

* lint

* ..

* python problem fixed

* python problem fixed

* Merge branch 'batchnorm' into batchnorm_pr

# Conflicts:
#	src/operator/batch_norm.cc
#	src/operator/batch_norm.cu
#	tests/cpp/operator/batchnorm_test.cc

* lint fix

* lint fix

* lint fix

* lint fix

* lint fix

* Fix visual c++ compile problem

* .

* .

* All unit tests pass again

* lint fix

* fix strange compile errors in CUDNN batchnorm header

* FInish using flags instead of bools

* lint

* Fix timing pass count for forward pass

* Fix R script install roxygen problem

* code formatting, addition of doc strings is causing IDE to add spaces before the calls

* removed commented

* cr comments

* Change back to compilable code

* For CPU mode, store as invstd

* move testing code around a little

* lint fix

* Use AccReal in some places to avoid fp16 problems

* Fix minor invstd problem in cuda version

* remove unused scale param

* add permutation unit test, handle cudnn doesn't like 3D

* .

* lint

* .

* Remove mkl_off

* lint fix and time cudnn when enabled
Guneet-Dhillon pushed a commit to Guneet-Dhillon/mxnet that referenced this pull request Sep 13, 2017
…at64 as well as operator gtest framework (apache#5936)