BatchNorm numerical instability #3963

I just tracked down unexpected BatchNorm behavior in the current caffe version. The effect boils down to the following scenario: for an input batch in which one channel holds a constant value, the layer produces outputs that (depending on the magnitude of the value) deviate noticeably from zero. The expected result is all zeros in this channel, since the mean is normalized to zero and there is no variance.

To reproduce the effect, I created a test with a 1x1x3x3 input blob in which all entries are 100 (it also works for smaller values). In this case, the computed mean is 100.000015 and the normalized result is ~ -0.00483, which is far from 0. The normalization factor amplifies the numerical deviation. This highlights how badly conditioned the operation is numerically and how important it is to get the mean right (the variance is then computed as E((X-E(X))^2)). One way to improve this could be the stable online mean and variance computation by [Knuth](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm), applied per region and followed by a reduction.
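To make the arithmetic concrete, here is a minimal sketch in plain Python (not caffe code). It plugs the reported mean into the normalization and compares it with the online update from the linked Wikipedia section, often attributed to Welford. Two things are assumptions, not taken from this report: the epsilon of 1e-5 added to the variance, and reading the printed mean 100.000015 as the nearby single-precision value 100 + 2^-16. The helper names are made up for illustration.

```python
import math

EPS = 1e-5  # assumed epsilon added to the variance; not stated in this issue


def normalize(values, mean, var, eps=EPS):
    """Apply (x - mean) / sqrt(var + eps) to every value."""
    scale = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * scale for x in values]


def two_pass_stats(values, mean):
    """Variance as E((X - E(X))^2), taking an already computed mean as given."""
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, var


def welford_stats(values):
    """Online mean and population variance, updated one value at a time."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return mean, m2 / count


channel = [100.0] * 9  # the 1x1x3x3 blob from the report

# Reported mean 100.000015, read here as the float32 value 100 + 2**-16.
reported_mean = 100.0 + 2.0 ** -16
naive_out = normalize(channel, *two_pass_stats(channel, reported_mean))

# The online update recovers the constant exactly, so the variance is zero.
stable_out = normalize(channel, *welford_stats(channel))

print(naive_out[0])   # roughly -0.0048, close to the reported value
print(stable_out[0])  # 0.0
```

Under these assumptions, a mean that is off by only about 1.5e-5 is enough to explain the reported output, because the deviation is divided by sqrt(eps), which is about 0.003. The sketch is serial; a GPU version would still need a scheme for merging partial results across the reduction.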

Interestingly, chainer and mxnet do not seem to suffer from this effect. Chainer uses a reduction kernel for the mean computation [here](https://github.com/pfnet/chainer/blob/457550fc2501f3b8f73f93d69400f51d4096696f/cupy/core/core.pyx#L2299) and mxnet like [this](https://github.com/dmlc/mxnet/blob/master/src/operator/batch_norm-inl.h#L102). Probably the new cudnn versions handle this similarly.

This raises two issues:
1. The numerical behavior should be improved in general, and
2. results might be inconsistent depending on whether the cudnn implementation or the caffe implementation is used.

For reference:
- [CuDNN v4/5 PR](https://github.com/BVLC/caffe/pull/3919)
- [CuDNN BatchNorm difference issue](https://github.com/NVIDIA/DIGITS/issues/629)
- [Original BatchNorm PR](https://github.com/BVLC/caffe/pull/3229)

