
BatchNorm numerical instability #3963

Description

@classner

I just tracked down an issue with unexpected BatchNorm behavior in the current caffe version. The effect can be condensed to the following scenario: for an input batch in which one channel holds a constant value, the layer produces results that differ noticeably from zero, and how far off they are depends on the magnitude of that value. The expected output for this channel is zero everywhere (the mean is subtracted, and there is no variance).
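For reference, here is a minimal sketch of what per-channel normalization should return for a constant channel when it is computed in Python double precision. The helper name `batch_norm_channel` and the `eps` default of 1e-5 are illustrative assumptions, not caffe's API.

```python
import math


def batch_norm_channel(values, eps=1e-5):
    """Normalize one channel as (x - mean) / sqrt(var + eps).

    Hypothetical reference helper in double precision; not caffe's code.
    """
    n = len(values)
    mean = sum(values) / n
    # Two-pass variance, E((X - E(X))^2), as described in this issue.
    var = sum((x - mean) ** 2 for x in values) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in values]


# The 1x1x3x3 test blob with every entry equal to 100, flattened to one channel.
out = batch_norm_channel([100.0] * 9)
print(out)  # every entry is 0.0
```

In double precision, 900 / 9 is exactly 100, so every output is exactly zero. This is the result the float32 layer should match.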

To reproduce the effect, I created a test with a 1x1x3x3 input blob in which all entries are 100 (smaller values also trigger it). In this case, the computed mean is 100.000015 and the normalized result is ~ -0.00483, which is far away from 0. The normalization factor, 1/sqrt(var + eps), then amplifies this small deviation in the mean. This highlights how badly conditioned the operation is numerically and how important it is to get the mean right, since the variance is then computed as E((X-E(X))^2). One possible improvement would be Knuth's numerically stable online mean and variance computation, applied within the regions of a reduction.
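The sketch below simulates float32 arithmetic in Python (each operation is rounded to single precision with `struct`) and compares two ways of computing the statistics for the 3x3 constant blob. The "naive" path is a simplified model that scales each element by 1/N before accumulating; it is an assumption about the style of computation, and caffe's BLAS-based code may order the operations differently. The "stable" path assumes that "the stable online mean and variance computation by Knuth" refers to Welford's update as presented in Knuth's TAOCP Vol. 2. All function names here are hypothetical.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_mean_var_f32(values):
    """Assumed model: scale each element by 1/N in float32, then accumulate."""
    inv_n = f32(1.0 / len(values))
    mean = 0.0
    for x in values:
        mean = f32(mean + f32(x * inv_n))
    var = 0.0
    for x in values:
        d = f32(x - mean)
        var = f32(var + f32(f32(d * d) * inv_n))
    return mean, var


def welford_mean_var_f32(values):
    """Welford's online mean/variance update, every step rounded to float32."""
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(values, start=1):
        delta = f32(x - mean)
        mean = f32(mean + f32(delta / k))
        m2 = f32(m2 + f32(delta * f32(x - mean)))
    return mean, f32(m2 / len(values))  # population variance


def normalize_f32(x, mean, var, eps=1e-5):
    """(x - mean) / sqrt(var + eps) in float32; eps=1e-5 is an assumed default."""
    return f32(f32(x - mean) / f32(math.sqrt(f32(var + f32(eps)))))


values = [100.0] * 9  # the 1x1x3x3 test blob
naive_mean, naive_var = naive_mean_var_f32(values)
stable_mean, stable_var = welford_mean_var_f32(values)
print(naive_mean, normalize_f32(100.0, naive_mean, naive_var))
print(stable_mean, normalize_f32(100.0, stable_mean, stable_var))
```

In this model, the naive path gives a mean of 100 + 2^-16 (about 100.000015) and a normalized output of about -0.00483, which happen to match the figures reported above. The Welford path sees `delta == 0` after the first element, so it returns exactly 100 with zero variance, and the output is exactly 0.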

Interestingly, chainer and mxnet do not seem to suffer from this effect: Chainer uses a reduction kernel for the mean computation, and mxnet uses its own dedicated implementation. Newer cuDNN versions probably handle this similarly.
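A GPU reduction kernel generally combines partial sums in a tree rather than feeding every element into one running accumulator. The sketch below models that difference on the CPU using pairwise summation in simulated float32. It is an illustrative assumption about why a reduction-based mean behaves better, not Chainer's or mxnet's kernel code. The input (one million copies of 0.1) was chosen only because it makes the drift of a single running float32 sum obvious.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sequential_sum_f32(values):
    """Left-to-right accumulation in a single float32 running total."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def pairwise_sum_f32(values, lo=0, hi=None):
    """Tree-shaped (pairwise) float32 summation over values[lo:hi]."""
    if hi is None:
        hi = len(values)
    if hi - lo <= 8:  # small leaves are summed sequentially
        return sequential_sum_f32(values[lo:hi])
    mid = (lo + hi) // 2
    return f32(pairwise_sum_f32(values, lo, mid) + pairwise_sum_f32(values, mid, hi))


n = 1_000_000
x = f32(0.1)
values = [x] * n
expected = n * x  # double-precision reference
seq = sequential_sum_f32(values)
pair = pairwise_sum_f32(values)
print(f"sequential error: {seq - expected:.3f}")
print(f"pairwise error:   {pair - expected:.3f}")
```

The sequential error grows with the size of the running total, because each addition is rounded to the spacing of float32 values near that total. The worst-case error of pairwise summation grows only like log2(n), which is the property a tree reduction inherits.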

This raises two issues:

  1. The computation should be made more robust in general, and
  2. the results might differ depending on whether the cuDNN implementation or the caffe implementation is used.

For reference:
