Major problem: float32's rounding error is so large that it may dominate the difference between the numerical gradients and the analytical gradients, which causes a relatively large relative error in gradient checking. As a consequence, the gradient checker used in unit tests may be unreliable.
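To illustrate the scale of the problem, here is a minimal sketch. The helper name `perturbation_survives` is hypothetical, and the values 1.0 and 1e-8 are chosen only for illustration. It prints the machine epsilon of each type and checks whether a small perturbation is still distinguishable after being added to a value of order 1.

```python
import numpy as np


def perturbation_survives(dtype, value=1.0, delta=1e-8):
    """Return True if value + delta differs from value when computed in dtype."""
    v = dtype(value)
    d = dtype(delta)
    return bool(v + d != v)


# Machine epsilon: spacing between 1.0 and the next representable number.
eps32 = float(np.finfo(np.float32).eps)
eps64 = float(np.finfo(np.float64).eps)

survives32 = perturbation_survives(np.float32)
survives64 = perturbation_survives(np.float64)

print("float32 eps:", eps32, "perturbation survives:", survives32)
print("float64 eps:", eps64, "perturbation survives:", survives64)
```

In float32 the perturbation is rounded away entirely, so any finite difference built on it carries no information about the gradient.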
Potential solutions:
- Choose epsilon carefully so that the rounding error stays reasonable. However, this is a challenging task. See https://en.wikipedia.org/wiki/Numerical_differentiation and the experiments at the end of this issue.
- Use float64 instead of float32. Reference: http://cs231n.github.io/neural-networks-3/
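The second option could look roughly like the sketch below. It is not the gradient checker used in the unit tests: `numeric_grad` and `make_f` are hypothetical helpers, and the default step of 1e-5 is an assumption. The sketch runs the same central-difference check in both precisions against the analytical gradient of x * y with respect to x, which is y^T.

```python
import numpy as np


def numeric_grad(f, x, delta=1e-5, dtype=np.float64):
    """Central-difference gradient of a scalar-valued f at x, computed in dtype."""
    x = np.asarray(x, dtype=dtype).copy()
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + delta
        f_plus = f(x)
        x[idx] = orig - delta
        f_minus = f(x)
        x[idx] = orig  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * delta)
    return grad


rng = np.random.RandomState(1)
x = rng.random_sample((1, 200))
y = rng.random_sample((200, 1))


def make_f(dtype):
    """Build f(x) = x * y with y stored in the given dtype."""
    y_cast = y.astype(dtype)
    return lambda x_: np.matmul(x_, y_cast)[0, 0]


analytic = y.T  # d(x * y)/dx = y^T

err32 = np.abs(numeric_grad(make_f(np.float32), x, dtype=np.float32) - analytic).max()
err64 = np.abs(numeric_grad(make_f(np.float64), x, dtype=np.float64) - analytic).max()

print("max abs error, float32:", err32)
print("max abs error, float64:", err64)
```

Running the check in float64 and casting back afterwards keeps the operator under test unchanged while removing most of the rounding noise from the comparison.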
Experiments
The differences between the numerical and analytical gradients of the linear function f(x, y) = x^T * y are shown below. We can conclude that:
- Although the linear function is very simple, the absolute and relative errors are unacceptably large if float32 is used.
- The errors are very small if float64 is used.
- If the scale of epsilon is comparable to that of x and y, the errors are small. However, I'm not sure whether this conclusion generalizes to more complicated functions.
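These trends match a standard order-of-magnitude estimate for the central difference. The expression below is a rough estimate, not a strict bound. The symbols δ (the step, called epsilon in this issue) and ε_mach (the unit roundoff of the floating-point type) are introduced here for illustration, and the estimate assumes each evaluation of f carries a relative rounding error of about ε_mach.

```latex
\left| \frac{f(x+\delta) - f(x-\delta)}{2\delta} - f'(x) \right|
\;\lesssim\;
\underbrace{\frac{\delta^{2}}{6}\,\bigl|f'''(x)\bigr|}_{\text{truncation}}
\;+\;
\underbrace{\frac{\epsilon_{\text{mach}}\,\bigl|f(x)\bigr|}{\delta}}_{\text{rounding}}
```

For a linear f the truncation term vanishes, so only the rounding term remains, and it shrinks as δ grows. As a rough check for the (1, 200) float32 case: with ε_mach around 6e-8, |f| around 50 (200 products of values averaging roughly 0.49 and 0.53), and δ = 1e-5, the rounding term is about 0.3, the same order as the measured 0.19.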
x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float32'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 5.96046e-08 6.27301e-08 5.96046e-08 0.485359 0.532412
1000.000000000000000 0 0 0 0.485359 0.532412
100.000000000000000 5.96046e-08 6.27301e-08 5.96046e-08 0.485359 0.532412
10.000000000000000 2.98023e-07 3.13651e-07 2.98023e-07 0.485359 0.532412
1.000000000000000 1.19209e-07 1.2546e-07 1.19209e-07 0.485359 0.532412
0.100000000000000 7.7486e-06 8.15491e-06 7.7486e-06 0.485359 0.532412
0.010000000000000 6.49691e-05 6.83758e-05 6.49691e-05 0.485359 0.532412
0.050000000000000 2.68221e-05 2.82285e-05 2.68221e-05 0.485359 0.532412
0.001000000000000 0.00031656 0.00033316 0.00031656 0.485359 0.532412
0.000100000000000 0.0034982 0.00368163 0.0034982 0.485359 0.532412
0.000010000000000 0.194233 0.204418 0.194233 0.485359 0.532412
x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float64'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 0 0 0 0.485358 0.532412
1000.000000000000000 0 0 0 0.485358 0.532412
100.000000000000000 0 0 0 0.485358 0.532412
10.000000000000000 2.22045e-16 2.33688e-16 2.22045e-16 0.485358 0.532412
1.000000000000000 2.66454e-15 2.80425e-15 2.66454e-15 0.485358 0.532412
0.100000000000000 2.39808e-14 2.52383e-14 2.39808e-14 0.485358 0.532412
0.010000000000000 3.79252e-13 3.99139e-13 3.79252e-13 0.485358 0.532412
0.050000000000000 9.50351e-14 1.00018e-13 9.50351e-14 0.485358 0.532412
0.001000000000000 3.31291e-13 3.48662e-13 3.31291e-13 0.485358 0.532412
0.000100000000000 1.38796e-11 1.46074e-11 1.38796e-11 0.485358 0.532412
0.000010000000000 1.28229e-10 1.34953e-10 1.28229e-10 0.485358 0.532412
---------------------------------------------------------------
x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float32'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 0 0 0 0.475109 0.482829
1000.000000000000000 2.98023e-08 1.10408e-07 2.98023e-08 0.475109 0.482829
100.000000000000000 2.98023e-08 1.10408e-07 2.98023e-08 0.475109 0.482829
10.000000000000000 0 0 0 0.475109 0.482829
1.000000000000000 8.9407e-08 3.31225e-07 8.9407e-08 0.475109 0.482829
0.100000000000000 8.9407e-08 3.31225e-07 8.9407e-08 0.475109 0.482829
0.010000000000000 3.80576e-05 0.000140992 3.80576e-05 0.475109 0.482829
0.050000000000000 8.9407e-08 3.31225e-07 8.9407e-08 0.475109 0.482829
0.001000000000000 3.80576e-05 0.000140992 3.80576e-05 0.475109 0.482829
0.000100000000000 0.00289908 0.0107402 0.00289908 0.475109 0.482829
0.000010000000000 0.079193 0.293386 0.079193 0.475109 0.482829
x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float64'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 0 0 0 0.475109 0.482829
1000.000000000000000 5.55112e-17 2.05652e-16 5.55112e-17 0.475109 0.482829
100.000000000000000 5.55112e-17 2.05652e-16 5.55112e-17 0.475109 0.482829
10.000000000000000 5.55112e-17 2.05652e-16 5.55112e-17 0.475109 0.482829
1.000000000000000 6.66134e-16 2.46782e-15 6.66134e-16 0.475109 0.482829
0.100000000000000 7.77156e-15 2.87912e-14 7.77156e-15 0.475109 0.482829
0.010000000000000 6.10623e-14 2.26217e-13 6.10623e-14 0.475109 0.482829
0.050000000000000 9.99201e-15 3.70173e-14 9.99201e-15 0.475109 0.482829
0.001000000000000 4.16334e-13 1.54239e-12 4.16334e-13 0.475109 0.482829
0.000100000000000 6.68909e-12 2.4781e-11 6.68909e-12 0.475109 0.482829
0.000010000000000 1.84325e-10 6.82867e-10 1.84325e-10 0.475109 0.482829
---------------------------------------------------------------
x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float32'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 0 0 0 0.314629 0.419932
1000.000000000000000 0 0 0 0.314629 0.419932
100.000000000000000 2.98023e-08 7.10943e-08 2.98023e-08 0.314629 0.419932
10.000000000000000 0 0 0 0.314629 0.419932
1.000000000000000 0 0 0 0.314629 0.419932
0.100000000000000 1.78814e-07 4.26566e-07 1.78814e-07 0.314629 0.419932
0.010000000000000 1.01328e-06 2.4172e-06 1.01328e-06 0.314629 0.419932
0.050000000000000 1.78814e-07 4.26566e-07 1.78814e-07 0.314629 0.419932
0.001000000000000 4.91738e-06 1.17306e-05 4.91738e-06 0.314629 0.419932
0.000100000000000 0.000173867 0.000414764 0.000173867 0.314629 0.419932
0.000010000000000 0.00399846 0.00953843 0.00399846 0.314629 0.419932
x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float64'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 5.55112e-17 1.32423e-16 5.55112e-17 0.314629 0.419932
1000.000000000000000 0 0 0 0.314629 0.419932
100.000000000000000 0 0 0 0.314629 0.419932
10.000000000000000 0 0 0 0.314629 0.419932
1.000000000000000 1.11022e-16 2.64847e-16 1.11022e-16 0.314629 0.419932
0.100000000000000 9.99201e-16 2.38362e-15 9.99201e-16 0.314629 0.419932
0.010000000000000 7.66054e-15 1.82744e-14 7.66054e-15 0.314629 0.419932
0.050000000000000 1.22125e-15 2.91331e-15 1.22125e-15 0.314629 0.419932
0.001000000000000 9.64784e-14 2.30152e-13 9.64784e-14 0.314629 0.419932
0.000100000000000 5.40568e-13 1.28954e-12 5.40568e-13 0.314629 0.419932
0.000010000000000 8.34116e-12 1.98981e-11 8.34116e-12 0.314629 0.419932
---------------------------------------------------------------
x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float32'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 0 0 0 0.417022 0.720325
1000.000000000000000 0 0 0 0.417022 0.720325
100.000000000000000 5.96046e-08 8.27469e-08 5.96046e-08 0.417022 0.720325
10.000000000000000 0 0 0 0.417022 0.720325
1.000000000000000 0 0 0 0.417022 0.720325
0.100000000000000 0 0 0 0.417022 0.720325
0.010000000000000 8.9407e-07 1.2412e-06 8.9407e-07 0.417022 0.720325
0.050000000000000 0 0 0 0.417022 0.720325
0.001000000000000 2.44379e-06 3.39262e-06 2.44379e-06 0.417022 0.720325
0.000100000000000 2.38419e-06 3.30988e-06 2.38419e-06 0.417022 0.720325
0.000010000000000 0.000891685 0.00123789 0.000891685 0.417022 0.720325
x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float64'>
epsilon max_diff max_relative_diff avg_abs_diff avg_abs_x avg_abs_y
10000.000000000000000 0 0 0 0.417022 0.720324
1000.000000000000000 0 0 0 0.417022 0.720324
100.000000000000000 1.11022e-16 1.54128e-16 1.11022e-16 0.417022 0.720324
10.000000000000000 1.11022e-16 1.54128e-16 1.11022e-16 0.417022 0.720324
1.000000000000000 0 0 0 0.417022 0.720324
0.100000000000000 1.11022e-16 1.54128e-16 1.11022e-16 0.417022 0.720324
0.010000000000000 1.22125e-15 1.69541e-15 1.22125e-15 0.417022 0.720324
0.050000000000000 1.11022e-16 1.54128e-16 1.11022e-16 0.417022 0.720324
0.001000000000000 1.23235e-14 1.71082e-14 1.23235e-14 0.417022 0.720324
0.000100000000000 1.23346e-13 1.71236e-13 1.23346e-13 0.417022 0.720324
0.000010000000000 4.31877e-13 5.99559e-13 4.31877e-13 0.417022 0.720324
---------------------------------------------------------------
Code
from __future__ import print_function  # lets the script run on Python 2 and 3

import numpy as np


def print_diff(dtype, x_shape, y_shape):
    np.random.seed(1)
    x = np.random.random(x_shape).astype(dtype)
    y = np.random.random(y_shape).astype(dtype)

    def f(e):
        # Linear function f(x, y) = x^T * y, evaluated at x perturbed by e.
        return np.matmul(x + e, y)

    e = np.zeros(x_shape).astype(dtype)
    one = e.copy()
    one[0, 0] = 1
    # Analytical gradient with respect to x[0, 0], which is y[0, 0].
    target = np.dot(one, y)
    print('%-21s\t%-21s\t%-21s\t%-21s\t%-21s\t%-21s'
          % ('delta', 'max diff', 'max relative diff',
             'avg_abs_diff', 'avg_abs_x', 'avg_abs_y'))
    #for delta in [10000, 1000, 100, 10, 1, 0.1, 0.01, 0.05, 0.001, 0.0001, 0.00001]:
    for delta in [0.01, 0.05, 0.001, 0.0001, 0.00001]:
        #delta = np.abs(x).sum() / x.size
        e[0, 0] = delta
        # Central-difference estimate of the gradient.
        grad = (f(e) - f(-e)) / 2 / delta
        #grad = np.matmul(e, y) / delta
        diff = grad - target
        target_ = target.copy()
        # Avoid dividing by a near-zero gradient.
        target_[target_ < 1e-3] = 1
        relative_diff = np.abs(diff) / target_
        print('%21.15f\t%-21g\t%-21g\t%-21g\t%-21g\t%-21g'
              % (delta,
                 np.abs(diff).max(),
                 np.abs(relative_diff).max(),
                 np.abs(diff).mean(),
                 np.abs(x).mean(),
                 np.abs(y).mean()))


for x_shape, y_shape in [((1, 200), (200, 1)), ((1, 84), (84, 1)), ((1, 10), (10, 1))]:
    for dtype in (np.float32, np.float64):
        print('x_shape', x_shape, 'y_shape', y_shape)
        print(dtype)
        print_diff(dtype, x_shape, y_shape)
        print('')
    print('-' * 63)