Using float32 in gradient checking may not be appropriate due to large rounding error

**Major problem**: float32's rounding error is so large that it may dominate the difference between the numerical gradients and the analytical gradients, which causes a large relative error in gradient checking. As a consequence, the gradient checker used in unit tests may be unreliable.
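
To make the scale of the problem concrete, here is a rough back-of-the-envelope sketch (not part of the original experiment). It assumes each evaluation of f carries a relative error of about machine epsilon, so a central difference inherits an absolute error of roughly `eps * |f| / (2 * delta)`. The helper name `rounding_error_floor` and the magnitude `|f| ~ 50` (a dot product of two length-200 vectors with entries in [0, 1)) are assumptions made for illustration.

```python
import numpy as np

def rounding_error_floor(dtype, f_magnitude, delta):
    """Rough estimate of the rounding error in a central difference.

    Each evaluation of f is off by about eps * |f|, and the division
    by 2 * delta amplifies that error.
    """
    eps = float(np.finfo(dtype).eps)
    return eps * f_magnitude / (2.0 * delta)

# |f| ~ 50 is an assumed magnitude, not a measured value.
for dtype in (np.float32, np.float64):
    for delta in (1e-1, 1e-3, 1e-5):
        print(dtype.__name__, delta, rounding_error_floor(dtype, 50.0, delta))
```

Under these assumptions the estimate for float32 at delta = 1e-5 is about 0.3, the same order of magnitude as the 0.19 reported in the experiments, while the float64 estimate stays many orders of magnitude smaller.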

**Potential solutions**:
1. Choose epsilon carefully so that the rounding error stays reasonable. However, this is a challenging task. See https://en.wikipedia.org/wiki/Numerical_differentiation and the experiments at the end of this issue.
2. Use float64 instead of float32. Reference: http://cs231n.github.io/neural-networks-3/
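
A minimal sketch of option 2, assuming the checker is free to upcast its inputs: the test data stays in float32, but the perturbation and the function evaluations run in float64. The helpers `numeric_grad` and `max_relative_error` are hypothetical names for illustration, not the API of the existing gradient checker.

```python
import numpy as np

def numeric_grad(f, x, delta=1e-5):
    """Central-difference gradient of a scalar function f, computed in float64."""
    x = np.asarray(x, dtype=np.float64).copy()  # upcast before perturbing
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + delta
        f_plus = f(x)
        x[idx] = orig - delta
        f_minus = f(x)
        x[idx] = orig  # restore the entry
        grad[idx] = (f_plus - f_minus) / (2.0 * delta)
    return grad

def max_relative_error(numeric, analytic, floor=1e-3):
    """Largest |numeric - analytic| / max(|analytic|, floor)."""
    denom = np.maximum(np.abs(analytic), floor)
    return float(np.max(np.abs(numeric - analytic) / denom))

rng = np.random.RandomState(1)
x32 = rng.random_sample((1, 200)).astype(np.float32)
y32 = rng.random_sample((200, 1)).astype(np.float32)

# Upcast the float32 test data once, then do all arithmetic in float64.
y64 = y32.astype(np.float64)
f = lambda x: float(np.matmul(x, y64)[0, 0])

analytic = y64.T  # d(x . y)/dx = y^T
err = max_relative_error(numeric_grad(f, x32), analytic)
print(err)
```

The design choice here is to keep the operator under test unchanged and only raise the precision of the checker itself. This only works if the operator can be evaluated in float64.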

**Experiments**
The differences between the numerical and analytical gradients of the linear function f(x, y) = x^T * y are shown below. We can conclude that:
1. Although the linear function is very simple, the absolute error and relative error are unacceptably large if float32 is used with a small epsilon.
2. The errors are very small if float64 is used.
3. If the scale of epsilon is comparable with that of x and y, the errors are small. But I'm not sure whether this conclusion generalizes to more complicated functions.

```
x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	5.96046e-08          	6.27301e-08          	5.96046e-08          	0.485359             	0.532412             
 1000.000000000000000	0                    	0                    	0                    	0.485359             	0.532412             
  100.000000000000000	5.96046e-08          	6.27301e-08          	5.96046e-08          	0.485359             	0.532412             
   10.000000000000000	2.98023e-07          	3.13651e-07          	2.98023e-07          	0.485359             	0.532412             
    1.000000000000000	1.19209e-07          	1.2546e-07           	1.19209e-07          	0.485359             	0.532412             
    0.100000000000000	7.7486e-06           	8.15491e-06          	7.7486e-06           	0.485359             	0.532412             
    0.010000000000000	6.49691e-05          	6.83758e-05          	6.49691e-05          	0.485359             	0.532412             
    0.050000000000000	2.68221e-05          	2.82285e-05          	2.68221e-05          	0.485359             	0.532412             
    0.001000000000000	0.00031656           	0.00033316           	0.00031656           	0.485359             	0.532412             
    0.000100000000000	0.0034982            	0.00368163           	0.0034982            	0.485359             	0.532412             
    0.000010000000000	0.194233             	0.204418             	0.194233             	0.485359             	0.532412             

x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
 1000.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
  100.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
   10.000000000000000	2.22045e-16          	2.33688e-16          	2.22045e-16          	0.485358             	0.532412             
    1.000000000000000	2.66454e-15          	2.80425e-15          	2.66454e-15          	0.485358             	0.532412             
    0.100000000000000	2.39808e-14          	2.52383e-14          	2.39808e-14          	0.485358             	0.532412             
    0.010000000000000	3.79252e-13          	3.99139e-13          	3.79252e-13          	0.485358             	0.532412             
    0.050000000000000	9.50351e-14          	1.00018e-13          	9.50351e-14          	0.485358             	0.532412             
    0.001000000000000	3.31291e-13          	3.48662e-13          	3.31291e-13          	0.485358             	0.532412             
    0.000100000000000	1.38796e-11          	1.46074e-11          	1.38796e-11          	0.485358             	0.532412             
    0.000010000000000	1.28229e-10          	1.34953e-10          	1.28229e-10          	0.485358             	0.532412             

---------------------------------------------------------------
x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
 1000.000000000000000	2.98023e-08          	1.10408e-07          	2.98023e-08          	0.475109             	0.482829             
  100.000000000000000	2.98023e-08          	1.10408e-07          	2.98023e-08          	0.475109             	0.482829             
   10.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
    1.000000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.100000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.010000000000000	3.80576e-05          	0.000140992          	3.80576e-05          	0.475109             	0.482829             
    0.050000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.001000000000000	3.80576e-05          	0.000140992          	3.80576e-05          	0.475109             	0.482829             
    0.000100000000000	0.00289908           	0.0107402            	0.00289908           	0.475109             	0.482829             
    0.000010000000000	0.079193             	0.293386             	0.079193             	0.475109             	0.482829             

x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
 1000.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
  100.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
   10.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
    1.000000000000000	6.66134e-16          	2.46782e-15          	6.66134e-16          	0.475109             	0.482829             
    0.100000000000000	7.77156e-15          	2.87912e-14          	7.77156e-15          	0.475109             	0.482829             
    0.010000000000000	6.10623e-14          	2.26217e-13          	6.10623e-14          	0.475109             	0.482829             
    0.050000000000000	9.99201e-15          	3.70173e-14          	9.99201e-15          	0.475109             	0.482829             
    0.001000000000000	4.16334e-13          	1.54239e-12          	4.16334e-13          	0.475109             	0.482829             
    0.000100000000000	6.68909e-12          	2.4781e-11           	6.68909e-12          	0.475109             	0.482829             
    0.000010000000000	1.84325e-10          	6.82867e-10          	1.84325e-10          	0.475109             	0.482829             

---------------------------------------------------------------
x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
 1000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
  100.000000000000000	2.98023e-08          	7.10943e-08          	2.98023e-08          	0.314629             	0.419932             
   10.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    1.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    0.100000000000000	1.78814e-07          	4.26566e-07          	1.78814e-07          	0.314629             	0.419932             
    0.010000000000000	1.01328e-06          	2.4172e-06           	1.01328e-06          	0.314629             	0.419932             
    0.050000000000000	1.78814e-07          	4.26566e-07          	1.78814e-07          	0.314629             	0.419932             
    0.001000000000000	4.91738e-06          	1.17306e-05          	4.91738e-06          	0.314629             	0.419932             
    0.000100000000000	0.000173867          	0.000414764          	0.000173867          	0.314629             	0.419932             
    0.000010000000000	0.00399846           	0.00953843           	0.00399846           	0.314629             	0.419932             

x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	5.55112e-17          	1.32423e-16          	5.55112e-17          	0.314629             	0.419932             
 1000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
  100.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
   10.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    1.000000000000000	1.11022e-16          	2.64847e-16          	1.11022e-16          	0.314629             	0.419932             
    0.100000000000000	9.99201e-16          	2.38362e-15          	9.99201e-16          	0.314629             	0.419932             
    0.010000000000000	7.66054e-15          	1.82744e-14          	7.66054e-15          	0.314629             	0.419932             
    0.050000000000000	1.22125e-15          	2.91331e-15          	1.22125e-15          	0.314629             	0.419932             
    0.001000000000000	9.64784e-14          	2.30152e-13          	9.64784e-14          	0.314629             	0.419932             
    0.000100000000000	5.40568e-13          	1.28954e-12          	5.40568e-13          	0.314629             	0.419932             
    0.000010000000000	8.34116e-12          	1.98981e-11          	8.34116e-12          	0.314629             	0.419932             

---------------------------------------------------------------
x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
 1000.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
  100.000000000000000	5.96046e-08          	8.27469e-08          	5.96046e-08          	0.417022             	0.720325             
   10.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    1.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.100000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.010000000000000	8.9407e-07           	1.2412e-06           	8.9407e-07           	0.417022             	0.720325             
    0.050000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.001000000000000	2.44379e-06          	3.39262e-06          	2.44379e-06          	0.417022             	0.720325             
    0.000100000000000	2.38419e-06          	3.30988e-06          	2.38419e-06          	0.417022             	0.720325             
    0.000010000000000	0.000891685          	0.00123789           	0.000891685          	0.417022             	0.720325             

x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
 1000.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
  100.000000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
   10.000000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    1.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
    0.100000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    0.010000000000000	1.22125e-15          	1.69541e-15          	1.22125e-15          	0.417022             	0.720324             
    0.050000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    0.001000000000000	1.23235e-14          	1.71082e-14          	1.23235e-14          	0.417022             	0.720324             
    0.000100000000000	1.23346e-13          	1.71236e-13          	1.23346e-13          	0.417022             	0.720324             
    0.000010000000000	4.31877e-13          	5.99559e-13          	4.31877e-13          	0.417022             	0.720324             

---------------------------------------------------------------
```
**Code**
```python
from __future__ import print_function  # keeps the script runnable on Python 2 and 3

import numpy as np


def print_diff(dtype, x_shape, y_shape):
    np.random.seed(1)

    x = np.random.random(x_shape).astype(dtype)
    y = np.random.random(y_shape).astype(dtype)

    # f(e) evaluates x * y with x perturbed by e.
    def f(e):
        return np.matmul(x + e, y)

    e = np.zeros(x_shape).astype(dtype)

    # Analytical gradient with respect to x[0, 0], which equals y[0, 0].
    one = e.copy()
    one[0, 0] = 1
    target = np.dot(one, y)

    print('%-21s\t%-21s\t%-21s\t%-21s\t%-21s\t%-21s'
          % ('epsilon', 'max diff', 'max relative diff',
             'avg_abs_diff', 'avg_abs_x', 'avg_abs_y'))
    #for delta in [0.01, 0.05, 0.001, 0.0001, 0.00001]:
    for delta in [10000, 1000, 100, 10, 1, 0.1, 0.01, 0.05, 0.001, 0.0001, 0.00001]:
        #delta = np.abs(x).sum() / x.size
        e[0, 0] = delta
        # Numerical gradient by central difference.
        grad = (f(e) - f(-e)) / 2 / delta
        #grad = np.matmul(e, y) / delta

        diff = grad - target

        # Avoid dividing by a near-zero analytical gradient.
        target_ = target.copy()
        target_[target_ < 1e-3] = 1
        relative_diff = np.abs(diff) / target_

        print('%21.15f\t%-21g\t%-21g\t%-21g\t%-21g\t%-21g'
              % (delta,
                 np.abs(diff).max(),
                 np.abs(relative_diff).max(),
                 np.abs(diff).mean(),
                 np.abs(x).mean(),
                 np.abs(y).mean()))


for x_shape, y_shape in [((1, 200), (200, 1)), ((1, 84), (84, 1)),
                         ((1, 10), (10, 1)), ((1, 1), (1, 1))]:
    for dtype in (np.float32, np.float64):
        print('x_shape', x_shape, 'y_shape', y_shape)
        print(dtype)
        print_diff(dtype, x_shape, y_shape)
        print('')

    print('-' * 63)
```

Using float32 in gradient checking may not be appropriate due to large rounding error #4283