# Error and Convergence
There are two main types of error. The absolute error is defined as the distance from the root, and for a bracketing method it is bounded by the width of the bracket, $|x_1 - x_2|$. This gives one way to terminate, and it is often the type of error a user wants. To guarantee this error bound is halved each iteration, the arithmetic mean (AM) is used.
Although the absolute error is simple to understand, it does not actually match the structure of floating-point numbers and misses the concept of significant figures. The relative error is defined as the absolute error divided by the absolute value of the root, and is estimated by $|x_1 - x_2| \,/\, (0.5\,|x_1 + x_2|)$. There are several interpretations of the relative error. One interpretation is that it measures the number of significant figures to which the root is found. Another, more intuitive, interpretation is that it measures how close the ratio $x_1 / x_2$ is to 1. To guarantee this error is halved (sort of), the geometric mean (GM) is used.
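As a minimal sketch of these two pairings (the helper names below are hypothetical, not part of pyroot's API), the termination quantities and their matching bisection points look like this:

```python
import math

def am(x1, x2):
    # Arithmetic mean: guaranteed to halve the bracket width
    # |x1 - x2|, i.e. the bound on the absolute error.
    return 0.5 * (x1 + x2)

def gm(x1, x2):
    # Geometric mean: (roughly) halves the bracket width in
    # log-space, i.e. the bound on the relative error.
    # Assumes x1 and x2 are nonzero and have the same sign.
    return math.copysign(math.sqrt(x1 * x2), x1)

def absolute_error_bound(x1, x2):
    return abs(x1 - x2)

def relative_error_estimate(x1, x2):
    return abs(x1 - x2) / (0.5 * abs(x1 + x2))
```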
Let $d_n$ be the number of accurate digits on the $n$th iteration. The order of convergence may be defined as

$$\text{order} = \lim_{n \to \infty} d_n^{1/n},$$

which is often interpreted as an approximation of the form

$$d_{n+1} \approx \text{order} \times d_n.$$
For example, if the order of convergence for a method is 1.500, then it can be expected that the method eventually gains 50% more accurate digits on every iteration.
In terms of the absolute error, the order is often interpreted as an approximation of the form

$$|\varepsilon_{n+1}| \approx |\varepsilon_n|^{\text{order}}.$$
All methods have an order of convergence of at least 1.000, meaning they are guaranteed to converge.
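Given a sequence of absolute errors, the order can be estimated directly from the relation $|\varepsilon_{n+1}| \approx |\varepsilon_n|^{\text{order}}$. A small self-contained sketch (a hypothetical helper, not pyroot code):

```python
import math

def estimate_order(errors):
    # Estimate the order of convergence from successive absolute
    # errors via |eps_{n+1}| ~ |eps_n|**order, i.e.
    # order ~ log|eps_{n+1}| / log|eps_n|.
    return [math.log(e1) / math.log(e0)
            for e0, e1 in zip(errors, errors[1:])]

# A Newton-like sequence with eps_{n+1} = eps_n**2:
print(estimate_order([1e-1, 1e-2, 1e-4, 1e-8, 1e-16]))
# -> values approaching 2.0, i.e. quadratic convergence
```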
Assuming the root is simple, the orders of convergence for each of the methods in pyroot are as follows:
| Method | Order | Root of |
|---|---|---|
| Bisection | 1.000 | $x - 1$ |
| False Position | 1.414 | $x^2 - 2$ |
| Newton* | 2.291 | $x^{15} - 63x^{10} - 3x^5 + 1$ |
| Muller | 1.731 | $x^{12} - 9x^8 + 1$ |
| Dekker | 1.618 | $x^2 - x - 1$ |
| Brent | 1.731 | $x^{12} - 9x^8 + 1$ |
| Chandrupatla | 1.731 | $x^{12} - 9x^8 + 1$ |
| Chandrupatla-Quadratic | 1.731 | $x^{12} - 9x^8 + 1$ |
| Chandrupatla-Mixed | 1.731 | $x^{12} - 9x^8 + 1$ |
*Note: Newton's method requires the derivative fprime. If evaluating fprime is counted as an extra function evaluation, and the order is measured per function evaluation rather than per iteration, Newton's method drops to an order of ≈ 1.513.
For their original variants (as bracketing methods), without the pyroot.solver implementation, the orders of convergence are as follows:
| Method | Order | Root of |
|---|---|---|
| Bisection | 1.000 | $x - 1$ |
| False Position | 1.000 | $x - 1$ |
| Newton* | 2.000 | $x^2 - 2$ |
| Muller | 1.618 | $x^2 - x - 1$ |
| Dekker | 1.618 | $x^2 - x - 1$ |
| Brent | 1.618 | $x^2 - x - 1$ |
| Chandrupatla | 1.618 | $x^2 - x - 1$ |
| Chandrupatla-Quadratic | N/A | Experimental |
| Chandrupatla-Mixed | N/A | Experimental |
The significant decrease in performance is due to the possibility that one of the bracketing points remains stuck. When this occurs, the interpolation methods degenerate into a secant approximation using two converging points on one side of the root.
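The stuck-endpoint behavior is easy to reproduce with plain (unmodified) false position; the sketch below is illustrative only and is not the pyroot implementation:

```python
def plain_false_position(f, x1, x2, iterations=8):
    # Unmodified false position: interpolate through the bracket,
    # then keep whichever endpoint preserves the sign change.
    f1, f2 = f(x1), f(x2)
    for _ in range(iterations):
        x = x2 - f2 * (x2 - x1) / (f2 - f1)
        fx = f(x)
        if f1 * fx < 0:
            x2, f2 = x, fx  # root is in [x1, x]
        else:
            x1, f1 = x, fx  # root is in [x, x2]
        print(x1, x2)

# On the convex f(x) = x**2 - 2 over [0, 2], every new point lands
# on the same side of the root, so x2 stays stuck at 2.0 while x1
# creeps toward sqrt(2).
plain_false_position(lambda x: x * x - 2.0, 0.0, 2.0)
```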
The rate of convergence is a refinement of the order of convergence. While the order of convergence describes the first term in the error, the rate of convergence describes the second term, in the refined approximation formulas

$$d_{n+1} \approx \text{order} \times d_n - \log(\text{rate}),$$

$$\varepsilon_{n+1} \approx \text{rate} \times \varepsilon_n^{\text{order}},$$

where the base of the log depends on the number system used for $d_n$ (e.g. 2 for binary, 10 for decimal), i.e. $d_n = -\log_{\text{base}} |\varepsilon_n|$.
The rate of convergence is particularly relevant when the order is 1, because then the rate alone controls how fast the method converges. Outside of this case, however, the rate of convergence is usually very complicated and hardly worth a mention, and so it is omitted here.
A brief analysis of the orders of convergence shown in the tables above is provided here.
Let us denote the following variables:
- $x_\star$ denotes the root.
- $\hat{x}_\star$ denotes the next estimate of the root.
- $x_1$ denotes the opposing estimate of the root.
- $x_2$ denotes the last estimate of the root.
- $x_3$ denotes the point last removed from $x_{[1,2]}$.
- $\varepsilon_i$ denotes $x_\star - x_i$, the signed error of $x_i$.
- $\hat{\varepsilon}_i$ denotes $\hat{x}_\star - x_i$, the approximate signed error of $x_i$.
- $d_i$ denotes $-\log|\varepsilon_i|$, the digits of accuracy of $x_i$.
- $\hat{d}_i$ denotes $-\log|\hat{\varepsilon}_i|$, the estimated digits of accuracy of $x_i$.
On every iteration, bisection halves the width of the bracket, and hence the bound on the error. Thus, it is easy to see that the order is 1 and the rate is 0.5.
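This is straightforward to verify numerically; a minimal sketch using the test function $f(x) = x - 1$ from the tables:

```python
f = lambda x: x - 1.0
x1, x2 = 0.0, 1.7  # any bracket around the root
for _ in range(8):
    mid = 0.5 * (x1 + x2)
    if f(x1) * f(mid) < 0:
        x2 = mid
    else:
        x1 = mid
    # The bracket width (the absolute error bound) is exactly
    # halved each iteration: order 1, rate 0.5.
    print(abs(x2 - x1))
```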
Newton's interpolation formula provides an easy way to understand the error behavior of the interpolation methods. It gives us the approximation

$$f(x_\star) = f[i] + f[i,j]\,\varepsilon_i + f[i,j,k]\,\varepsilon_i \varepsilon_j + O(\varepsilon_i \varepsilon_j \varepsilon_k),$$

where $f[i]$, $f[i,j]$, and $f[i,j,k]$ are divided differences depending on their respective $x_i$. Since $f(x_\star) = 0$, this inverts to provide the following estimates for the root (depending on the points used):
Using $x_{[1,2]}$:
- $f(x_\star) = f[2] + f[1,2]\,\varepsilon_2 + O(\varepsilon_1 \varepsilon_2)$.
- $\hat{x}_\star = x_\star + O(\varepsilon_1 \varepsilon_2)$.

Using $x_{[2,3]}$:
- $f(x_\star) = f[2] + f[2,3]\,\varepsilon_2 + O(\varepsilon_2 \varepsilon_3)$.
- $\hat{x}_\star = x_\star + O(\varepsilon_2 \varepsilon_3)$.

Using $x_{[1,2,3]}$:
- $f(x_\star) = f[2] + f[2,3]\,\varepsilon_2 + f[1,2,3]\,\varepsilon_2 \varepsilon_3 + O(\varepsilon_1 \varepsilon_2 \varepsilon_3)$.
- $\hat{x}_\star = x_\star + O(\varepsilon_1 \varepsilon_2 \varepsilon_3)$.
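As a concrete check of the $x_{[1,2]}$ case, the sketch below (illustrative, not pyroot code) applies a secant step, which is exactly the inversion of the truncated expansion through $x_{[1,2]}$, to $f(x) = x^2 - 2$ and confirms that the error of $\hat{x}_\star$ scales like $\varepsilon_1 \varepsilon_2$:

```python
import math

f = lambda x: x * x - 2.0
root = math.sqrt(2.0)

for h in (1e-1, 1e-2, 1e-3):
    x1, x2 = root + h, root - 2.0 * h  # straddle the root
    f1, f2 = f(x1), f(x2)
    # Secant step: invert f(x*) ~ f[2] + f[1,2] * eps2.
    x_hat = x2 - f2 * (x2 - x1) / (f2 - f1)
    eps1, eps2 = root - x1, root - x2
    # Error model: x_hat - root ~ C * eps1 * eps2, where for this f
    # C = f''/(2 f') at the root = 1/(2*sqrt(2)) ~ 0.354.
    print((x_hat - root) / (eps1 * eps2))
```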
Now we may construct the orders of convergence for the methods using polynomial interpolation (a slightly modified but equivalent result follows from inverse polynomial interpolation as well).
Furthermore, by extending these expansions by one extra point, we may also make note of the sign of the error, which is crucial for knowing whether $\hat{x}_\star$ will replace $x_1$ or $x_2$. In fact, we can argue that in the worst case it always replaces $x_2$, assuming $f[\ldots]$ does not change sign, which is asymptotically the case except when higher-order derivatives vanish at the root.
The false position method uses only $x_{[1,2]}$. Expanding one step further, we may see that we have

- $f(x_\star) = f[2] + f[1,2]\,\varepsilon_2 + f[1,2,3]\,\varepsilon_1 \varepsilon_2 + O(\varepsilon_1 \varepsilon_2 \varepsilon_3)$.
- $\hat{x}_\star = x_\star + C\,\varepsilon_1 \varepsilon_2 + O(\varepsilon_1 \varepsilon_2 \varepsilon_3)$.
Assuming $C$ does not asymptotically change sign (which holds unless $f''(x)$ happens to change sign at $x_\star$), we may see that $\varepsilon_1 \varepsilon_2$ is always negative, since $x_1$ and $x_2$ lie on opposite sides of the root. This means the error of $\hat{x}_\star$ always has the same sign, and $f(\hat{x}_\star)$ always has a fixed sign. This should intuitively match a sketch of the false position method.

Since the sign of the error asymptotically never changes, $\hat{x}_\star$ must always replace $x_2$, leaving $x_1$ constant. Since the error term is $C\,\varepsilon_1 \varepsilon_2$, the previous iteration gives $\varepsilon_2 \approx C\,\varepsilon_1 \varepsilon_3$, meaning the error decreases by a factor of roughly $C\,\varepsilon_1$ every iteration, giving an order of convergence of 1.
We now formalize this in a format easier to analyze. Throwing away constants and working with the $d_i$, we can express $d_1$, $d_2$, and $d_3$ in terms of their previous iterations:

- $d_1$ remains the same.
- $d_2$ becomes the sum of the previous $d_1$ and $d_2$ (i.e. $\varepsilon_2 := \varepsilon_1 \varepsilon_2$).
- $d_3$ becomes the previous $d_2$.
Expressed as a matrix, this is the same as multiplying by

```
[[1, 0, 0],
 [1, 1, 0],
 [0, 1, 0]]
```

whose largest eigenvalue is 1, the order of this method.
The pyroot implementation is the same as the original, except that after 3 iterations of failing to update $x_1$, it forces an update by modifying the value of $f[1,2]$. This asymptotically increases the error of the next iteration from $O(\varepsilon_1 \varepsilon_2)$ to $O(\varepsilon_2)$, but it forces the sign of the error to change, and hence updates $x_1$. This leads, on every 4th iteration, to the following matrix:
```
[[0, 1, 0],  # The new x1 is the previous x2.
 [0, 1, 0],  # The new x2 has O(e2) error.
 [1, 0, 0]]  # The last value removed is the previous x1.
```

Multiplying 1 iteration of this with 3 iterations of the other gives
```
# SWAP @ KEEP @ KEEP @ KEEP =
[[3, 1, 0],
 [3, 1, 0],
 [1, 0, 0]]
```

which has a largest eigenvalue of 4. Thus, the order of convergence is $4^{1/4} \approx 1.414$.
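These eigenvalue claims are quick to confirm with numpy (an illustrative check, not part of pyroot):

```python
import numpy as np

KEEP = np.array([[1, 0, 0],   # an iteration that keeps x1
                 [1, 1, 0],
                 [0, 1, 0]])
SWAP = np.array([[0, 1, 0],   # the forced update of x1
                 [0, 1, 0],
                 [1, 0, 0]])

cycle = SWAP @ KEEP @ KEEP @ KEEP
print(cycle)  # [[3 1 0], [3 1 0], [1 0 0]]

# Largest eigenvalue over the 4-iteration cycle, and the
# resulting per-iteration order of convergence:
lam = max(abs(np.linalg.eigvals(cycle)))
print(lam, lam ** (1 / 4))  # 4.0, ~1.414
```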
Dekker's method is a particularly interesting one. Rather than using only $x_{[1,2]}$, it uses both $x_{[1,2]}$ and $x_{[2,3]}$. Specifically, it uses $x_2$ together with the previous $x_2$, which may have become either $x_1$ or $x_3$. As such, each iteration really involves just $\hat{x}_\star$ and $x_2$, with the matrix

```
[[1, 1],
 [1, 0]]
```

which has largest eigenvalue $(1 + \sqrt{5})/2 \approx 1.618$, the order of this method.
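The same quick check applies to Dekker's matrix (again illustrative, not pyroot code):

```python
import numpy as np

# Digit-growth matrix for Dekker's method.
M = np.array([[1, 1],
              [1, 0]])
print(max(np.linalg.eigvals(M)))  # (1 + sqrt(5))/2 ~ 1.618
```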
The pyroot implementation is the same as the original, except it forces sign changes if they do not occur.