-
Notifications
You must be signed in to change notification settings - Fork 487
Description
The widely-used DBDSQR routine has a bug that results in failed convergence without any work done.
The MAXIT variable ("the maximum number of passes through the inner loop permitted before nonconvergence signalled") is of the INTEGER type (32bit range: -2147483648 to 2147483647).
The bug is hidden in the following assignment:
MAXIT = MAXITR*N*N,
where MAXITR=6.
Now it is clear that for N >= CEIL( SQRT(2147483647 / 6) ) = 18919 the MAXIT < 0.
Such N is not unusually large since the matrix is bidiagonal.
Then, at the beginning of the main iteration loop (60 CONTINUE), the iteration counter ITER is tested against MAXIT in block "Check for convergence or exceeding iteration count":
IF( ITER.GT.MAXIT )
$ GO TO 200
leading to immediate termination for ITER=0.
Proposed solution
The possible fix seems to be straightforward: avoid direct computation of MAXITR*N*N value, using the factorisation (MAXITR*N)*(N) instead.
Luckily, both MAXIT and ITER are used only in 3 places:
- initial value assignments,
- the comparison
IF( ITER.GT.MAXIT ), - ITER update:
ITER = ITER + M - LL, where0 <= LL <= M <= N,
suggesting the following solution:
- variables and their initial values:
INTEGER MAXITDIVN, ITERDIVN, ITER
MAXITDIVN = MAXITR*N
ITERDIVN = 0
ITER = -1
- the test:
IF( ITER.GE.N ) THEN
ITER = ITER - N
ITERDIVN = ITERDIVN + 1
IF( ITERDIVN.GE.MAXITDIVN )
$ GO TO 200
END IF
- ITER update (
ITER = ITER + M - LL) remains unchanged
The change of initial assignment ITER = -1 is required to obtain the same number of iterations, since the original test uses ITER.GT.MAXIT, and the proposed change uses GE.
I’d really like to prepare a merge request, but I do lack any Fortran programming experience, not to mention the Fortran proficiency.
E.g. the proposed solution very slightly increases the computational cost of the routine execution by 6*N*(1 comparison + 2*additions), but with reordering the factorisation (initial test condition becomes IF( ITER.GE."6*N" ) THEN … ) it can be reduced to N additional comparisons and 2N additions, but I don’t know how to Fortran it - please help…
This bug reappears from time to time (e.g. http://icl.cs.utk.edu/lapack-forum/archives/lapack/msg01294.html , numpy/numpy#1588 ), and e.g. currently OpenBLAS uses "-fdefault-integer-8" only when BINARY64 & INTERFACE64 ( https://github.com/xianyi/OpenBLAS/blob/develop/cmake/fc.cmake ), so apparently "-fdefault-integer-8" is too easy to forget to be a solution.
Best regards,
Cezary Dendek