bug in dbdsqr.f: integer overflow with 32bit integers results in failed convergence

The widely-used DBDSQR routine has a bug that results in failed convergence without any work done.

The MAXIT variable ("the maximum number of passes through the inner loop permitted before nonconvergence signalled") is of the INTEGER type (32bit range: -2147483648 to 2147483647). 
The bug is hidden in the following assignment:

`MAXIT = MAXITR*N*N, 
`
where `MAXITR=6`.

Now it is clear that for `N >=  CEIL( SQRT(2147483647 / 6) ) = 18919` the `MAXIT < 0`.
Such N is not unusually large since the matrix is bidiagonal.

Then, at the beginning of the main iteration loop (`60 CONTINUE`), the iteration counter ITER is tested against MAXIT in block "Check for convergence or exceeding iteration count":

```
IF( ITER.GT.MAXIT )
     $   GO TO 200
```

leading to immediate termination for ITER=0.


## Proposed solution

The possible fix seems to be straightforward: avoid direct computation of `MAXITR*N*N` value, using the factorisation `(MAXITR*N)*(N)` instead.

Luckily, both MAXIT and ITER are used only in 3 places:
1. initial value assignments,
2. the comparison `IF( ITER.GT.MAXIT )`,
3. ITER update: `ITER = ITER + M - LL`, where `0 <= LL <= M <= N`,

suggesting the following solution:

1. variables and their initial values:
```
INTEGER   MAXITDIVN, ITERDIVN, ITER

MAXITDIVN = MAXITR*N
ITERDIVN = 0
ITER = -1
```
2. the test:
```
      IF( ITER.GE.N ) THEN
          ITER = ITER - N
          ITERDIVN = ITERDIVN + 1
          IF( ITERDIVN.GE.MAXITDIVN )
          $   GO TO 200
      END IF
```
3. ITER update ( `ITER = ITER + M - LL` ) remains unchanged

The change of initial assignment `ITER = -1` is required to obtain the same number of iterations, since the original test uses ITER.GT.MAXIT, and the proposed change uses GE.

I’d really like to prepare a merge request, but I do lack any Fortran programming experience, not to mention the Fortran proficiency.
E.g. the proposed solution very slightly increases the computational cost of the routine execution by `6*N*(1 comparison + 2*additions)`, but with reordering the factorisation (initial test condition becomes `IF( ITER.GE."6*N" ) THEN … )` it can be reduced to N additional comparisons and 2N additions, but I don’t know how to Fortran it - please help…

This bug reappears from time to time (e.g. http://icl.cs.utk.edu/lapack-forum/archives/lapack/msg01294.html , https://github.com/numpy/numpy/issues/1588 ), and e.g. currently OpenBLAS uses `"-fdefault-integer-8"` only when `BINARY64 & INTERFACE64` ( https://github.com/xianyi/OpenBLAS/blob/develop/cmake/fc.cmake ), so apparently `"-fdefault-integer-8"` is too easy to forget to be a solution.

Best regards,

Cezary Dendek


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug in dbdsqr.f: integer overflow with 32bit integers results in failed convergence #135

Proposed solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

bug in dbdsqr.f: integer overflow with 32bit integers results in failed convergence #135

Description

Proposed solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions