Skip to content

bug in dbdsqr.f: integer overflow with 32bit integers results in failed convergence #135

@cezden

Description

@cezden

The widely-used DBDSQR routine has a bug that results in failed convergence without any work done.

The MAXIT variable ("the maximum number of passes through the inner loop permitted before nonconvergence signalled") is of the INTEGER type (32bit range: -2147483648 to 2147483647).
The bug is hidden in the following assignment:

MAXIT = MAXITR*N*N,
where MAXITR=6.

Now it is clear that for N >= CEIL( SQRT(2147483647 / 6) ) = 18919 the MAXIT < 0.
Such N is not unusually large since the matrix is bidiagonal.

Then, at the beginning of the main iteration loop (60 CONTINUE), the iteration counter ITER is tested against MAXIT in block "Check for convergence or exceeding iteration count":

IF( ITER.GT.MAXIT )
     $   GO TO 200

leading to immediate termination for ITER=0.

Proposed solution

The possible fix seems to be straightforward: avoid direct computation of MAXITR*N*N value, using the factorisation (MAXITR*N)*(N) instead.

Luckily, both MAXIT and ITER are used only in 3 places:

  1. initial value assignments,
  2. the comparison IF( ITER.GT.MAXIT ),
  3. ITER update: ITER = ITER + M - LL, where 0 <= LL <= M <= N,

suggesting the following solution:

  1. variables and their initial values:
INTEGER   MAXITDIVN, ITERDIVN, ITER

MAXITDIVN = MAXITR*N
ITERDIVN = 0
ITER = -1
  1. the test:
      IF( ITER.GE.N ) THEN
          ITER = ITER - N
          ITERDIVN = ITERDIVN + 1
          IF( ITERDIVN.GE.MAXITDIVN )
          $   GO TO 200
      END IF
  1. ITER update ( ITER = ITER + M - LL ) remains unchanged

The change of initial assignment ITER = -1 is required to obtain the same number of iterations, since the original test uses ITER.GT.MAXIT, and the proposed change uses GE.

I’d really like to prepare a merge request, but I do lack any Fortran programming experience, not to mention the Fortran proficiency.
E.g. the proposed solution very slightly increases the computational cost of the routine execution by 6*N*(1 comparison + 2*additions), but with reordering the factorisation (initial test condition becomes IF( ITER.GE."6*N" ) THEN … ) it can be reduced to N additional comparisons and 2N additions, but I don’t know how to Fortran it - please help…

This bug reappears from time to time (e.g. http://icl.cs.utk.edu/lapack-forum/archives/lapack/msg01294.html , numpy/numpy#1588 ), and e.g. currently OpenBLAS uses "-fdefault-integer-8" only when BINARY64 & INTERFACE64 ( https://github.com/xianyi/OpenBLAS/blob/develop/cmake/fc.cmake ), so apparently "-fdefault-integer-8" is too easy to forget to be a solution.

Best regards,

Cezary Dendek

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions