Fix WORK strides in P?LARZ ?AXPY calls #161

Merged: langou merged 1 commit into Reference-ScaLAPACK:master from kyungminlee:fix-larz-daxpy-stride on May 5, 2026

@kyungminlee
Contributor

Fix incorrect BLAS vector increments in `P?LARZ` left-side `?AXPY` calls.

`WORK(IPW)` stores a contiguous local work vector, so the `?AXPY` increment must be `1`. The previous calls used `MAX(1,NQC2)`, which is appropriate as a leading dimension for matrix-style BLACS/LAPACK calls but incorrect as a BLAS vector stride.
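
To make the increment semantics concrete, here is a minimal pure-Python model of the reference BLAS `?AXPY` update for positive increments (`y(1+i*incy) += alpha * x(1+i*incx)`). The helper name `axpy`, the vector length, and the sample values are illustrative assumptions, not ScaLAPACK code.

```python
def axpy(n, alpha, x, incx, y, incy):
    """Model of BLAS ?AXPY for positive increments on 0-based lists.

    Performs y[i*incy] += alpha * x[i*incx] for i = 0..n-1, in place.
    """
    for i in range(n):
        y[i * incy] += alpha * x[i * incx]
    return y


# A contiguous work vector of length 3, padded with sentinel values (99.0)
# so that the wrong stride stays inside the Python list for this demo.
work = [1.0, 2.0, 3.0, 99.0, 99.0]
alpha = -1.0

# Increment 1 reads work[0], work[1], work[2]: the intended vector.
correct = axpy(3, alpha, work, 1, [0.0, 0.0, 0.0], 1)

# Increment 2 (standing in for a hypothetical MAX(1,NQC2) = 2) reads
# work[0], work[2], work[4]: it skips an element and picks up padding.
wrong = axpy(3, alpha, work, 2, [0.0, 0.0, 0.0], 1)

print(correct)
print(wrong)
```

In real Fortran code there is no padding to absorb the extra reads, so the same mistake can walk past the end of `WORK`.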

- Updated left-side `?AXPY` calls that read/write `WORK` or `WORK(IPW)` to use increment `1`.
- Kept `C` increments unchanged:
  - `LDC` when traversing a row of local `C`
  - `1` when traversing a column of local `C`
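
The two `C` increments follow from column-major storage, where element `(i, j)` of a local array with leading dimension `LDC` sits at offset `i + j*LDC` (0-based). The sketch below only computes offsets; the names `offset` and `strided_offsets` and the sizes are hypothetical.

```python
LDC = 4    # hypothetical leading dimension of the local C array
NCOLS = 3  # hypothetical number of local columns


def offset(i, j, ldc=LDC):
    """0-based linear offset of element (i, j) in column-major storage."""
    return i + j * ldc


def strided_offsets(start, n, inc):
    """Offsets visited by a BLAS-style vector of n elements with increment inc."""
    return [start + k * inc for k in range(n)]


# Walking along row 1 of C: consecutive elements are LDC apart.
row1 = strided_offsets(offset(1, 0), NCOLS, LDC)

# Walking down column 2 of C: consecutive elements are adjacent.
col2 = strided_offsets(offset(0, 2), LDC, 1)

print(row1)
print(col2)
```

This is why `LDC` is a valid vector increment for `C` while a leading dimension is not a valid increment for the contiguous `WORK` vector.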

For comparison, LAPACK's `DLARZ` uses `INCX=1` in the analogous `DAXPY`:

    DAXPY( N, -TAU, WORK, 1, C, LDC )

The value `MAX(1, NQC2)` does belong nearby: it is the legitimate LDA of the surrounding `DLASET` / `DGSUM2D` matrix-shaped calls. The bug looks like a copy-paste of that LDA into the `DAXPY` stride argument.

The previous stride could skip elements in `WORK` and potentially access out-of-bounds memory when `NQC2 > 1`. This fixes the local Householder update path without changing the algorithm.
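
The out-of-bounds risk can be checked with simple index arithmetic: a strided access of `n` elements with increment `inc` touches offsets `0, inc, ..., (n-1)*inc`. The following sketch assumes, for illustration only, that the buffer holds exactly `n` elements; the function names and the values of `n` and the stride are hypothetical.

```python
def touched_indices(n, inc):
    """0-based offsets read by a BLAS-style strided access of n elements."""
    return [k * inc for k in range(n)]


def overruns(n, inc, length):
    """True if the last strided access falls outside a buffer of `length` elements."""
    return n > 0 and (n - 1) * inc >= length


n = 4  # hypothetical vector length
for stride in (1, 3):  # 3 stands in for a hypothetical MAX(1, NQC2)
    idx = touched_indices(n, stride)
    status = "overrun" if overruns(n, stride, n) else "in bounds"
    print(stride, idx, status)
```

With increment `1` the access covers exactly the `n` stored values; with any larger increment and `n > 1`, the last offset lands beyond them.
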
@langou langou merged commit f7edd05 into Reference-ScaLAPACK:master May 5, 2026
12 checks passed
@kyungminlee kyungminlee deleted the fix-larz-daxpy-stride branch May 5, 2026 08:18