Description
Preamble
The descriptions for ?GEMM start as:
DGEMM performs one of the matrix-matrix operations
C := alpha*op( A ) * op( B ) + beta*C
Assuming this is computed using IEEE rules, if alpha
is zero and A
or B
contain a NaN
or +/-Inf
, then the result should contain NaN
s, as 0.0 * Inf == NaN
and 0.0 * NaN== NaN
. The same goes if beta
is zero and C
contains a NaN
or +/-Inf
.
Issue 1
As far as I can tell from a cursory look, this is not how ?GEMM actually behaves in the Netlib implementation.
- Due to the early return at line 271 of DGEMM, if
alpha
is zero andbeta
is one,NaN
s orInf
s inA
orB
do not affect the result. - If both
alpha
andbeta
are zero, then a zero matrix is returned (line 279), irrespective of anyNaN
s orInf
s inA
,B
orC
. - If
alpha
is zero andbeta
is neither zero or one, thenbeta*C
is returned, irrespective of anyNaN
s orInf
s inA
orB
. - If only
beta
is zero,alpha*op( A )*op( B )
is returned, irrespective of anyNaN
s orInf
s inC
.
Therefore there are inconsistencies between the behavior expected from reading the docs and actual program behavior. These are arguably corner cases, but NaN
propagation is an important mechanism, and callers should be made aware if a given subroutine ignores NaN
inputs under certain circumstances.
One example of this mattering, is when one of the matrices may contain uninitialized data, leading to Issue#2
Issue 2
The documentation does not make it clear enough when it is OK to call ?GEMM with matrices containing uninitialized data.
For example, the simplest use case for ?GEMM is computing C = A*B
. The current documentation states that:
When BETA is supplied as zero then C need not be set on input.
and
Before entry, the leading m by n part of the array C must contain the matrix C, except when beta is zero, in which case C need not be set on entry.
However depending on a person's interpretation of what it means for C
to be "set", this may fail to convey that if beta
is zero, then C
is permitted to be contain arbitrary bytes (including ones that resolve to NaN
s or Inf
s) before entry. As a result, callers may unnecessarily zero-fill C
before calling ?GEMM.
The same is also true for the other corner cases where alpha
is zero, where both A
and B
are allowed to be filled with arbitrary bytes before entry without any effect on the output. Since the current docs do not mention any special behavior for alpha == 0.0
, one would incorrectly assume that A
and B
have to be initialized to guarantee correct results, even if alpha
is zero.
Suggested fixes
- Change the docs to point out that ?GEMM does not compute
C := alpha*op( A )*op( B ) + beta*C
exactly as written, and as a consequence does not adhere to all of theNaN
andInf
propagation rules that the formula implies. - Document the corner cases.
- Add notes about when the matrices are permitted to contain arbitrary bytes before entry.
Checklist
- I've included a minimal example to reproduce the issue
- I'd be willing to make a PR to solve this issue