Description
Every GEMM correctness test I've been able to run produces bad results immediately, starting at index (0, 0). The error is very large, so it is likely the result of triggering some kind of undefined behavior. SYMM, SYRK, TRMM, and TRSM also seem to fail (although not always).
Has anyone else tried running the tests on the Intel runtime?
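
For anyone trying to reproduce this, here is a minimal sketch of the kind of check I mean: a naive reference GEMM plus a helper that reports the first index where a result differs. The names `reference_gemm` and `first_mismatch` and the tolerances are my own placeholders, not part of the test suite, and the matrices are plain row-major lists of lists.

```python
import math


def reference_gemm(alpha, a, b, beta, c):
    """Naive reference: returns alpha * A @ B + beta * C (hypothetical helper)."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


def first_mismatch(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """Return (row, col, expected, actual) for the first mismatch, else None.

    The tolerances are assumed values; NaN never compares close, so it is
    reported as a mismatch.
    """
    for i, (erow, arow) in enumerate(zip(expected, actual)):
        for j, (e, x) in enumerate(zip(erow, arow)):
            if not math.isclose(e, x, rel_tol=rel_tol, abs_tol=abs_tol):
                return (i, j, e, x)
    return None
```

Passing the output of the library under test as `actual` shows whether the failure really starts at (0, 0) and how large the difference is.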