Description
I wanted to use the integer gemm code from 0430cf0 to compute A^t A, and realized that there is currently no way to perform an operation on transposed matrices. In the BLAS context, the transpose or conjugate transpose of a matrix is usually expressed as Op(A), where Op is selected by the parameter TRANSA, given as the character 'N', 'T', or 'C'.
I realize that since we actually have dimensions and strides as part of our matrix ArrayBase structures, we can circumvent the issue by transposing the matrix view via `fn t(mut self)`. The questions are:
- Performance: does code specific to a transposed matrix with unchanged memory layout have the same performance as generic code given different stride information?
- To what extent does the gemm kernel for transposed matrices differ from that for non-transposed matrices?
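To make the stride-based workaround concrete, here is a minimal sketch of a naive strided gemm in plain Rust. It is not the kernel from this crate or from ndarray; `gemm_strided` and `ata` are hypothetical names, and the (row stride, column stride) argument layout is an assumption loosely modeled on strided gemm interfaces. It only shows that A^t can be passed as the same buffer with its two strides swapped, with no copy.

```rust
/// Naive reference GEMM over strided i32 data: C = A * B.
/// A is m x k, B is k x n, C is m x n. Each matrix is a slice plus a
/// (row stride, column stride) pair, as in a strided matrix view.
/// Hypothetical helper for illustration, not an existing API.
fn gemm_strided(
    m: usize, k: usize, n: usize,
    a: &[i32], rsa: usize, csa: usize,
    b: &[i32], rsb: usize, csb: usize,
    c: &mut [i32], rsc: usize, csc: usize,
) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0i32;
            for p in 0..k {
                acc += a[i * rsa + p * csa] * b[p * rsb + j * csb];
            }
            c[i * rsc + j * csc] = acc;
        }
    }
}

/// Compute A^t A for a row-major `rows x cols` matrix without copying A.
/// The transposed operand is the same buffer with its strides swapped.
fn ata(a: &[i32], rows: usize, cols: usize) -> Vec<i32> {
    let mut c = vec![0i32; cols * cols];
    gemm_strided(
        cols, rows, cols,
        a, 1, cols,      // A^t: row stride 1, column stride `cols`
        a, cols, 1,      // A: ordinary row-major strides
        &mut c, cols, 1, // C: row-major cols x cols result
    );
    c
}
```

For the 3 x 2 matrix with rows (1, 2), (3, 4), (5, 6), `ata(&[1, 2, 3, 4, 5, 6], 3, 2)` returns `[35, 44, 44, 56]`. Whether an optimized kernel handles the swapped strides as efficiently as a dedicated transposed code path is exactly the open performance question.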
While addressing this issue, it's probably also worth investigating how DSYRK, for the specific case of A^t A, is implemented differently in the BLIS library.
I have a hard time understanding how BLIS defines its kernels, specifically how the different cases of Op(A) Op(B) are implemented. I am happy to dig in and write a benchmark comparing the current ndarray approach to writing a specific kernel. Can you point me to the right spot to look at?
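For reference, the saving a SYRK-style routine can exploit is that A^t A is symmetric, so only one triangle needs to be computed. The following is a naive sketch of that idea, not how BLIS implements it; `ata_symmetric` is a hypothetical name, and mirroring the result into the lower triangle is my addition (BLAS DSYRK itself only updates the triangle selected by UPLO).

```rust
/// C = A^t A for a row-major `rows x cols` matrix A, computing only the
/// upper triangle and mirroring it. Hypothetical helper for illustration.
fn ata_symmetric(a: &[i32], rows: usize, cols: usize) -> Vec<i32> {
    let mut c = vec![0i32; cols * cols];
    for i in 0..cols {
        // Start at j = i: entries below the diagonal follow by symmetry.
        for j in i..cols {
            let mut acc = 0i32;
            for p in 0..rows {
                // (A^t A)[i][j] is the dot product of columns i and j of A.
                acc += a[p * cols + i] * a[p * cols + j];
            }
            c[i * cols + j] = acc;
            c[j * cols + i] = acc;
        }
    }
    c
}
```

This does roughly half the multiply-adds of a general gemm on the same operands, which is the baseline a benchmark against the stride-swapped gemm approach would need to account for.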