Fix using BLAS for all compatible cases of memory layout#1419
Merged
Conversation
4e25c2c to
248109d
Compare
be22336 to
5c8b9de
Compare
Lost in the recent workspace refactor.
We compute A B -> C with matrices A, B, C With the blas (cblas) interface it supports matrices that adhere to certain criteria. They should be contiguous on one dimension (stride=1). We glance a little at how numpy does this to try to catch all cases. In short, we accept A, B contiguous on either axis (row or column major). We use the case where C is (weakly) row major, but if it is column major we transpose A, B, C => A^t, B^t, C^t so that we are back to the C row major case. (Weakly = contiguous with stride=1 on that inner dimension, but stride for the other dimension can be larger; to differentiate from strictly whole array contiguous.) Minor change to the gemv function, no functional change, only updating due to the refactoring of blas layout functions. Fixes #1278
If we have a matrix of dimension say 5 x 5, BLAS requires the leading stride to be >= 5. Smaller cases are possible for read-only array views in ndarray(broadcasting and custom strides). In this case we mark the array as not BLAS compatible
Using cblas we can simplify this further to a more satisfying translation (from ndarray to BLAS), much simpler logic. Avoids creating and handling an extra layer of array views.
Add a crate with a mock blas implementation, so that we can assert that cblas_sgemm etc are called (depending on memory layout).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
With the blas (cblas) interface it supports matrices that adhere to certain
criteria. They should be contiguous on one dimension (stride=1).
We glance a little at how numpy does this to try to catch all cases.
Compute A B -> C:
We require for BLAS compatibility that: A, B, C are
"weakly" contiguous (stride=1) in their fastest dimension, but it can be
either first or second axis (either rowmajor/"c" or colmajor/"f").
The "normal case" is CblasRowMajor for cblas. Select CblasRowMajor /
CblasColMajor to fit C's memory order.
Apply transpose to A, B as needed if they differ from row major. If C
is CblasColMajor then transpose both A, B (again!)
(Weakly = contiguous with stride=1 on that fastest axis, but stride for the
other axis can be arbitrary large; to differentiate from strictly whole
array contiguous.)
A first commit simplified and corrected the logic, while still using
ndarray's reversed axes. But a further commit simplified it even further, to
a satisfying little function in
mat_mul_implas the final result.I have kept both states (both commits) because I think the first version is
a useful guide if we would ever go to use plain BLAS instead of CBLAS(?).
Fixes #1278