Hadamard product? #1083

Elementwise vector product (like dot, but without the reduction to one number)
Does it exist?
In MKL it's vdMul, in cuBLAS it's DHAD (so, not part of the standard BLAS function list)...
The Hadamard product is completely central in machine learning, and is the only reason preventing me from using OpenBLAS...

Comments
Does not exist currently, but sounds interesting. Would you happen to have a rough number for the speedup from using MKL compared to just letting a modern compiler optimize a naive implementation?
It used to be in BLAS... Still in LAPACK 3.1.1.
@brada4 the only trace of this function appears to be a spurious entry in the intro_blas1 manpage distributed with LAPACK 3.1.1.
Just learned there is a workaround using sbmv. An elementwise matrix*matrix product would still be nice...
Not sure abusing sbmv for this would be more efficient than a simple loop over all elements? BTW the only references for *HAD functions in cuBLAS appear to be people asking if there are any...
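For reference, the "simple loop" under discussion is just a single pass like the sketch below (`hadamard_naive` is a name made up here, not anything in OpenBLAS); this is exactly the kind of loop a modern compiler will typically auto-vectorize at -O3:

```c
#include <stddef.h>

/* Hypothetical naive Hadamard product: z[i] = x[i] * y[i].
 * One load per input, one multiply, one store per element;
 * easy for the compiler to vectorize on its own. */
void hadamard_naive(size_t n, const double *x, const double *y, double *z)
{
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}
```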
Now imagine your sbmv multiplying 2 million-element vectors ...
Please explain... Do you extract the diagonal from the resulting matrix somehow? How is it less wasteful?
Diagonal? It is not square.
Not sure I get the gemm/gemv idea either, but it could be instructive to see a quick benchmark comparison of that sbmv suggestion from stackoverflow against a simple loop that the compiler can unroll as needed.
The only way I see with gemv/gemm is to multiply two vectors v1^T * v2 (of the same size), which returns a square matrix, and the diagonal is actually the Hadamard product. sbmv allows the input matrix to be stored in banded matrix form. With the number of super-diagonals = 0 you only need to supply the diagonal, which in our case is one of the vectors, so it's not so bad... I should explain the context of my problem:
swap dimensions of v2 and get HAD
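If I read the sbmv suggestion correctly, it amounts to the sketch below: with zero super-diagonals, the band storage of a diagonal matrix is just the vector itself, so v1 can be passed directly as the matrix argument (`hadamard_sbmv` is a made-up wrapper name):

```c
#include <cblas.h>

/* Hadamard product via dsbmv: with k = 0 super-diagonals the banded
 * matrix A is just its diagonal, so passing v1 as A computes
 * res = 1.0 * diag(v1) * v2 + 0.0 * res, i.e. res[i] = v1[i] * v2[i]. */
void hadamard_sbmv(int n, const double *v1, const double *v2, double *res)
{
    cblas_dsbmv(CblasColMajor, CblasUpper, n, /* k = */ 0,
                1.0, v1, /* lda = */ 1, v2, 1,
                0.0, res, 1);
}
```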
The Hadamard product is obviously a memory-bound operation, so the only thing one has to take care of is that the matrices are not traversed orthogonally to their storage scheme. If the matrices are accessed in the right way one has 3 loads, 1 multiplication and 1 store. The only point I see where one can optimize something is if we assume $C := \alpha A \circ B + \beta C$ with ...
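Spelled out, that general form is a single pass over memory in storage order; a minimal reference sketch (`dhad_update` is a hypothetical name, not an existing routine):

```c
#include <stddef.h>

/* Sketch of C := alpha * (A o B) + beta * C for column-major M x N
 * matrices traversed in storage order: per element, 3 loads and 1 store
 * dominate, so the operation stays memory-bound. */
void dhad_update(size_t m, size_t n, double alpha,
                 const double *a, const double *b,
                 double beta, double *c)
{
    for (size_t i = 0; i < m * n; i++)
        c[i] = alpha * a[i] * b[i] + beta * c[i];
}
```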
In the common case one can treat the matrices as vectors of length M*N and apply a marginal case of gemm/gemv.
I still don't see it. Please give us a concrete example.
FUNCTION DHAD2(N,A,B,C)
dgemv "N", n, 1, 1.0, v1, n, v2, 1, 1.0, res, 1 |
I think we can do Hadamard products using cblas_dtbmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit, n, 0, p, 1, q, 1); This code does the Hadamard product of the vectors p and q.
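A self-contained version of that call, for anyone who wants to try it; note that dtbmv works in place, so q is overwritten with the elementwise product:

```c
#include <cblas.h>
#include <stdio.h>

/* dtbmv computes x := A * x for a triangular band matrix A; with
 * k = 0 bands, A is the diagonal matrix diag(p), so q becomes p o q. */
int main(void)
{
    double p[] = {1.0, 2.0, 3.0, 4.0};
    double q[] = {5.0, 6.0, 7.0, 8.0};
    int n = 4;

    cblas_dtbmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, 0, p, 1, q, 1);

    for (int i = 0; i < n; i++)
        printf("%g ", q[i]);   /* expected: 5 12 21 32 */
    printf("\n");
    return 0;
}
```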
Sorry, I recommended to use ...
@personalityson certainly an unexpected but interesting use case.
Btw, the stdcall requirement normally needed for DLLs to work in VBA only applies to 32-bit DLLs.
Looking at the declaration of xHAD, which has alpha and beta multipliers, incx, incy and even incz for a sparse result vector, sbmv/tbmv actually could be a good choice. The MKL-style vxMul, on the other hand, appears to support only the simple multiplication of N elements of vectors x and y to yield vector z.
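To make that comparison concrete, here is a sketch of the richer xHAD-style interface; the declaration is reconstructed from the description above and is NOT an existing BLAS API:

```c
/* Hypothetical xHAD-style routine with alpha/beta multipliers and
 * incx/incy/incz strides, as discussed above -- an assumption, not a
 * real API. MKL's vdMul, by contrast, takes only n and three arrays. */
void dhad(int n, double alpha, const double *x, int incx,
          const double *y, int incy, double beta, double *z, int incz)
{
    for (int i = 0; i < n; i++)
        z[i * incz] = alpha * x[i * incx] * y[i * incy]
                    + beta * z[i * incz];
}
```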
Hello everybody, I am looking forward to this function too! My program does a video tracking task like this OpenCV tracking algorithm. The algorithm does a lot of pointwise complex products in the Fourier domain. The implementation in OpenCV is cv::mulSpectrums(), and it is the bottleneck. My program will run on multiple platforms such as x86 and ARM. OpenBLAS is my first choice for cross-platform support and high performance. However, it is a pity that it lacks a pointwise vector product. Eigen and MKL do have this function.
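Since tbmv has complex variants, the same zero-band trick sketched earlier covers the pointwise complex product in that workload; a minimal sketch (`complex_hadamard` is a made-up name), again in place, overwriting q:

```c
#include <cblas.h>
#include <complex.h>

/* Pointwise complex product q := p o q (the per-element operation
 * behind cv::mulSpectrums), using ztbmv with 0 bands so A = diag(p). */
void complex_hadamard(int n, const double complex *p, double complex *q)
{
    cblas_ztbmv(CblasColMajor, CblasUpper, CblasNoTrans, CblasNonUnit,
                n, 0, p, 1, q, 1);
}
```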
You can accelerate parts of Eigen using BLAS (OpenBLAS, MKL, Accelerate Framework).
Hi folks, I don't have anything to add other than to second @loadwiki, this would be a useful addition.
It is in all viable macro libraries; it is memory-bound, so there won't be any improvement in speed from adding it here over generic BLAS. I'd bet on modern compilers actually getting a totally naive loop reasonably vectorised.
err, not sure I'd want to underwrite that...
Non-destructive AXPY, that is, 3 memory cells per MUL in place of the original 2...