You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I run ClBlast Hgemm and Hgemv on qualcomm adreno 730 GPU, the matrix shape is M = 8, N = 3200, K = 3200.
Hgemm took 8.05401 ms(average of 1000 times run) to finish, which is much slower than running 8 times Hgemv in a for loops(0.500931 ms per loop, total about 4 ms).
Here is code calling gemm and gemv: CLBlastHgemm(CLBlastLayout::CLBlastLayoutRowMajor, CLBlastTranspose::CLBlastTransposeNo, CLBlastTranspose::CLBlastTransposeNo, M, N, K, alpha, A_mat(), 0, K, B_mat(), 0, N, beta, C_mat(), 0, N, &command_queue(), nullptr);
CLBlastHgemv(CLBlastLayout::CLBlastLayoutRowMajor, CLBlastTranspose::CLBlastTransposeYes, K, N, alpha, B_mat(), b_offset * i, N, A_mat(), 0, 1, beta, C_mat(), 0, 1, &command_queue(), nullptr);
The text was updated successfully, but these errors were encountered:
Thank you for reporting this. Yes, it can be that for weird shapes running GEMV multiple times is faster than running GEMM once. I'm not sure there is much we can do here, since the optimization space is vast.
You should try to run the tuner specifically for these matrix sizes, but most likely it won't be faster than your current GEMV solution.
Hi, I run ClBlast Hgemm and Hgemv on qualcomm adreno 730 GPU, the matrix shape is M = 8, N = 3200, K = 3200.
Hgemm took 8.05401 ms(average of 1000 times run) to finish, which is much slower than running 8 times Hgemv in a for loops(0.500931 ms per loop, total about 4 ms).
Here is code calling gemm and gemv:
CLBlastHgemm(CLBlastLayout::CLBlastLayoutRowMajor, CLBlastTranspose::CLBlastTransposeNo, CLBlastTranspose::CLBlastTransposeNo, M, N, K, alpha, A_mat(), 0, K, B_mat(), 0, N, beta, C_mat(), 0, N, &command_queue(), nullptr);
CLBlastHgemv(CLBlastLayout::CLBlastLayoutRowMajor, CLBlastTranspose::CLBlastTransposeYes, K, N, alpha, B_mat(), b_offset * i, N, A_mat(), 0, 1, beta, C_mat(), 0, 1, &command_queue(), nullptr);
The text was updated successfully, but these errors were encountered: