Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) #538

Open
BadgerKing7 opened this issue Apr 12, 2024 · 8 comments · May be fixed by #559
Open

Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) #538

BadgerKing7 opened this issue Apr 12, 2024 · 8 comments · May be fixed by #559

Comments

@BadgerKing7
Copy link

BadgerKing7 commented Apr 12, 2024

GBMV, HBMV and SBMV use MatVec directly, passing their own buffers into the function, unlike TBMV which passes a scratch buffer for just vector X into MatVec. This means that for all 4 of these operations TestMatrixA is applied directly on the user provided buffer for the A matrix.

The required buffer size of all A matrices is calculated as (ld * (two - 1) + one + offset) * sizeof(T) which for GBMV, HBMV, SBMV and TBMV called with row major matrices expands to (a_ld * (m - 1) + kl + ku + 1 + a_offset) * sizeof(T). However, the BLAS interface defines the size of matrix A for all 4 routines as a_ld * n which means that user buffers can lead to kInsufficientMemoryA errors even though they hold enough memory according to the interface spec.

For HBMV, SBMV and TBMV this issue is somewhat hidden by passing the value of n into MatVec's m parameter so the error is only encountered when (kl + ku + 1) > a_ld.

@BadgerKing7 BadgerKing7 changed the title Banded matrices required buffer size calculated incorrectly (GBMV, HBMV & SBMV) Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) Apr 12, 2024
@CNugteren
Copy link
Owner

Thank you for reporting this issue, and apologies for the late reply. I think I understand most of what you are saying, but I'm not really sure what you mean with the BLAS interface and the interface spec. Do you refer to something inside CLBlast, or something like the Netlib BLAS specification or so (e.g. this page)?

BTW, I'm open to reviewing fixes for this issue if you have something in mind.

@BadgerKing7
Copy link
Author

BadgerKing7 commented May 16, 2024

Do you refer to something inside CLBlast, or something like the Netlib BLAS specification or so (e.g. this page)?

I apologize for the lack of clarity, I was referring to the Netlib documentation to which you linked. On that page, the relevant snippet is: image

BTW, I'm open to reviewing fixes for this issue if you have something in mind.

I will be looking for a fix and opening a pull request when I have something worth reviewing, but I cannot promise a timeframe since doing it without affecting other level 2 operations seems to be harder than appears at first look because the MatVec function is the one that needs changing. I have tried only changing input validation logic to allow for buffers of strictly a_ld * n to pass but doing this results in incorrect outputs afterwards. I will update when I have a way for you to reproduce this behavior.

@CNugteren
Copy link
Owner

seems to be harder than appears at first look because the MatVec function is the one that needs changing

What about adding an extra argument to optionally disable the matrix A check in MatVec and simply add an extra check in the relevant routines? Something along this patch:

diff --git a/src/routines/level2/xgbmv.cpp b/src/routines/level2/xgbmv.cpp
index e80b9a9..c37ba10 100644
--- a/src/routines/level2/xgbmv.cpp
+++ b/src/routines/level2/xgbmv.cpp
@@ -42,6 +42,9 @@ void Xgbmv<T>::DoGbmv(const Layout layout, const Transpose a_transpose,
   auto kl_real = (rotated) ? ku : kl;
   auto ku_real = (rotated) ? kl : ku;
 
+  // The matrix A has different constraints compared to what is normally tested in MatVec below
+  TestMatrixBanded(n, kl, ku, a_buffer, a_offset, a_ld);
+
   // Runs the generic matrix-vector multiplication, disabling the use of fast vectorized kernels.
   // The specific hermitian matrix-accesses are implemented in the kernel guarded by the
   // ROUTINE_GBMV define.
@@ -52,7 +55,8 @@ void Xgbmv<T>::DoGbmv(const Layout layout, const Transpose a_transpose,
          x_buffer, x_offset, x_inc, beta,
          y_buffer, y_offset, y_inc,
          fast_kernels, fast_kernels,
-         0, false, kl_real, ku_real);
+         0, false, kl_real, ku_real,
+         /*do_test_matrix_a=*/false);
 }
 
 // =================================================================================================
diff --git a/src/routines/level2/xgemv.cpp b/src/routines/level2/xgemv.cpp
index 63dab9f..9a5761c 100644
--- a/src/routines/level2/xgemv.cpp
+++ b/src/routines/level2/xgemv.cpp
@@ -64,7 +64,7 @@ void Xgemv<T>::MatVec(const Layout layout, const Transpose a_transpose,
                       const Buffer<T> &y_buffer, const size_t y_offset, const size_t y_inc,
                       bool fast_kernel, bool fast_kernel_rot,
                       const size_t parameter, const bool packed,
-                      const size_t kl, const size_t ku) {
+                      const size_t kl, const size_t ku, const bool do_test_matrix_a) {
 
   // Makes sure all dimensions are larger than zero
   if (m == 0 || n == 0) { throw BLASError(StatusCode::kInvalidDimension); }
@@ -92,7 +92,7 @@ void Xgemv<T>::MatVec(const Layout layout, const Transpose a_transpose,
 
   // Tests the matrix and the vectors for validity
   if (packed) { TestMatrixAP(n, a_buffer, a_offset); }
-  else { TestMatrixA(a_one, a_two, a_buffer, a_offset, a_ld); }
+  else if (do_test_matrix_a) { TestMatrixA(a_one, a_two, a_buffer, a_offset, a_ld); }
   TestVectorX(n_real, x_buffer, x_offset, x_inc);
   TestVectorY(m_real, y_buffer, y_offset, y_inc);
 
diff --git a/src/routines/level2/xgemv.hpp b/src/routines/level2/xgemv.hpp
index 1e1fa72..1ba5dde 100644
--- a/src/routines/level2/xgemv.hpp
+++ b/src/routines/level2/xgemv.hpp
@@ -46,7 +46,7 @@ class Xgemv: public Routine {
               const Buffer<T> &y_buffer, const size_t y_offset, const size_t y_inc,
               bool fast_kernel, bool fast_kernel_rot,
               const size_t parameter, const bool packed,
-              const size_t kl, const size_t ku);
+              const size_t kl, const size_t ku, const bool do_test_matrix_a = true);
 };
 
 // =================================================================================================
diff --git a/src/utilities/buffer_test.hpp b/src/utilities/buffer_test.hpp
index 4a2a2c9..9097a21 100644
--- a/src/utilities/buffer_test.hpp
+++ b/src/utilities/buffer_test.hpp
@@ -65,6 +65,17 @@ void TestMatrixAP(const size_t n, const Buffer<T> &buffer, const size_t offset)
   } catch (const Error<std::runtime_error> &e) { throw BLASError(StatusCode::kInvalidMatrixA, e.what()); }
 }
 
+// Tests matrix 'A' (for banded matrix-vector computations) for validity
+template <typename T>
+void TestMatrixBanded(const size_t n, const size_t kl, const size_t ku, const Buffer<T> &buffer,
+                      const size_t offset, const size_t ld, const bool test_lead_dim = true) {
+  if (test_lead_dim && ld < kl + ku) { throw BLASError(StatusCode::kInvalidLeadDimA); }
+  try {
+    const auto required_size = (ld * n + offset) * sizeof(T);
+    if (buffer.GetSize() < required_size) { throw BLASError(StatusCode::kInsufficientMemoryA); }
+  } catch (const Error<std::runtime_error> &e) { throw BLASError(StatusCode::kInvalidMatrixA, e.what()); }
+}
+
 // =================================================================================================
 
 // Tests vector 'X' for validity

@andreimatraguna
Copy link

Hi CNugteren,
this is a good idea, I implemented what you proposed, can you check if its all good? commit SHA 864cb77

@CNugteren
Copy link
Owner

Can you make a PR to this repository? And then also ask @BadgerKing7 for a review.

@andreimatraguna
Copy link

CNugteren, can u give me access to publish the branch? (I asked @BadgerKing7, his waiting to review my changes after I create the PR)

@CNugteren
Copy link
Owner

The normal flow to make a PR to a repository you don't have write permissions to (such as CLBlast) is to fork the repository, make your changes (possibly on a branch), push your changes, and then make a PR to this repository from your fork. If you need help, I'm sure there's a guide about this somewhere on GitHub.

@andreimatraguna
Copy link

@BadgerKing7 can u look at this PR #559

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants