Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) #538

BadgerKing7 · 2024-04-12T16:56:03Z

GBMV, HBMV and SBMV use MatVec directly, passing their own buffers into the function, unlike TBMV which passes a scratch buffer for just vector X into MatVec. This means that for all 4 of these operations TestMatrixA is applied directly on the user provided buffer for the A matrix.

The required buffer size of all A matrices is calculated as (ld * (two - 1) + one + offset) * sizeof(T) which for GBMV, HBMV, SBMV and TBMV called with row major matrices expands to (a_ld * (m - 1) + kl + ku + 1 + a_offset) * sizeof(T). However, the BLAS interface defines the size of matrix A for all 4 routines as a_ld * n which means that user buffers can lead to kInsufficientMemoryA errors even though they hold enough memory according to the interface spec.

For HBMV, SBMV and TBMV this issue is somewhat hidden by passing the value of n into MatVec's m parameter so the error is only encountered when (kl + ku + 1) > a_ld.

The text was updated successfully, but these errors were encountered:

CNugteren · 2024-05-15T17:48:10Z

Thank you for reporting this issue, and apologies for the late reply. I think I understand most of what you are saying, but I'm not really sure what you mean with the BLAS interface and the interface spec. Do you refer to something inside CLBlast, or something like the Netlib BLAS specification or so (e.g. this page)?

BTW, I'm open to reviewing fixes for this issue if you have something in mind.

BadgerKing7 · 2024-05-16T14:29:32Z

Do you refer to something inside CLBlast, or something like the Netlib BLAS specification or so (e.g. this page)?

I apologize for the lack of clarity, I was referring to the Netlib documentation to which you linked. On that page, the relevant snippet is:

BTW, I'm open to reviewing fixes for this issue if you have something in mind.

I will be looking for a fix and opening a pull request when I have something worth reviewing, but I cannot promise a timeframe since doing it without affecting other level 2 operations seems to be harder than appears at first look because the MatVec function is the one that needs changing. I have tried only changing input validation logic to allow for buffers of strictly a_ld * n to pass but doing this results in incorrect outputs afterwards. I will update when I have a way for you to reproduce this behavior.

CNugteren · 2024-05-28T18:31:40Z

seems to be harder than appears at first look because the MatVec function is the one that needs changing

What about adding an extra argument to optionally disable the matrix A check in MatVec and simply add an extra check in the relevant routines? Something along this patch:

diff --git a/src/routines/level2/xgbmv.cpp b/src/routines/level2/xgbmv.cpp
index e80b9a9..c37ba10 100644
--- a/src/routines/level2/xgbmv.cpp
+++ b/src/routines/level2/xgbmv.cpp
@@ -42,6 +42,9 @@ void Xgbmv<T>::DoGbmv(const Layout layout, const Transpose a_transpose,
   auto kl_real = (rotated) ? ku : kl;
   auto ku_real = (rotated) ? kl : ku;
 
+  // The matrix A has different constraints compared to what is normally tested in MatVec below
+  TestMatrixBanded(n, kl, ku, a_buffer, a_offset, a_ld);
+
   // Runs the generic matrix-vector multiplication, disabling the use of fast vectorized kernels.
   // The specific hermitian matrix-accesses are implemented in the kernel guarded by the
   // ROUTINE_GBMV define.
@@ -52,7 +55,8 @@ void Xgbmv<T>::DoGbmv(const Layout layout, const Transpose a_transpose,
          x_buffer, x_offset, x_inc, beta,
          y_buffer, y_offset, y_inc,
          fast_kernels, fast_kernels,
-         0, false, kl_real, ku_real);
+         0, false, kl_real, ku_real,
+         /*do_test_matrix_a=*/false);
 }
 
 // =================================================================================================
diff --git a/src/routines/level2/xgemv.cpp b/src/routines/level2/xgemv.cpp
index 63dab9f..9a5761c 100644
--- a/src/routines/level2/xgemv.cpp
+++ b/src/routines/level2/xgemv.cpp
@@ -64,7 +64,7 @@ void Xgemv<T>::MatVec(const Layout layout, const Transpose a_transpose,
                       const Buffer<T> &y_buffer, const size_t y_offset, const size_t y_inc,
                       bool fast_kernel, bool fast_kernel_rot,
                       const size_t parameter, const bool packed,
-                      const size_t kl, const size_t ku) {
+                      const size_t kl, const size_t ku, const bool do_test_matrix_a) {
 
   // Makes sure all dimensions are larger than zero
   if (m == 0 || n == 0) { throw BLASError(StatusCode::kInvalidDimension); }
@@ -92,7 +92,7 @@ void Xgemv<T>::MatVec(const Layout layout, const Transpose a_transpose,
 
   // Tests the matrix and the vectors for validity
   if (packed) { TestMatrixAP(n, a_buffer, a_offset); }
-  else { TestMatrixA(a_one, a_two, a_buffer, a_offset, a_ld); }
+  else if (do_test_matrix_a) { TestMatrixA(a_one, a_two, a_buffer, a_offset, a_ld); }
   TestVectorX(n_real, x_buffer, x_offset, x_inc);
   TestVectorY(m_real, y_buffer, y_offset, y_inc);
 
diff --git a/src/routines/level2/xgemv.hpp b/src/routines/level2/xgemv.hpp
index 1e1fa72..1ba5dde 100644
--- a/src/routines/level2/xgemv.hpp
+++ b/src/routines/level2/xgemv.hpp
@@ -46,7 +46,7 @@ class Xgemv: public Routine {
               const Buffer<T> &y_buffer, const size_t y_offset, const size_t y_inc,
               bool fast_kernel, bool fast_kernel_rot,
               const size_t parameter, const bool packed,
-              const size_t kl, const size_t ku);
+              const size_t kl, const size_t ku, const bool do_test_matrix_a = true);
 };
 
 // =================================================================================================
diff --git a/src/utilities/buffer_test.hpp b/src/utilities/buffer_test.hpp
index 4a2a2c9..9097a21 100644
--- a/src/utilities/buffer_test.hpp
+++ b/src/utilities/buffer_test.hpp
@@ -65,6 +65,17 @@ void TestMatrixAP(const size_t n, const Buffer<T> &buffer, const size_t offset)
   } catch (const Error<std::runtime_error> &e) { throw BLASError(StatusCode::kInvalidMatrixA, e.what()); }
 }
 
+// Tests matrix 'A' (for banded matrix-vector computations) for validity
+template <typename T>
+void TestMatrixBanded(const size_t n, const size_t kl, const size_t ku, const Buffer<T> &buffer,
+                      const size_t offset, const size_t ld, const bool test_lead_dim = true) {
+  if (test_lead_dim && ld < kl + ku) { throw BLASError(StatusCode::kInvalidLeadDimA); }
+  try {
+    const auto required_size = (ld * n + offset) * sizeof(T);
+    if (buffer.GetSize() < required_size) { throw BLASError(StatusCode::kInsufficientMemoryA); }
+  } catch (const Error<std::runtime_error> &e) { throw BLASError(StatusCode::kInvalidMatrixA, e.what()); }
+}
+
 // =================================================================================================
 
 // Tests vector 'X' for validity

andreimatraguna · 2024-08-28T12:12:53Z

Hi CNugteren,
this is a good idea, I implemented what you proposed, can you check if its all good? commit SHA 864cb77

CNugteren · 2024-08-28T12:18:52Z

Can you make a PR to this repository? And then also ask @BadgerKing7 for a review.

andreimatraguna · 2024-08-28T12:35:07Z

CNugteren, can u give me access to publish the branch? (I asked @BadgerKing7, his waiting to review my changes after I create the PR)

CNugteren · 2024-08-28T13:39:51Z

The normal flow to make a PR to a repository you don't have write permissions to (such as CLBlast) is to fork the repository, make your changes (possibly on a branch), push your changes, and then make a PR to this repository from your fork. If you need help, I'm sure there's a guide about this somewhere on GitHub.

andreimatraguna · 2024-08-28T13:51:40Z

@BadgerKing7 can u look at this PR #559

BadgerKing7 changed the title ~~Banded matrices required buffer size calculated incorrectly (GBMV, HBMV & SBMV)~~ Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) Apr 12, 2024

CNugteren added the correctness label Apr 15, 2024

CNugteren linked a pull request Aug 29, 2024 that will close this issue

add check for test_matrix_a in gbmv,tbmv,hbmv and sbmv #559

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) #538

Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) #538

BadgerKing7 commented Apr 12, 2024 •

edited

Loading

CNugteren commented May 15, 2024

BadgerKing7 commented May 16, 2024 •

edited

Loading

CNugteren commented May 28, 2024

andreimatraguna commented Aug 28, 2024

CNugteren commented Aug 28, 2024

andreimatraguna commented Aug 28, 2024

CNugteren commented Aug 28, 2024

andreimatraguna commented Aug 28, 2024

Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) #538

Banded matrices required buffer size calculated incorrectly (GBMV, HBMV, SBMV & TBMV) #538

Comments

BadgerKing7 commented Apr 12, 2024 • edited Loading

CNugteren commented May 15, 2024

BadgerKing7 commented May 16, 2024 • edited Loading

CNugteren commented May 28, 2024

andreimatraguna commented Aug 28, 2024

CNugteren commented Aug 28, 2024

andreimatraguna commented Aug 28, 2024

CNugteren commented Aug 28, 2024

andreimatraguna commented Aug 28, 2024

BadgerKing7 commented Apr 12, 2024 •

edited

Loading

BadgerKing7 commented May 16, 2024 •

edited

Loading