Hotfix : Add a note about thread oversubscription in AOCL #1472

Merged: 2 commits, Aug 29, 2024
6 changes: 6 additions & 0 deletions docs/how-to/Programmers_Guide.rst
@@ -900,6 +900,12 @@ There are three client executables that can be used with rocBLAS. They are:

These three clients can be built by following the instructions in the Building and Installing section of the User Guide. After building the rocBLAS clients, they can be found in the directory ``rocBLAS/build/release/clients/staging``.

.. note::
   The ``rocblas-bench`` and ``rocblas-test`` executables use AMD's ILP64 version of AOCL-BLAS 4.2 as the host reference BLAS to verify correctness. A known issue with AOCL-BLAS can cause these executables to hang: the library launches multiple threads to perform computations, and if the number of threads matches the total number of logical CPU cores, the resulting thread oversubscription can cause the program to hang.
   To prevent this, limit the number of threads that AOCL-BLAS uses to fewer than the number of available CPU cores by setting the ``OMP_NUM_THREADS`` environment variable.

   For example, on a server with 32 cores, you can limit the number of threads to 28 by setting ``export OMP_NUM_THREADS=28``.
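
If the core count differs between machines, the thread limit can be derived instead of hard-coded. The following is a minimal sketch, not part of rocBLAS: the ``pick_omp_threads`` helper and its default headroom of 4 (chosen only to mirror the 32 to 28 example) are assumptions for illustration, and the right headroom for a given system may differ.

```shell
# Hypothetical helper: choose a thread count below the logical core count.
# The default headroom of 4 mirrors the 32 -> 28 example; it is an
# assumption, not a tuned or officially recommended value.
pick_omp_threads() {
    cores="$1"
    headroom="${2:-4}"
    threads=$((cores - headroom))
    # Never drop below one thread on machines with few cores.
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}

# nproc (GNU coreutils) reports the logical cores available to this
# process; fall back to getconf where nproc is not installed.
cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"

OMP_NUM_THREADS="$(pick_omp_threads "$cores")"
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Run ``rocblas-bench`` or ``rocblas-test`` from the same shell afterwards so that they inherit the exported variable.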

The next three sections briefly explain each rocBLAS client and how to use it.

rocblas-bench