Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update VERSION and hip_runtime_api.h for 6.3.2 #3731

Closed
wants to merge 46 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
46 commits
Select commit Hold shift + click to select a range
77be83b
update rocm-docs-core to 1.12.0
Dec 19, 2024
7e2e9ae
Sync rocm-6.3.x into docs/6.3.1
alexxu-amd Dec 20, 2024
f1e207a
SWDEV-490062 - Update documentation
neon60 Oct 21, 2024
b42e1a2
SWDEV-497324 - Rename CLR as Compute Language Runtime
jujiang-del Nov 8, 2024
4205146
Update deprecated API list
neon60 Oct 23, 2024
c6a4b7d
Correct Porting Guide Macros
harkgill-amd Oct 9, 2024
2f9a4c5
Add what is HIP
neon60 Oct 24, 2024
6f60f76
Update FAQ
MKKnorr Oct 30, 2024
1c1160c
Docs: Fix broken refs
MKKnorr Oct 30, 2024
3196dce
Add HIP environment variables
neon60 Nov 5, 2024
f114e14
Fixing the what-is-hip links
neon60 Nov 10, 2024
c66f7ba
Fix deprecated API list
neon60 Nov 13, 2024
2d78ee1
Add ROCm 6.3 deprecation
neon60 Nov 13, 2024
dc3516b
Reorganize HIP runtime API how-to
neon60 Nov 18, 2024
c0f303d
PR feedback.
neon60 Nov 18, 2024
8ceb50c
Add initialization and error handling
neon60 Nov 18, 2024
e30dbe7
Documentation: Add hardware capabilities page
parbenc Aug 14, 2024
68c1c52
Rebase of docs/develop fix
neon60 Nov 19, 2024
3989235
Fix doc links and fix spelling
neon60 Nov 18, 2024
84b8fcc
Fix links
neon60 Nov 19, 2024
f430bc3
Add call stack management
neon60 Nov 20, 2024
26b8a96
Add external interop page
neon60 Nov 20, 2024
b34ad54
Add multi device management
neon60 Nov 22, 2024
3c6f75b
Landing page update
neon60 Nov 22, 2024
82171ef
Add NVCC inability to compile .hip files notice and workaround
Nov 20, 2024
35f567f
Move warning to "hipcc detected my platform incorrectly" section
Nov 21, 2024
f368917
Fix programming guide TOC
neon60 Nov 23, 2024
77f0301
Minor fix
neon60 Nov 24, 2024
339bb4d
Update HIP runtime API in programming guide
neon60 Nov 25, 2024
541ad35
Docs: Update unified memory documentation
MKKnorr Nov 26, 2024
535dd77
Add OpenGL interop
neon60 Nov 28, 2024
a989227
[Docs] Revert the deprecation of hipHostMalloc/hipHostFree
neon60 Dec 3, 2024
11a3b90
Fix typo in FAQ
neon60 Dec 3, 2024
53bee8f
TOC minor fix
neon60 Dec 5, 2024
950e78d
Fix links
neon60 Dec 6, 2024
ce3a76e
Add note to README to point to HIP documentation.
randyh62 Dec 12, 2024
5981c14
Minor fixes
neon60 Dec 15, 2024
383d04e
Docs: Update device memory pages
neon60 Dec 19, 2024
f45b12b
Minor fix in unified memory page
neon60 Dec 6, 2024
9f592e0
Fix default device selection
matyas-streamhpc Dec 9, 2024
8ab60da
Docs: Refactor cpp_language_extensions and cpp_language_support
MKKnorr Nov 21, 2024
e74903c
Docs: Fix device memory refs
MKKnorr Jan 8, 2025
2c240ca
Remove cpp language extensions
neon60 Jan 9, 2025
4bb7864
Docs: Update example references
MKKnorr Jan 9, 2025
eb68b2e
Add asynchronous execution documentation page
neon60 Jan 14, 2025
f00fa86
update VERSION and hip_runtime_api.h for 6.3.2
randyh62 Jan 28, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add initialization and error handling
  • Loading branch information
neon60 committed Dec 20, 2024
commit 8ceb50cab867065226df1d0dab5c681520b8d2a2
134 changes: 134 additions & 0 deletions docs/how-to/hip_runtime_api/error_handling.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
.. meta::
:description: Error Handling
:keywords: AMD, ROCm, HIP, error handling, error

********************************************************************************
Error handling
********************************************************************************

HIP provides functionality to detect, report, and manage errors that occur
during the execution of HIP runtime functions or when launching kernels. Every
HIP runtime function, apart from launching kernels, has :cpp:type:`hipError_t`
as return type. :cpp:func:`hipGetLastError()` and :cpp:func:`hipPeekAtLastError()`
can be used for catching errors from kernel launches, as kernel launches don't
return an error directly. HIP maintains an internal state, that includes the
last error code. :cpp:func:`hipGetLastError` returns and resets that error to
hipSuccess, while :cpp:func:`hipPeekAtLastError` just returns the error without
changing it. To get a human readable version of the errors,
:cpp:func:`hipGetErrorString()` and :cpp:func:`hipGetErrorName()` can be used.

.. note::

:cpp:func:`hipGetLastError` returns the returned error code of the last HIP
runtime API call even if it's hipSuccess, while ``cudaGetLastError`` returns
the error returned by any of the preceding CUDA APIs in the same host thread.
:cpp:func:`hipGetLastError` behavior will be matched with
``cudaGetLastError`` in ROCm release 7.0.

Best practices of HIP error handling:

1. Check errors after each API call - Avoid error propagation.
2. Use macros for error checking - Check :ref:`hip_check_macros`.
3. Handle errors gracefully - Free resources and provide meaningful error
messages to the user.

For more details on the error handling functions, see :ref:`error handling
functions reference page <error_handling_reference>`.

.. _hip_check_macros:

HIP check macros
================================================================================

HIP uses check macros to simplify error checking and reduce code duplication.
The ``HIP_CHECK`` macros are mainly used to detect and report errors. It can
also exit from application with ``exit(1);`` function call after the error
print. The ``HIP_CHECK`` macro example:

.. code-block:: cpp

#define HIP_CHECK(expression) \
{ \
const hipError_t status = expression; \
if(status != hipSuccess){ \
std::cerr << "HIP error " \
<< status << ": " \
<< hipGetErrorString(status) \
<< " at " << __FILE__ << ":" \
<< __LINE__ << std::endl; \
} \
}

Complete example
================================================================================

A complete example to demonstrate the error handling with a simple addition of
two values kernel:

.. code-block:: cpp

#include <hip/hip_runtime.h>
#include <vector>
#include <iostream>

#define HIP_CHECK(expression) \
{ \
const hipError_t status = expression; \
if(status != hipSuccess){ \
std::cerr << "HIP error " \
<< status << ": " \
<< hipGetErrorString(status) \
<< " at " << __FILE__ << ":" \
<< __LINE__ << std::endl; \
} \
}

// Addition of two values.
__global__ void add(int *a, int *b, int *c, size_t size) {
const size_t index = threadIdx.x + blockDim.x * blockIdx.x;
if(index < size) {
c[index] += a[index] + b[index];
}
}

int main() {
constexpr int numOfBlocks = 256;
constexpr int threadsPerBlock = 256;
constexpr size_t arraySize = 1U << 16;

std::vector<int> a(arraySize), b(arraySize), c(arraySize);
int *d_a, *d_b, *d_c;

// Setup input values.
std::fill(a.begin(), a.end(), 1);
std::fill(b.begin(), b.end(), 2);

// Allocate device copies of a, b and c.
HIP_CHECK(hipMalloc(&d_a, arraySize * sizeof(*d_a)));
HIP_CHECK(hipMalloc(&d_b, arraySize * sizeof(*d_b)));
HIP_CHECK(hipMalloc(&d_c, arraySize * sizeof(*d_c)));

// Copy input values to device.
HIP_CHECK(hipMemcpy(d_a, &a, arraySize * sizeof(*d_a), hipMemcpyHostToDevice));
HIP_CHECK(hipMemcpy(d_b, &b, arraySize * sizeof(*d_b), hipMemcpyHostToDevice));

// Launch add() kernel on GPU.
hipLaunchKernelGGL(add, dim3(numOfBlocks), dim3(threadsPerBlock), 0, 0, d_a, d_b, d_c, arraySize);
// Check the kernel launch
HIP_CHECK(hipGetLastError());
// Check for kernel execution error
HIP_CHECK(hipDeviceSynchronize());

// Copy the result back to the host.
HIP_CHECK(hipMemcpy(&c, d_c, arraySize * sizeof(*d_c), hipMemcpyDeviceToHost));

// Cleanup allocated memory.
HIP_CHECK(hipFree(d_a));
HIP_CHECK(hipFree(d_b));
HIP_CHECK(hipFree(d_c));

// Print the result.
std::cout << a[0] << " + " << b[0] << " = " << c[0] << std::endl;

return 0;
}
105 changes: 105 additions & 0 deletions docs/how-to/hip_runtime_api/initialization.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
.. meta::
:description: Initialization.
:keywords: AMD, ROCm, HIP, initialization

********************************************************************************
Initialization
********************************************************************************

The initialization involves setting up the environment and resources needed for
using GPUs. The following steps are covered with the initialization:

- Setting up the HIP runtime

This includes reading the environment variables set during init, setting up
the active or visible devices, loading necessary libraries, setting up
internal buffers for memory copies or cooperative launches, initialize the
compiler as well as HSA runtime and checks any errors due to lack of resources
or no active devices.

- Querying and setting GPUs

Identifying and querying the available GPU devices on the system.

- Setting up contexts

Creating contexts for each GPU device, which are essential for managing
resources and executing kernels. For further details, check the :ref:`context
section <context_driver_api>`.

Initialize the HIP runtime
================================================================================

The HIP runtime is initialized automatically when the first HIP API call is
made. However, you can explicitly initialize it using :cpp:func:`hipInit`,
to be able to control the timing of the initialization. The manual
initialization can be useful to ensure that the GPU is initialized and
ready, or to isolate GPU initialization time from other parts of
your program.

.. note::

You can use :cpp:func:`hipDeviceReset()` to delete all streams created, memory
allocated, kernels running and events created by the current process. Any new
HIP API call initializes the HIP runtime again.

Querying and setting GPUs
================================================================================

If multiple GPUs are available in the system, you can query and select the
desired GPU(s) to use based on device properties, such as size of global memory,
size shared memory per block, support of cooperative launch and support of
managed memory.

Querying GPUs
--------------------------------------------------------------------------------

The properties of a GPU can be queried using :cpp:func:`hipGetDeviceProperties`,
which returns a struct of :cpp:struct:`hipDeviceProp_t`. The properties in the struct can be
used to identify a device or give an overview of hardware characteristics, that
might make one GPU better suited for the task than others.

The :cpp:func:`hipGetDeviceCount` function returns the number of available GPUs,
which can be used to loop over the available GPUs.

Example code of querying GPUs:

.. code-block:: cpp

#include <hip/hip_runtime.h>
#include <iostream>

int main() {

int deviceCount;
if (hipGetDeviceCount(&deviceCount) == hipSuccess){
for (int i = 0; i < deviceCount; ++i){
hipDeviceProp_t prop;
if ( hipGetDeviceProperties(&prop, i) == hipSuccess)
std::cout << "Device" << i << prop.name << std::endl;
}
}

return 0;
}

Setting the GPU
--------------------------------------------------------------------------------

:cpp:func:`hipSetDevice` function select the GPU to be used for subsequent HIP
operations. This function performs several key tasks:

- Context Binding

Binds the current thread to the context of the specified GPU device. This
ensures that all subsequent operations are executed on the selected device.

- Resource Allocation

Prepares the device for resource allocation, such as memory allocation and
stream creation.

- Check device availability

Checks for errors in device selection and returns error if the specified
device is not available or not capable of executing HIP operations.
2 changes: 2 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,9 @@ The HIP documentation is organized into the following categories:
:::{grid-item-card} How to

* {doc}`./how-to/hip_runtime_api`
* {doc}`./how-to/hip_runtime_api/initialization`
* {doc}`./how-to/hip_runtime_api/memory_management`
* {doc}`./how-to/hip_runtime_api/error_handling`
* {doc}`./how-to/hip_runtime_api/cooperative_groups`
* {doc}`./how-to/hip_runtime_api/hipgraph`
* [HIP porting guide](./how-to/hip_porting_guide)
Expand Down
2 changes: 2 additions & 0 deletions docs/sphinx/_toc.yml.in
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ subtrees:
- file: how-to/hip_runtime_api
subtrees:
- entries:
- file: how-to/hip_runtime_api/initialization
- file: how-to/hip_runtime_api/memory_management
subtrees:
- entries:
Expand All @@ -46,6 +47,7 @@ subtrees:
- file: how-to/hip_runtime_api/memory_management/unified_memory
- file: how-to/hip_runtime_api/memory_management/virtual_memory
- file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
- file: how-to/hip_runtime_api/error_handling
- file: how-to/hip_runtime_api/cooperative_groups
- file: how-to/hip_runtime_api/hipgraph
- file: how-to/hip_porting_guide
Expand Down