Update multi-gpu discussion for device_buffer and device_vector dtors (#1524)

Since #1370, the dtor for device_buffer ensures that the correct device is active when the deallocation occurs. We therefore update the example to discuss this. Since device_vector still requires the user to manage the active device correctly by hand, call this out explicitly in the documentation.

- Closes #1523

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Mark Harris (https://github.com/harrism)

URL: #1524
wence- authored Apr 11, 2024
1 parent af756c6 commit 7d7d65a
Showing 1 changed file with 51 additions and 8 deletions: README.md
@@ -336,25 +336,68 @@ for(int i = 0; i < N; ++i) {

Note that the CUDA device that is current when creating a `device_memory_resource` must also be
current any time that `device_memory_resource` is used to deallocate memory, including in a
destructor. The RAII class `rmm::device_buffer` and classes that use it as a backing store
(`rmm::device_scalar` and `rmm::device_uvector`) handle this by storing the active device when the
constructor is called, and then ensuring that the stored device is active whenever an allocation or
deallocation is performed (including in the destructor). The user therefore only needs to ensure
that the device active during _creation_ of an `rmm::device_buffer` matches the active device of
the memory resource being used.

Here is an incorrect example that creates a memory resource on device zero and then uses it to
allocate a `device_buffer` on device one:

```c++
{
  RMM_CUDA_TRY(cudaSetDevice(0));
  auto mr = rmm::mr::cuda_memory_resource{};
  RMM_CUDA_TRY(cudaSetDevice(1));
  // Invalid, current device is 1, but MR is only valid for device 0
  rmm::device_buffer buf(16, rmm::cuda_stream_default, &mr);
}
```

A correct example creates the device buffer with device zero active. After that it is safe to switch
devices and let the buffer go out of scope and destruct with a different device active. For example,
this code is correct:

```c++
{
  RMM_CUDA_TRY(cudaSetDevice(0));
  auto mr = rmm::mr::cuda_memory_resource{};
  rmm::device_buffer buf(16, rmm::cuda_stream_default, &mr);
  RMM_CUDA_TRY(cudaSetDevice(1));
  ...
  // No need to switch back to device 0 before ~buf runs
}
```
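
The same holds for the classes that use `rmm::device_buffer` as a backing store. As a minimal
sketch (assuming the same setup as above), an `rmm::device_uvector` may likewise safely go out of
scope with a different device active:

```c++
{
  RMM_CUDA_TRY(cudaSetDevice(0));
  auto mr = rmm::mr::cuda_memory_resource{};
  // The backing device_buffer records that device 0 is active here
  rmm::device_uvector<int> vec(16, rmm::cuda_stream_default, &mr);
  RMM_CUDA_TRY(cudaSetDevice(1));
  ...
  // Safe: the destructor makes device 0 active while deallocating
}
```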

#### Use of `rmm::device_vector` with multiple devices

> [!CAUTION]
> In contrast to the uninitialized `rmm::device_uvector`, `rmm::device_vector` **DOES NOT** store
> the active device during construction, and therefore cannot arrange for it to be active when the
> destructor runs. It is therefore the responsibility of the user to ensure that the active device
> is correct when the destructor runs.

`rmm::device_vector` is therefore slightly less ergonomic to use in a multi-device setting, since
the caller must ensure that the active device at allocation matches the active device at
deallocation. Recapitulating the previous example using `rmm::device_vector`:

```c++
{
  RMM_CUDA_TRY(cudaSetDevice(0));
  auto mr = rmm::mr::cuda_memory_resource{};
  rmm::device_vector<int> vec(16, rmm::mr::thrust_allocator<int>(rmm::cuda_stream_default, &mr));
  RMM_CUDA_TRY(cudaSetDevice(1));
  ...
  // ERROR: ~vec runs with device 1 active, but needs device 0 to be active
}
```

A correct example adds a call to `cudaSetDevice(0)` at the line of the error comment, so that
device 0 is active again before the destructor of `vec` runs, as sketched below.
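
A minimal sketch of that fix, with every other line unchanged from the example above:

```c++
{
  RMM_CUDA_TRY(cudaSetDevice(0));
  auto mr = rmm::mr::cuda_memory_resource{};
  rmm::device_vector<int> vec(16, rmm::mr::thrust_allocator<int>(rmm::cuda_stream_default, &mr));
  RMM_CUDA_TRY(cudaSetDevice(1));
  ...
  // Make device 0 active again so that ~vec deallocates on the correct device
  RMM_CUDA_TRY(cudaSetDevice(0));
}
```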

## `cuda_stream_view` and `cuda_stream`

`rmm::cuda_stream_view` is a simple non-owning wrapper around a CUDA `cudaStream_t`. This wrapper's
