Skip to content

Commit

Permalink
Update multi-gpu discussion for device_buffer and device_vector dtors
Browse files Browse the repository at this point in the history
Since #1370, the dtor for device_buffer ensures that the correct
device is active when the deallocation occurs. We therefore update the
example to discuss this. Since device_vector still requires the user
to manage the active device correctly by hand, call this out
explicitly in the documentation.

- Closes #1523
  • Loading branch information
wence- committed Apr 10, 2024
1 parent 6771b71 commit bebabbd
Showing 1 changed file with 55 additions and 8 deletions.
63 changes: 55 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -336,24 +336,71 @@ for(int i = 0; i < N; ++i) {

Note that the CUDA device that is current when creating a `device_memory_resource` must also be
current any time that `device_memory_resource` is used to deallocate memory, including in a
destructor. This affects RAII classes like `rmm::device_buffer` and `rmm::device_uvector`. Here's an
(incorrect) example that assumes the above example loop has been run to create a
`pool_memory_resource` for each device. A correct example adds a call to `cudaSetDevice(0)` on the
line of the error comment.
destructor. The RAII class `rmm::device_buffer` and classes that use
it as a backing store (`rmm::device_scalar` and `rmm::device_uvector`)
handle this by storing the active device when the constructor is
called, and then ensuring that the stored device is active whenever an
allocation or deallocation is performed (including in the destructor).
The user must therefore only ensure that the device active during
_creation_ of an `rmm::device_buffer` matches the active device of the
memory resource being used.

Here is an incorrect example that creates a memory resource on device
zero and then uses it to allocate a `device_buffer` on device one:

```c++
{
RMM_CUDA_TRY(cudaSetDevice(0));
rmm::device_buffer buf_a(16);

auto mr = rmm::mr::cuda_memory_resource{};
{
RMM_CUDA_TRY(cudaSetDevice(1));
rmm::device_buffer buf_b(16);
// Invalid, current device is 1, but MR is only valid for device 0
rmm::device_buffer buf(16, rmm::cuda_stream_default, &mr);
}
}
```

// Error: when buf_a is destroyed, the current device must be 0, but it is 1
A correct example creates the device buffer with device zero active.
After that it is safe to switch devices and let the buffer go out of
scope and destruct with a different device active. For example, this
code is correct:
```c++
{
RMM_CUDA_TRY(cudaSetDevice(0));
auto mr = rmm::mr::cuda_memory_resource{};
rmm::device_buffer buf(16, rmm::cuda_stream_default, &mr);
RMM_CUDA_TRY(cudaSetDevice(1));
...
// No need to switch back to device 0 before ~buf runs
}
```

#### Use of `rmm::device_vector` with multiple devices

> ![CAUTION]
> In contrast to the uninitialized `rmm:device_uvector`,
> `rmm::device_vector` **DOES NOT** store the active device during
> construction, and therefore cannot arrange for it to be active when
> the destructor runs. It is therefore the responsibility of the user
> to ensure the currently active device is correct.
`rmm::device_vector` is therefore slightly less ergonomic to use in a
multiple device setting since the caller must arrange that active
devices on allocation and deallocation match. Recapitulating the
previous example using `rmm::device_vector`:

```c++
{
RMM_CUDA_TRY(cudaSetDevice(0));
auto mr = rmm::mr::cuda_memory_resource{};
rmm::device_vector<int> vec(16, rmm::mr::thrust_allocator<int>(rmm::cuda_stream_default, &mr));
RMM_CUDA_TRY(cudaSetDevice(1));
...
// ERROR: ~vec runs with device 1 active, but needs device 0 to be active
}
```
A correct example adds a call to `cudaSetDevice(0)` on the line of the
error comment before the dtor for `~vec` runs.

## `cuda_stream_view` and `cuda_stream`

Expand Down

0 comments on commit bebabbd

Please sign in to comment.