Store and set the correct CUDA device in device_buffer #1370
Conversation
Maybe just get rid of
A few non-blocking questions.
```cpp
cuda_set_device_raii dev{_device};
allocate_async(size);
```
question: Should the setting of the current device live inside `allocate`/`deallocate`, rather than it being the responsibility of the caller to ensure the device is correct? Or is this deliberate because we might want more than just the allocate call to occur with the same device active, and this approach avoids excessive device switching?
Yeah, it's deliberate. I wanted to put it in `allocate_async`/`deallocate_async`, but those calls are often made in places where the correct device is also needed for other operations, and we don't want to `cuda_set_device_raii` multiple times. There are also places such as `resize`/`shrink_to_fit` where a new `device_buffer` is created and we want that to happen with the original device active, but inside it we call `allocate_async` and that would cause redundant current-device checking.
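The layering described here can be sketched in plain C++ (stub `get_device`/`set_device` stand in for `cudaGetDevice`/`cudaSetDevice`, and `buffer`, `device_guard` are hypothetical names modeled on `device_buffer` and `cuda_set_device_raii`, not RMM's actual code):

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for cudaGetDevice/cudaSetDevice so the sketch runs anywhere.
static int g_current_device = 0;
int get_device() { return g_current_device; }
void set_device(int id) { g_current_device = id; }

// Modeled on cuda_set_device_raii: activate a device, restore the old one on exit.
struct device_guard {
  int old_id;
  explicit device_guard(int id) : old_id{get_device()} { set_device(id); }
  ~device_guard() { set_device(old_id); }
};

// Hypothetical buffer following the PR's layering: the device is recorded at
// construction, and only the *public* entry points switch devices, so internal
// helpers like allocate_async() can assume the right device is already active.
struct buffer {
  int device_;
  std::size_t size_{0};

  explicit buffer(std::size_t size) : device_{get_device()} { resize(size); }

  void resize(std::size_t new_size) {
    device_guard guard{device_};  // switch once, at the API boundary
    allocate_async(new_size);     // runs with device_ active, no extra switch
  }

 private:
  void allocate_async(std::size_t n) {
    assert(get_device() == device_);  // guaranteed by the public caller
    size_ = n;
  }
};
```

Hoisting the guard into the public `resize` means an operation that calls `allocate_async` more than once internally still pays for at most one device switch.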
I think what I've arrived at is that, in order to minimize device switching, we want to do it at the highest level in `device_buffer` possible, which means the public API functions (when necessary). For the same reason, we assume the user has set the device before constructing the `device_buffer`, and we just store the ID at that stage.
```cpp
cuda_set_device_raii dev{_device};
auto tmp = device_buffer{new_capacity, stream, _mr};
```
question: Does the appearance of this pattern suggest that the `device_buffer` constructor should have an (optional) `device` argument that one can provide, rather than relying on the implicit current CUDA device (which is then managed by this RAII object here)?
If we did that, and we eliminated the `cuda_set_device_raii` here, then the constructor would have to first call `cudaSetDevice(device)`, and I assume it would do so using `cuda_set_device_raii`, which means on exiting the ctor the previous device would be reset (if different). So then we would need to call `cuda_set_device_raii` again after calling the constructor with the optional `device` argument, because of the subsequent `cudaMemcpyAsync`. That could mean two calls to `cudaGetDevice` and four calls to `cudaSetDevice` in the worst case. The way it is now, there is at most one `cudaGetDevice` and at most two `cudaSetDevice`.
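The call accounting above can be checked with a small model, where counters stand in for `cudaGetDevice`/`cudaSetDevice` (the guard type here is a hypothetical stand-in for `cuda_set_device_raii`):

```cpp
#include <cassert>

static int g_device = 0;
static int g_gets = 0;  // calls to the stand-in cudaGetDevice
static int g_sets = 0;  // calls to the stand-in cudaSetDevice

int get_device() { ++g_gets; return g_device; }
void set_device(int id) { ++g_sets; g_device = id; }

// Modeled on cuda_set_device_raii: one "get", and a "set" only when switching
// (plus one more "set" on destruction to restore the previous device).
struct device_guard {
  int old_id;
  bool switched;
  explicit device_guard(int id) : old_id{get_device()}, switched{id != old_id} {
    if (switched) set_device(id);
  }
  ~device_guard() {
    if (switched) set_device(old_id);
  }
};
```

A single guard spanning both the allocation and the copy costs at most one get and two sets; taking a fresh guard inside the constructor and another around the `cudaMemcpyAsync` would double both counts in the worst case.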
Hmm, my understanding from the docs was that runtime calls (excepting [some] of those to do with events, where the call has to happen with the live device matching the event's stream) don't care about the current device, and hence allocation/deallocation (which, with a pool MR, record events) are the only places we need to handle it.
@jrhemstad @wence- do either of you want to opine on the existential question I asked in the description of this PR?
I presume you mean:
I think my preference is to mark as

Edit: I think Jake is also in favour of removal: #1370 (comment)
Yes, I did a lot of that eradication work. But we didn't eliminate all `device_vector` from RAPIDS, especially in tests. In fact a search shows that cuGraph still uses `device_vector`.

I actually think an initialized vector is useful, as long as you know about its synchronizing behavior. So I don't really want to remove `rmm::device_vector`.

But I guess what you are saying is that you think it's OK for `rmm::device_uvector` and `rmm::device_vector` to have different semantics. I agree.
Could you please edit the PR description to summarise the outcome around `device_vector` (rather than mentioning it as an issue to resolve)?
Done.
/merge
This PR removes static checks for serialization size. Upstream changes like rapidsai/rmm#1370 have altered these sizes and break RAFT CI. An alternative approach to verifying serialization will be developed.

Authors:
- Corey J. Nolet (https://github.com/cjnolet)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Divye Gala (https://github.com/divyegala)
- Mark Harris (https://github.com/harrism)

URL: #1997
Since rapidsai#1370, the dtor for device_buffer ensures that the correct device is active when the deallocation occurs. We therefore update the example to discuss this. Since device_vector still requires the user to manage the active device correctly by hand, call this out explicitly in the documentation. - Closes rapidsai#1523
…#1524) Since #1370, the dtor for device_buffer ensures that the correct device is active when the deallocation occurs. We therefore update the example to discuss this. Since device_vector still requires the user to manage the active device correctly by hand, call this out explicitly in the documentation.

- Closes #1523

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Mark Harris (https://github.com/harrism)

URL: #1524
Description

This changes `device_buffer` to store the active CUDA device ID on creation, and (possibly temporarily) set the active device to that ID before allocating or freeing memory. It also adds tests for containers built on `device_buffer` (`device_buffer`, `device_uvector` and `device_scalar`) that ensure correct operation when the device is changed before doing things that alloc/dealloc memory for those containers.

This fixes #1342. HOWEVER, there is an important question yet to answer: `rmm::device_vector` is just an alias for `thrust::device_vector`, which does not use `rmm::device_buffer` for storage. However, users may be surprised after this PR because the multidevice semantics of RMM containers will be different from `thrust::device_vector` (and therefore `rmm::device_vector`).

Update: opinion is that it's probably OK to diverge from `device_vector`, and some think we should remove `rmm::device_vector`. While we discuss this I have set the DO NOT MERGE label.

Checklist
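The core behaviour this PR adds — the destructor reactivating the device that was current at construction, even if the caller has since switched — can be modelled in plain C++ (stub `get_device`/`set_device` stand in for the CUDA runtime; `buffer` and `device_guard` are hypothetical names, not RMM's actual types):

```cpp
#include <cassert>

static int g_device = 0;
static int g_free_device = -1;  // records which device a free ran on

int get_device() { return g_device; }
void set_device(int id) { g_device = id; }
void free_on_current_device() { g_free_device = g_device; }

// Modeled on cuda_set_device_raii: activate a device, restore the old one on exit.
struct device_guard {
  int old_id;
  explicit device_guard(int id) : old_id{get_device()} { set_device(id); }
  ~device_guard() { set_device(old_id); }
};

// Hypothetical model of the PR's device_buffer: remember the device that was
// active at construction and reactivate it for deallocation in the destructor.
struct buffer {
  int device_;
  buffer() : device_{get_device()} {}
  ~buffer() {
    device_guard g{device_};
    free_on_current_device();  // the real code would call deallocate_async here
  }
};
```

Note the caller's current device is left untouched after the destructor runs, which is what lets code on another device destroy the buffer safely.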