
[FEA] Make device_vector safer to use in multi-device setting #1527

Closed
@wence-

Description

Is your feature request related to a problem? Please describe.

Since #1370, device_buffer is safe to use in a multi-device setting with respect to which device is active when the destructor runs. It was always possible (and relatively straightforward) to arrange for the correct device to be active when no exceptions occurred. In the presence of exceptions, however, ensuring that the correct device was active for destruction was much more complicated.

We therefore added the cuda_set_device_raii helper object and stored the active device id in the device_buffer to ensure that the correct device is always active when calling allocate/deallocate functions.
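To illustrate why an RAII helper addresses the exception case, here is a minimal sketch of the pattern. It is not RMM's implementation: `device_guard` and the `fake_rt` namespace are hypothetical names, and `fake_rt` is a stand-in for the CUDA runtime (where real code would call `cudaGetDevice` and `cudaSetDevice`) so that the example runs without a GPU.

```cpp
#include <cassert>
#include <stdexcept>

// Assumption: a stand-in for the CUDA runtime's "current device" state.
// Real code would use cudaGetDevice / cudaSetDevice here.
namespace fake_rt {
static int current_device = 0;
int get_device() { return current_device; }
void set_device(int id) { current_device = id; }
}  // namespace fake_rt

// Hypothetical guard: makes `target` the active device for the guard's
// lifetime and restores the previously active device on destruction.
class device_guard {
 public:
  explicit device_guard(int target)
    : old_{fake_rt::get_device()}, needs_reset_{target != old_}
  {
    if (needs_reset_) { fake_rt::set_device(target); }
  }
  ~device_guard() noexcept
  {
    if (needs_reset_) { fake_rt::set_device(old_); }
  }
  device_guard(device_guard const&)            = delete;
  device_guard& operator=(device_guard const&) = delete;

 private:
  int old_;           // device that was active when the guard was created
  bool needs_reset_;  // false when no switch was needed
};

// Returns the active device after an exception has unwound through a guard.
int device_after_throw()
{
  fake_rt::set_device(0);
  try {
    device_guard guard{1};
    assert(fake_rt::get_device() == 1);
    throw std::runtime_error("simulated failure");
  } catch (std::runtime_error const&) {
    // The guard's destructor already ran during stack unwinding.
  }
  return fake_rt::get_device();
}
```

Because the restore happens in a destructor, the original device is active again whether the scope exits normally or through an exception, with no manual bookkeeping at each exit path.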

In contrast, since device_vector is just an alias for thrust::device_vector, it still suffers from the old issue: the user must manually arrange for the correct device to be active when the destructor runs.

Describe the solution you'd like

#1523 documents this restriction, but it would be good if we could lift it. One way would be to store the active device in the thrust allocator wrapper that we use to interface RMM's memory resources with the thrust allocator model.

We would then use cuda_set_device_raii in all the allocate/deallocate functions.
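The proposal might look roughly like the sketch below. It is written against the standard C++ allocator model rather than thrust's, and every name in it is hypothetical: `device_aware_allocator` stands in for the thrust allocator wrapper, `device_guard` for cuda_set_device_raii, `fake_rt` for the CUDA runtime's current-device calls, and `::operator new` for the RMM memory resource. It only aims to show where the device id would be stored and where the guard would be used.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Assumption: stand-in for cudaGetDevice / cudaSetDevice so this runs on a CPU.
namespace fake_rt {
static int current_device = 0;
int get_device() { return current_device; }
void set_device(int id) { current_device = id; }
}  // namespace fake_rt

// Hypothetical RAII guard (same idea as in the earlier description).
class device_guard {
 public:
  explicit device_guard(int target)
    : old_{fake_rt::get_device()}, needs_reset_{target != old_}
  {
    if (needs_reset_) { fake_rt::set_device(target); }
  }
  ~device_guard() noexcept
  {
    if (needs_reset_) { fake_rt::set_device(old_); }
  }
  device_guard(device_guard const&)            = delete;
  device_guard& operator=(device_guard const&) = delete;

 private:
  int old_;
  bool needs_reset_;
};

// Records which device was active during the most recent deallocation.
static int last_dealloc_device = -1;

// Hypothetical allocator that remembers the device active at construction
// and re-activates it around every allocate/deallocate call.
template <typename T>
class device_aware_allocator {
 public:
  using value_type = T;

  device_aware_allocator() : device_{fake_rt::get_device()} {}

  template <typename U>
  device_aware_allocator(device_aware_allocator<U> const& other) noexcept
    : device_{other.device()}
  {
  }

  T* allocate(std::size_t n)
  {
    device_guard guard{device_};
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t) noexcept
  {
    device_guard guard{device_};
    last_dealloc_device = fake_rt::get_device();
    ::operator delete(p);
  }

  int device() const noexcept { return device_; }

 private:
  int device_;  // device that was active when the allocator was created
};

template <typename T, typename U>
bool operator==(device_aware_allocator<T> const& a, device_aware_allocator<U> const& b) noexcept
{
  return a.device() == b.device();
}

template <typename T, typename U>
bool operator!=(device_aware_allocator<T> const& a, device_aware_allocator<U> const& b) noexcept
{
  return !(a == b);
}

// Creates a vector with device 1 active, switches to device 0, then lets the
// vector be destroyed. Returns the device that was active during deallocation.
int demo_destroy_on_other_device()
{
  fake_rt::set_device(1);
  {
    std::vector<int, device_aware_allocator<int>> v(4);
    fake_rt::set_device(0);  // the user switches away before destruction
  }                          // the vector's destructor deallocates here
  assert(fake_rt::get_device() == 0);  // the user's device is restored
  return last_dealloc_device;
}
```

In this sketch the deallocation runs with device 1 active even though the caller had switched to device 0, and the caller's device is restored afterwards. The cost is visible too: any allocate or deallocate call made while a different device is active incurs two device switches, one in and one back out.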

This approach was discounted in #1370 because it produces more device switches than necessary in some circumstances (pushing the device switching as far out as possible was preferred), so there would be some overhead compared to using device_buffer (though hopefully small). Since device_vector is not stream-ordered, it already has other disadvantages, so a small performance cost is probably not a serious problem.

Describe alternatives you've considered

Maintain status quo, and eventually deprecate and then remove device_vector, since it is not stream-ordered anyway and we are trying to move away from that model.

Labels: 1 - On Deck (to be worked on next), cpp (pertains to C++ code), feature request (new feature or request)
