[FEA] Track mrs to warn during cleanup by reinitialize #1314
Description
Is your feature request related to a problem? Please describe.
rmm.reinitialize will clean up any internal references to memory resources prior to recreating new instances. However, it currently has no way to track user-created resources. This means that users who manually create memory resource objects must also delete all references to them in their own code prior to calling reinitialize. This behavior is neither documented nor easily evident, and usually manifests as unexpectedly high total memory consumption.
Describe the solution you'd like
We should add a simple metaclass for DeviceMemoryResource that keeps track of all instances that have been created so far. Then, rmm.reinitialize can check this list of references and warn the user if any references remain that rmm cannot handle cleaning up itself. That will provide users more immediate feedback that something is wrong here.
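To make the proposal concrete, here is a minimal sketch of the tracking idea. It assumes a pure-Python stand-in for DeviceMemoryResource (the real class is a compiled extension type, so attaching a metaclass to it may need a different mechanism), and the names TrackedMeta, live_instances, and the stand-in reinitialize are hypothetical, not part of rmm.

```python
import gc
import warnings
import weakref


class TrackedMeta(type):
    """Hypothetical metaclass that records every instance of its classes."""

    # Weak references, so tracking alone never keeps a resource alive.
    _instances = weakref.WeakSet()

    def __call__(cls, *args, **kwargs):
        instance = super().__call__(*args, **kwargs)
        TrackedMeta._instances.add(instance)
        return instance

    @classmethod
    def live_instances(mcs):
        """Return the tracked instances that are still referenced somewhere."""
        return list(mcs._instances)


class DeviceMemoryResource(metaclass=TrackedMeta):
    """Pure-Python stand-in for the real rmm class, for illustration only."""


def reinitialize():
    """Stand-in for rmm.reinitialize that only performs the proposed check.

    The real function would first release rmm's internal references; this
    sketch assumes that has already happened.
    """
    gc.collect()  # drop unreachable resources (including cycles) first
    remaining = TrackedMeta.live_instances()
    if remaining:
        warnings.warn(
            f"{len(remaining)} memory resource(s) are still referenced by "
            "user code and cannot be cleaned up by reinitialize",
            stacklevel=2,
        )
    return len(remaining)
```

With this in place, creating a resource, keeping a reference to it, and calling the stand-in reinitialize() emits a warning; deleting the reference first does not. The WeakSet is the key design choice: a plain list of instances would itself keep every resource alive.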
Describe alternatives you've considered
Rather than warning, we could raise an exception so that users have to fix the issue immediately. It's likely that many users won't notice a warning. OTOH for some use cases it may be acceptable for the old mrs to persist, and erroring is more intrusive.
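One way to accommodate both camps would be an opt-in strict mode. The sketch below is only an illustration of that trade-off: report_leftover, its strict parameter, and LeftoverResourceError are hypothetical names, not existing rmm API.

```python
import warnings


class LeftoverResourceError(RuntimeError):
    """Hypothetical error for resources that outlive reinitialize."""


def report_leftover(count, strict=False):
    """Warn (default) or raise when `count` user-created resources remain."""
    if count == 0:
        return
    message = f"{count} memory resource(s) are still referenced"
    if strict:
        # Intrusive, but impossible to overlook.
        raise LeftoverResourceError(message)
    # Non-intrusive default for use cases where old mrs may persist.
    warnings.warn(message, stacklevel=2)
```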
Alternatively, we could outfit DeviceMemoryResource with the ability to be invalidated in some way so that all outstanding mrs during reinitialize are marked as unusable. Then any future function calls would trigger errors. This approach would require significantly more technical investment, and it's not clear that there's a huge benefit. It would guarantee all memory being returned on reinitialize at the expense of a more confusing user experience (they would encounter an error potentially long after the reinitialize when they tried to use old mrs).
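For comparison, a rough sketch of the invalidation approach follows. It again uses a pure-Python stand-in rather than the real class; the invalidate method, the _valid flag, InvalidatedResourceError, and the simplified allocate signature are all assumptions made for illustration.

```python
import weakref


class InvalidatedResourceError(RuntimeError):
    """Hypothetical error raised when an invalidated resource is used."""


class DeviceMemoryResource:
    """Pure-Python stand-in for the real rmm class, for illustration only."""

    # Weak references to every instance that is still alive.
    _live = weakref.WeakSet()

    def __init__(self):
        self._valid = True
        DeviceMemoryResource._live.add(self)

    def invalidate(self):
        """Mark this resource unusable; real code would also free its memory."""
        self._valid = False

    def allocate(self, nbytes):
        # Every public method would need this guard, which is part of the
        # extra technical investment mentioned above.
        if not self._valid:
            raise InvalidatedResourceError(
                "this memory resource was invalidated by reinitialize"
            )
        return bytearray(nbytes)  # placeholder for a device allocation


def reinitialize():
    """Stand-in for rmm.reinitialize: invalidate all outstanding resources."""
    for mr in list(DeviceMemoryResource._live):
        mr.invalidate()
```

The delayed failure described above shows up directly here: nothing happens at the reinitialize call, and the error only surfaces at whatever later point the user calls allocate on an old resource.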