[FEA] Track mrs to warn during cleanup by reinitialize #1314

Open
@vyasr

Description

Is your feature request related to a problem? Please describe.
rmm.reinitialize cleans up any internal references to memory resources before creating new instances. However, it currently has no way to track user-created resources. This means that users who manually create memory resource objects must also delete all references to them in their own code before calling reinitialize. This behavior is neither documented nor easily evident, and it usually manifests as unexpectedly high total memory consumption.

Describe the solution you'd like
We should add a simple metaclass for DeviceMemoryResource that keeps track of all instances that have been created so far. Then, rmm.reinitialize can check this list of references and warn the user if any references remain that rmm cannot clean up itself. That will give users more immediate feedback that something is wrong here.
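A minimal sketch of the tracking idea, using a pure-Python stand-in rather than the real DeviceMemoryResource (which is a compiled extension type, so the actual mechanism may need to differ). The names TrackedMeta, FakeMemoryResource, and warn_on_outstanding_resources are hypothetical and only illustrate the shape of the check; weak references are used so that tracking alone never keeps a resource alive.

```python
import gc
import warnings
import weakref


class TrackedMeta(type):
    """Hypothetical metaclass that records every instance of its classes."""

    # WeakSet: the registry must not itself keep resources alive.
    _instances = weakref.WeakSet()

    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)
        TrackedMeta._instances.add(obj)
        return obj

    @staticmethod
    def live_instances():
        return list(TrackedMeta._instances)


class FakeMemoryResource(metaclass=TrackedMeta):
    """Stand-in for DeviceMemoryResource; not the real rmm class."""

    def __init__(self, name):
        self.name = name


def warn_on_outstanding_resources():
    """What a reinitialize-style function could check before recreating mrs."""
    gc.collect()  # drop instances kept alive only by reference cycles
    leftover = TrackedMeta.live_instances()
    if leftover:
        warnings.warn(
            f"{len(leftover)} user-created memory resource(s) are still "
            "referenced; their memory will not be released.",
            RuntimeWarning,
            stacklevel=2,
        )
    return leftover
```

In this sketch, a resource that the user still holds shows up in the returned list and triggers the warning; once the user deletes their last reference, it drops out of the registry on its own.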

Describe alternatives you've considered
Rather than warning, we could raise an exception so that users have to fix the issue immediately. It's likely that many users won't notice a warning. OTOH for some use cases it may be acceptable for the old mrs to persist, and erroring is more intrusive.
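One possible middle ground, sketched under assumptions: emit a dedicated warning category by default, and let users who want hard failures escalate it with the standard warnings filters. OutstandingResourceWarning, report_outstanding, and strict_report are hypothetical names, not part of rmm.

```python
import warnings


class OutstandingResourceWarning(RuntimeWarning):
    """Hypothetical category for resources that outlive a reinitialize."""


def report_outstanding(count):
    # Stand-in for the check inside a reinitialize-style function.
    if count:
        warnings.warn(
            f"{count} memory resource(s) still referenced",
            OutstandingResourceWarning,
            stacklevel=2,
        )


def strict_report(count):
    """Run the same check, but escalate the warning to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=OutstandingResourceWarning)
        report_outstanding(count)
```

With this shape, the default stays non-intrusive, while users who would rather fail fast can opt in without any extra API surface.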

Alternatively, we could outfit DeviceMemoryResource with the ability to be invalidated in some way so that all outstanding mrs during reinitialize are marked as unusable. Then any future function calls would trigger errors. This approach would require significantly more technical investment, and it's not clear that there's a huge benefit. It would guarantee that all memory is returned on reinitialize, at the expense of a more confusing user experience (users would encounter an error, potentially long after the reinitialize, when they tried to use old mrs).
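For comparison, a rough sketch of what the invalidation approach could look like, again with a pure-Python stand-in. InvalidatableResource, invalidate, and InvalidatedResourceError are hypothetical, and the bytearray is only a placeholder for a device allocation.

```python
class InvalidatedResourceError(RuntimeError):
    """Hypothetical error raised when a stale resource is used."""


class InvalidatableResource:
    """Stand-in for DeviceMemoryResource; not the real rmm class."""

    def __init__(self):
        self._valid = True

    def invalidate(self):
        # A reinitialize-style function would call this on every
        # outstanding mr after releasing the underlying memory.
        self._valid = False

    def allocate(self, nbytes):
        # Every public method would need a guard like this one.
        if not self._valid:
            raise InvalidatedResourceError(
                "this memory resource was invalidated by a reinitialize"
            )
        return bytearray(nbytes)  # placeholder for a device allocation
```

The sketch also shows where the cost comes from: every entry point needs the validity guard, and the error surfaces at the point of use rather than at the reinitialize call.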

Metadata

Labels

? - Needs Triage (Need team to review and classify), Python (Related to RMM Python API), feature request (New feature or request)
