Monitor nvidia-smi output to see GPU resource consumption #72

Open
@samhodge-aiml

Description

Is your feature request related to a problem? Please describe.
I need to see how much VRAM and GPU compute a process in a container is using, and to keep a historical record in a SQL table, so that I can keep narrowing the gap between resources allocated and resources consumed.
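To make the "historical record in a SQL table" part concrete, here is a minimal sketch using SQLite from the Python standard library. The table name `gpu_usage`, its columns, and the helpers `open_store`, `record_sample`, and `peak_memory_mib` are all hypothetical names chosen for illustration; none of them are part of watchme.

```python
import sqlite3
import time

# Hypothetical schema: one row per GPU per sampling instant.
SCHEMA = """
CREATE TABLE IF NOT EXISTS gpu_usage (
    sampled_at       REAL    NOT NULL,
    gpu_index        INTEGER NOT NULL,
    memory_used_mib  INTEGER,
    memory_total_mib INTEGER,
    utilization_pct  INTEGER
)
"""


def open_store(path=":memory:"):
    """Open (or create) the SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_sample(conn, gpu_index, memory_used_mib, memory_total_mib,
                  utilization_pct, sampled_at=None):
    """Append one usage sample; the timestamp defaults to the current time."""
    if sampled_at is None:
        sampled_at = time.time()
    conn.execute(
        "INSERT INTO gpu_usage VALUES (?, ?, ?, ?, ?)",
        (sampled_at, gpu_index, memory_used_mib, memory_total_mib,
         utilization_pct),
    )
    conn.commit()


def peak_memory_mib(conn, gpu_index):
    """Return the highest recorded VRAM use for a GPU, or None if no rows."""
    row = conn.execute(
        "SELECT MAX(memory_used_mib) FROM gpu_usage WHERE gpu_index = ?",
        (gpu_index,),
    ).fetchone()
    return row[0]
```

Comparing `peak_memory_mib` against the VRAM allocated to the container would give one rough measure of the gap between resources allocated and resources consumed.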

Describe the solution you'd like
I would like to wrap the output of nvidia-smi and have it reported in the same dictionary as the rest of the watchme metrics, or in a sidecar-style structure alongside them.
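As a rough sketch of what wrapping nvidia-smi could look like, the snippet below calls `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and turns each output row into a dictionary. The function names `parse_gpu_csv` and `collect_gpu_metrics` are hypothetical, and how the result would be merged into the watchme metrics is left open because that depends on watchme internals.

```python
import csv
import io
import shutil
import subprocess

# Properties requested from nvidia-smi via --query-gpu.
GPU_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]
NUMERIC_FIELDS = ("index", "memory.used", "memory.total", "utilization.gpu")


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    records = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        record = dict(zip(GPU_FIELDS, (value.strip() for value in row)))
        for key in NUMERIC_FIELDS:
            # Unsupported properties are reported as text such as "[N/A]";
            # store None for anything that is not an integer.
            try:
                record[key] = int(record.get(key))
            except (TypeError, ValueError):
                record[key] = None
        records.append(record)
    return records


def collect_gpu_metrics():
    """Run nvidia-smi and return per-GPU metrics; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(GPU_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(completed.stdout)
```

With `nounits`, memory values are in MiB and utilization is a percentage. Per-process figures could be gathered in the same way with `--query-compute-apps=pid,process_name,used_memory`.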

Describe alternatives you've considered
Use https://github.com/petronny/nvsmi and dump its output into a dictionary at the same time as the watchme decorator runs.
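One way to picture this alternative is a separate decorator that polls a GPU sampler while the wrapped function runs. This is only a sketch: `monitor_gpu` is a hypothetical name, it is not the watchme decorator, and `sampler` stands in for any zero-argument callable that returns a dictionary (for example, a thin wrapper around nvsmi or nvidia-smi).

```python
import functools
import threading


def monitor_gpu(sampler, interval=1.0):
    """Collect sampler() results while the decorated function is running."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            samples = [sampler()]  # one sample before the call starts
            stop = threading.Event()

            def poll():
                # Event.wait returns False on timeout and True once stop is set.
                while not stop.wait(interval):
                    samples.append(sampler())

            thread = threading.Thread(target=poll, daemon=True)
            thread.start()
            try:
                result = func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
            samples.append(sampler())  # one sample after the call ends
            return {"result": result, "gpu": samples}

        return wrapper

    return decorator
```

The wrapper returns the function result and the GPU samples side by side, which is the sidecar shape described above. Stacking it with the watchme decorator would need checking against how watchme handles return values.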

Additional context
Matching computation closely to the resources allocated is a problem with commercial value. Anyone who uses GPUs should be interested in how heavily these resources are occupied, because buying and renting them is not cheap.
