[Enhancement] Automatically skip faulty GPUs and show info for the normal ones #45
@jue-jue-zi Thanks for the feedback! I'll add a quick fix soon.
@jue-jue-zi I pushed a new commit to handle this. You can reinstall nvitop with: pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
Thanks for fixing it so quickly, but it seems that some problems still remain.
Fixed by the newest commit.
Runtime Environment
- nvitop version or commit: 0.10.0
- nvidia-ml-py version: 11.515.75

Current Behavior
There are four GPUs on our server. One of them overheated for some reason, and since then it can no longer be recognized. If I run the nvidia-smi command without any arguments to query all GPUs, it only prints the error "Unable to determine the device handle for GPU 0000:0C:00.0: Unknown Error" and shows nothing for the remaining normal GPUs. But if I select the normal GPUs explicitly (nvidia-smi -i 0,1,3), all of their info is shown as usual.

If I use the nvitop command to show the GPUs' info, nvidia-ml-py throws exceptions.

Expected Behavior
I hope that the nvitop command can automatically skip the GPUs with errors and still show the info of the normal GPUs. If possible, the errors for the faulty GPUs could also be shown as hints below the normal info, in red for emphasis.
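The requested behavior (query every GPU, keep the healthy ones, and list the failures afterwards in red) can be sketched as below. This is a minimal illustration, not nvitop's actual fix: the names collect_devices, render and DeviceError are hypothetical, and the query function is passed in so the sketch runs without a GPU. With nvidia-ml-py, get_info would presumably wrap pynvml.nvmlDeviceGetHandleByIndex (after pynvml.nvmlInit(), with count taken from pynvml.nvmlDeviceGetCount()), and error_type would be pynvml.NVMLError.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Type

# ANSI escape codes used to print failed GPUs in red.
RED = "\033[31m"
RESET = "\033[0m"


@dataclass
class DeviceError:
    """A GPU index whose query failed, plus the error message (hypothetical type)."""
    index: int
    message: str


def collect_devices(
    count: int,
    get_info: Callable[[int], str],
    error_type: Type[BaseException] = Exception,
) -> Tuple[List[Tuple[int, str]], List[DeviceError]]:
    """Query every index; record failures instead of letting one bad GPU abort the listing."""
    healthy: List[Tuple[int, str]] = []
    failed: List[DeviceError] = []
    for index in range(count):
        try:
            healthy.append((index, get_info(index)))
        except error_type as exc:  # assumption: pynvml.NVMLError in real use
            failed.append(DeviceError(index, str(exc)))
    return healthy, failed


def render(healthy: List[Tuple[int, str]], failed: List[DeviceError]) -> str:
    """Normal GPUs first, then the failed ones as red hints below them."""
    lines = [f"GPU {index}: {info}" for index, info in healthy]
    lines += [f"{RED}GPU {err.index}: {err.message}{RESET}" for err in failed]
    return "\n".join(lines)


if __name__ == "__main__":
    # Fake backend standing in for NVML: index 2 behaves like the overheated GPU.
    def fake_get_info(index: int) -> str:
        if index == 2:
            raise RuntimeError("Unable to determine the device handle: Unknown Error")
        return f"fake GPU #{index}"

    healthy, failed = collect_devices(4, fake_get_info, RuntimeError)
    print(render(healthy, failed))
```

Passing error_type explicitly keeps unrelated bugs from being silently swallowed: only the library's own device errors are turned into red hints.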