Add AMD Support by bethune-bryant · Pull Request #173 · wookayin/gpustat

bethune-bryant · 2024-07-29T14:17:50Z

Fixes #137

Design

To do this I duplicate the pynvml interface already used by gpustat in a wrapper around rocmi and dynamically import the correct library based on what hardware is present.
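A minimal sketch of the dynamic-import idea described above, assuming the backend is chosen by trying each candidate module in order and keeping the first that imports. The function name `select_backend` and its parameters are hypothetical and are not taken from the PR. The PR itself may choose the backend by detecting the hardware rather than by import success.

```python
import importlib
from types import ModuleType
from typing import Callable, Sequence


def select_backend(
    candidates: Sequence[str],
    importer: Callable[[str], ModuleType] = importlib.import_module,
) -> ModuleType:
    """Return the first candidate module that imports successfully.

    Hypothetical helper: `candidates` is an ordered list of module names
    that all expose the same NVML-style interface.
    """
    errors = []
    for name in candidates:
        try:
            return importer(name)
        except ImportError as exc:
            # Remember why this backend was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError("no GPU backend available (" + "; ".join(errors) + ")")
```

With this shape, calling code could do something like `N = select_backend(["pynvml", "gpustat.rocml"])`, where `gpustat.rocml` is an assumed name for the wrapper module, and then use `N` without caring which vendor library sits behind it.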

Current Status

The base functionality is currently working.

Remaining Tasks

Basic Functionality
Testing
Documentation

bethune-bryant · 2024-08-08T14:39:06Z

@wookayin
Before I start working on documentation and testing, would you mind taking a look at this PR?
Do you agree with the overall design, or is there something you would like changed?
Do you have any concerns?

Stonesjtu

Can you add some mocking tests for ROCM devices?

bethune-bryant · 2024-09-03T20:23:37Z

> Can you add some mocking tests for ROCM devices?

I'm not super familiar with mockito, but I've started looking into this.

Stonesjtu

LGTM.

For the testing part, we can mock a ROCML-based NVML library call, such as NVMLGetFanSpeed, to return constant values.
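A rough sketch of that kind of test, using the standard library's `unittest.mock` rather than mockito. It assumes the wrapper exposes the pynvml-style call `nvmlDeviceGetFanSpeed` (the real pynvml name for the fan-speed query). The helpers `read_fan_speed` and `make_fake_backend` are hypothetical and exist only for illustration.

```python
from unittest import mock


def read_fan_speed(backend, handle):
    """Hypothetical helper: query fan speed through an NVML-style backend."""
    return backend.nvmlDeviceGetFanSpeed(handle)


def make_fake_backend(fan_speed=42):
    """Build a stand-in backend whose fan-speed call returns a constant."""
    backend = mock.Mock(name="fake_rocm_backend")
    backend.nvmlDeviceGetFanSpeed.return_value = fan_speed
    return backend


# No AMD hardware is needed: the mocked call always returns the constant.
backend = make_fake_backend(fan_speed=42)
speed = read_fan_speed(backend, handle="gpu0")
backend.nvmlDeviceGetFanSpeed.assert_called_once_with("gpu0")
```

The same pattern extends to other queries (temperature, memory, power) by setting a `return_value` on each mocked call.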

Stonesjtu · 2024-10-14T12:15:34Z

                gpu_stat = InvalidGPU(index, "((Unknown Error))", e)
            except N.NVMLError_GpuIsLost as e:
                gpu_stat = InvalidGPU(index, "((GPU is lost))", e)
+            except Exception as e:


Should we raise N.NVMLError_Unknown here for consistency?

P.S. We could catch NVMLError instead of the base Exception; catching Exception may silently swallow native Python errors.
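To illustrate the point about narrowing the `except` clause, here is a self-contained sketch. The classes below are stand-ins for the backend's error types, and `describe_gpu` is a hypothetical helper, not code from the PR.

```python
class NVMLError(Exception):
    """Stand-in for the backend's base error class."""


class NVMLError_GpuIsLost(NVMLError):
    """Stand-in for the 'GPU is lost' error."""


def describe_gpu(query):
    """Run a query and translate backend errors into a status string."""
    try:
        return query()
    except NVMLError_GpuIsLost:
        return "((GPU is lost))"
    except NVMLError:
        # Only backend errors land here; a TypeError or NameError caused
        # by a bug in our own code still propagates to the caller.
        return "((Unknown Error))"


def lost():
    raise NVMLError_GpuIsLost()


def unknown():
    raise NVMLError("boom")


def bug():
    raise TypeError("programming mistake")
```

Here `describe_gpu(lost)` and `describe_gpu(unknown)` return status strings, while `describe_gpu(bug)` re-raises the `TypeError` instead of hiding it.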

Stonesjtu · 2024-10-14T12:17:02Z

+        super().__init__(self.message)
+
+
+class NVMLError_Unknown(Exception):


Should these NVMLError_xxx inherit NVMLError?
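A short sketch of why the inheritance matters, assuming a base class shaped like the one in the diff above (a `message` attribute passed to `super().__init__`). `StandaloneUnknown` and `caught_by_base` are hypothetical names used only for contrast.

```python
class NVMLError(Exception):
    """Base class for every backend error (stand-in for the real one)."""

    def __init__(self, message="NVML error"):
        self.message = message
        super().__init__(self.message)


class NVMLError_Unknown(NVMLError):
    """Inherits NVMLError, so `except NVMLError` catches it."""


class StandaloneUnknown(Exception):
    """Hypothetical counter-example: derives from Exception directly."""


def caught_by_base(exc):
    """Report whether an `except NVMLError` clause would handle `exc`."""
    try:
        raise exc
    except NVMLError:
        return True
    except Exception:
        return False
```

With the subclass relationship in place, callers that already write `except NVMLError` keep working for every specific error type.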

Stonesjtu · 2024-10-14T12:18:01Z

+except (ImportError, SyntaxError, RuntimeError) as e:
+    _rocmi = sys.modules.get("rocmi", None)
+
+    raise ImportError(


Should we make this a dedicated NVMLError subclass?
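One possible shape for that suggestion, as a sketch only: the class name `NVMLError_LibraryNotFound` mirrors pynvml's naming, but it is defined locally here as an assumption, and `load_backend` is a hypothetical helper.

```python
import importlib


class NVMLError(Exception):
    """Stand-in for the wrapper's base error class."""


class NVMLError_LibraryNotFound(NVMLError):
    """Assumed dedicated error for a backend that cannot be loaded."""


def load_backend(module_name):
    """Import a backend module, converting load failures to NVMLError."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, SyntaxError, RuntimeError) as exc:
        # Chain the original exception so the root cause stays visible.
        raise NVMLError_LibraryNotFound(
            f"could not load backend {module_name!r}: {exc}"
        ) from exc
```

Callers can then handle a missing AMD library the same way they handle any other backend error, while the original import failure remains available as `__cause__`.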

metalcycling · 2025-06-26T19:28:08Z

Will this be merged at some point?

remi-or · 2025-09-25T14:12:17Z

Hey everyone! I would be very happy to use the same package for AMD and NVIDIA. What are the blockers that need to be addressed for this to get merged? Happy to make contributions.
cc @wookayin as it looks like you are the core contributor 🙂

wookayin · 2025-09-25T14:19:24Z

Hi, I am very sorry that I've been inactive in this PR; I don't have any machine with an AMD graphics card (neither local nor remote) where I can test it. But if some of you can help test the feature, I'd be so grateful and happy to have this merged sooner rather than later. I might also need to try setting up AWS G4ad instances soon.

remi-or · 2025-09-25T14:44:12Z

I have access to a node with AMD GPUs, so I would be happy to test it out. I see no testing script was added in this PR, though, so do you have any test in mind? Happy if this gets merged soon as well 😄

bethune-bryant added 17 commits July 29, 2024 14:14

Begin adding AMD support.

65ba474

Add pyrsmi depedency.

261faf7

Add simple hardware switch functionalty.

ca650ba

Move default exception to end

5b229f8

Typo

3c1a744

Default to nvidia.

8ba8134

Typo...

9f07c49

Hide output from rocml.

85d0dbf

add frequency.

2c9aadf

Switching to amdsmi

cc2d0f0

Fix index lookup.

c2ea30e

Remove frequency stuff for now.

3e0c2b1

Check for amdsmi.

173d144

Get driver version

800bd0d

Format new file.

bf1a00a

Typo.

6b731eb

Switch to rocmi.

1a09222

bethune-bryant changed the title ~~WIP - AMD Support~~ Add AMD Support Aug 8, 2024

bethune-bryant marked this pull request as ready for review August 8, 2024 14:37

wookayin self-assigned this Aug 8, 2024

wookayin added the new feature label Aug 8, 2024

bethune-bryant added 3 commits August 8, 2024 14:56

Cleanup unneeded code.

dfce699

Add driver version.

f1abc19

Fix power divisor.

9a2e2af

Stonesjtu reviewed Aug 19, 2024

Comment thread setup.py

Comment thread gpustat/util.py

bethune-bryant requested a review from Stonesjtu October 8, 2024 20:02

Stonesjtu approved these changes Oct 14, 2024

6 participants
