
Change default gpu metric backend #2501

Merged · 10 commits

Conversation

@FindHao (Member) commented Oct 10, 2024

The current GPU memory metric backends are dcgm and nvml. Both report from hardware counters and should be accurate. This PR adds a native PyTorch way to collect GPU memory usage via torch.cuda.max_memory_allocated(). The benefits are lower overhead and accurate numbers on a shared GPU server when there are multiple GPU processes from other users, because we don't implement a process filter for the other two backends.

Use --metrics-gpu-backend torch to set the backend.
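The torch backend described above relies on PyTorch's own allocator statistics. A minimal sketch of that measurement (the helper name is mine, not from the PR):

```python
import torch

def peak_gpu_memory_bytes(fn) -> int:
    """Run fn and return the peak GPU memory allocated by THIS process,
    in bytes, as tracked by PyTorch's caching allocator. The counter is
    process-local, so other users' processes on a shared GPU do not
    inflate the number (unlike device-wide dcgm/nvml readings)."""
    torch.cuda.reset_peak_memory_stats()  # clear the previous peak
    fn()
    torch.cuda.synchronize()  # make sure the work actually finished
    return torch.cuda.max_memory_allocated()
```

For example, `peak_gpu_memory_bytes(lambda: torch.ones(1024, 1024, device="cuda"))` would report roughly 4 MiB for the fp32 tensor, plus allocator rounding.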

@FindHao (Member, Author) commented Oct 10, 2024

Sorry, mixed with some other commits. Will update it later.

@FindHao (Member, Author) commented Oct 10, 2024

> Sorry, mixed with some other commits. Will update it later.

Resolved.

@xuzhao9 (Contributor) commented Oct 10, 2024

How about we make the torch backend the default, to be consistent with the PT2 benchmark runner?

@FindHao (Member, Author) commented Oct 10, 2024

> How about we make the torch backend the default, to be consistent with the PT2 benchmark runner?

done in 4398557
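The change agreed on here (torch as the default backend) can be sketched with argparse. The flag name and choices follow the discussion; the surrounding parser is illustrative, not the benchmark runner's actual code:

```python
import argparse

# Illustrative parser: only the --metrics-gpu-backend flag is from the PR.
parser = argparse.ArgumentParser(description="benchmark runner (sketch)")
parser.add_argument(
    "--metrics-gpu-backend",
    choices=["torch", "dcgm", "nvml"],
    default="torch",  # torch-native collection becomes the default
    help="Backend used to collect GPU memory metrics.",
)

args = parser.parse_args([])  # no CLI args -> defaults apply
print(args.metrics_gpu_backend)  # prints "torch"
```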

@FindHao FindHao changed the title Add new metric backend torch Change default gpu metric backend Oct 10, 2024
parser.add_argument(
    "--metrics-gpu-backend",
    choices=["default", "nvml"],
    default="default",

Contributor: Let's change the default mode name to torch for readability.

Member Author: renamed in 8b48eea

run.py Outdated
@@ -477,18 +477,17 @@ def main() -> None:
     )
     parser.add_argument(
         "--metrics-gpu-backend",
-        choices=["dcgm", "default"],
+        choices=["dcgm", "default", "nvml"],
Contributor: Same here, let's change the default backend name to torch.

Member Author: renamed in 8b48eea
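A hypothetical dispatch over the backend flag might look like the sketch below. The function name is mine and the bodies only illustrate the trade-off discussed above (process-local torch counter vs. device-wide NVML reading); this is not the PR's actual implementation:

```python
def collect_gpu_memory_mib(backend: str) -> float:
    """Return a GPU memory reading in MiB for the chosen backend.
    Illustrative only: 'torch' reports this process's peak allocation,
    'nvml' reports device-wide current usage (includes other users'
    processes on a shared GPU, since there is no process filter)."""
    if backend == "torch":
        import torch
        return torch.cuda.max_memory_allocated() / 2**20
    if backend == "nvml":
        import pynvml  # NVIDIA Management Library bindings
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**20
    raise ValueError(f"unsupported --metrics-gpu-backend: {backend}")
```

The lazy imports keep each backend optional: a machine without pynvml can still use the torch path.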

@facebook-github-bot (Contributor)

@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@facebook-github-bot (Contributor)

@FindHao merged this pull request in c396191.

3 participants