
Conversation

@Leonard226
Contributor

Abstract

This pull request updates the base_image fields in multiple _cm.yaml configuration files within the MLPerf inference pipeline to ensure compatibility with the aarch64 Grace Hopper architecture. The existing configurations are hardcoded to use x86_64-compatible MLPerf containers, which are not optimized for aarch64 systems.

Technical Details

  • Scope of Changes:

    • Modified all _cm.yaml files where the base_image is set to an x86_64 MLPerf inference container.
    • Targeted directories include app-mlperf-inference and other relevant script directories within the repository.
  • Automation via Patch:

    • Provided a .patch file to automate the replacement process.
    • The patch identifies and replaces base_image entries pointing to MLPerf-x86_64 images with the aarch64 Grace Hopper-compatible image.
    • Ensures that only necessary files and variables are modified, preserving other configurations and settings.
  • Updated base_image Value:

    • Replaced with the following aarch64-compatible container image (see the illustrative sketch after this list):
nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.1-cuda12.4-pytorch24.04-ubuntu22.04-aarch64-GraceHopper-release
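
For illustration, here is a minimal sketch of what the change looks like inside an affected _cm.yaml. The file location and surrounding keys are assumptions made for this sketch; only the new image tag is taken from this pull request, and the previous x86_64 tag is left as a placeholder rather than reproduced:

# Hypothetical excerpt of an affected _cm.yaml (e.g. under an app-mlperf-inference script directory)
docker:
  # before (x86_64-only image; placeholder, exact previous tag not reproduced here):
  # base_image: nvcr.io/nvidia/mlperf/mlperf-inference:<x86_64-release-tag>
  # after (aarch64 Grace Hopper build introduced by this patch):
  base_image: nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.1-cuda12.4-pytorch24.04-ubuntu22.04-aarch64-GraceHopper-release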

Validation

After applying the patch, the following command was run and successfully pulled the Grace Hopper-compatible container:

cm run script --tags=run-mlperf,inference,_find-performance,_r4.1-dev,_short,_scc24-base \
   --model=sdxl \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --docker --quiet

This patch enables MLPerf inference to run smoothly on aarch64 systems and eliminates the need for users to manually update variables across scripts, offering a streamlined, automated, and tested solution for Grace Hopper compatibility.


Additionally, we corrected a minor bug in get-tensorrt/customize.py to improve regex matching, allowing it to recognize version tags whose components contain more than one digit.
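
As a minimal sketch of the kind of fix described above, assuming the detection code extracts the version from a string such as "TensorRT-10.2.0.19" (the exact pattern and variable names in get-tensorrt/customize.py may differ):

import re

# Illustrative old pattern: each version component may only be a single digit,
# so a tag like "TensorRT-10.2.0.19" is not matched (an assumption, not the literal original pattern).
old_pattern = re.compile(r"TensorRT-(\d\.\d\.\d\.\d)")

# Fixed pattern: one or more digits per component, so multi-digit versions match as well.
new_pattern = re.compile(r"TensorRT-(\d+\.\d+\.\d+\.\d+)")

print(old_pattern.search("TensorRT-10.2.0.19"))            # None
print(new_pattern.search("TensorRT-10.2.0.19").group(1))   # 10.2.0.19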

Leonard226 requested a review from a team as a code owner November 4, 2024 20:23
@github-actions

github-actions bot commented Nov 4, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@arjunsuresh
Contributor

Hi @Leonard226, we have added aarch64 docker image support here.

@arjunsuresh
Contributor

Hi @Leonard226, is it okay if I delete the patch file on your branch? You'll still get the credit in SCC24 for the patch, though.

@Leonard226
Contributor Author

Hi @Leonard226, is it okay if I delete the patch file on your branch? You'll still get the credit in SCC24 for the patch, though.

Yes of course, thanks!

arjunsuresh merged commit 6521889 into mlcommons:main Nov 19, 2024
github-actions bot locked and limited conversation to collaborators Nov 19, 2024
