
Conversation

@Leonard226
Contributor

Abstract

This pull request updates the base_image fields in multiple _cm.yaml configuration files within the MLPerf inference pipeline to ensure compatibility with the aarch64 Grace Hopper architecture. The existing configurations are hardcoded to use x86_64-compatible MLPerf containers, which are not optimized for aarch64 systems.

Technical Details

  • Scope of Changes:

    • Modified all _cm.yaml files where the base_image is set to an x86_64 MLPerf inference container.
    • Targeted directories include app-mlperf-inference and other relevant script directories within the repository.
  • Automation via Patch:

    • Provided a .patch file to automate the replacement process.
    • The patch identifies and replaces base_image entries pointing to MLPerf-x86_64 images with the aarch64 Grace Hopper-compatible image.
    • Ensures that only necessary files and variables are modified, preserving other configurations and settings.
  • Updated base_image Value:

    • Replaced with the following aarch64-compatible container image (see the illustrative sketch after this list):
nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.1-cuda12.4-pytorch24.04-ubuntu22.04-aarch64-GraceHopper-release
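
For illustration, here is a minimal sketch of what the change looks like inside an affected _cm.yaml. The file location and surrounding keys are assumptions made for this sketch; only the new image tag is taken from this pull request, and the previous x86_64 tag is left as a placeholder rather than reproduced:

# Hypothetical excerpt of an affected _cm.yaml (e.g. under an app-mlperf-inference script directory)
docker:
  # before (x86_64-only image; placeholder, exact previous tag not reproduced here):
  # base_image: nvcr.io/nvidia/mlperf/mlperf-inference:<x86_64-release-tag>
  # after (aarch64 Grace Hopper build introduced by this patch):
  base_image: nvcr.io/nvidia/mlperf/mlperf-inference:mlpinf-v4.1-cuda12.4-pytorch24.04-ubuntu22.04-aarch64-GraceHopper-release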

Validation

After applying the patch, the following command was run and successfully pulled the Grace Hopper-compatible container:

cm run script --tags=run-mlperf,inference,_find-performance,_r4.1-dev,_short,_scc24-base \
   --model=sdxl \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --docker --quiet

This patch enables MLPerf inference to run smoothly on aarch64 systems and eliminates the need for users to manually update variables across scripts, offering a streamlined, automated, and tested solution for Grace Hopper compatibility.


Additionally, we corrected a minor bug in get-tensorrt/customize.py to improve regex matching, allowing it to recognize version tags whose components contain more than one digit.
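
As a minimal sketch of the kind of fix described above, assuming the detection code extracts the version from a string such as "TensorRT-10.2.0.19" (the exact pattern and variable names in get-tensorrt/customize.py may differ):

import re

# Illustrative old pattern: each version component may only be a single digit,
# so a tag like "TensorRT-10.2.0.19" is not matched (an assumption, not the literal original pattern).
old_pattern = re.compile(r"TensorRT-(\d\.\d\.\d\.\d)")

# Fixed pattern: one or more digits per component, so multi-digit versions match as well.
new_pattern = re.compile(r"TensorRT-(\d+\.\d+\.\d+\.\d+)")

print(old_pattern.search("TensorRT-10.2.0.19"))            # None
print(new_pattern.search("TensorRT-10.2.0.19").group(1))   # 10.2.0.19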

Leonard226 requested a review from a team as a code owner November 4, 2024 20:23
@github-actions

github-actions bot commented Nov 4, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@arjunsuresh
Contributor

Hi @Leonard226, we have added aarch64 docker image support here.

@arjunsuresh
Contributor

Hi @Leonard226, is it okay if I delete the patch file on your branch? You'll still get the credit in SCC24 for the patch, though.

@Leonard226
Contributor Author

Hi @Leonard226, is it okay if I delete the patch file on your branch? You'll still get the credit in SCC24 for the patch, though.

Yes of course, thanks!

arjunsuresh merged commit 6521889 into mlcommons:main Nov 19, 2024
github-actions bot locked and limited conversation to collaborators Nov 19, 2024
