Update to cuda12.1, nccl 2.17.1, hpcx 2.14, and mlc 3.10 #513

abuccts · 2023-04-12T03:10:11Z

Update the cuda11.8 image to cuda12.1, based on nvcr23.03, and bump the related versions in the image:

* cuda: 11.8 -> 12.1
* nccl: 2.15.5 -> 2.17.1
* hpcx: 2.8 -> 2.14
* mlc: 3.9a -> 3.10

codecov · 2023-04-12T04:05:28Z

Codecov Report

Merging #513 (c491de6) into release/0.8 (5a2addd) will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff              @@
##           release/0.8     #513   +/-   ##
============================================
  Coverage        87.24%   87.24%           
============================================
  Files               89       89           
  Lines             5964     5964           
============================================
  Hits              5203     5203           
  Misses             761      761

Flag	Coverage Δ
cpu-python3.6-unit-test	`73.47% <ø> (ø)`
cpu-python3.7-unit-test	`73.47% <ø> (ø)`
cpu-python3.8-unit-test	`73.95% <ø> (ø)`
cuda-unit-test	`87.17% <ø> (ø)`

Flags with carried forward coverage won't be shown.

**Description**
Cherry-pick bug fixes from v0.8.0 to main.

**Major Revisions**
* Monitor - Fix the cgroup version checking logic (#502)
* Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM (#503)
* Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark (#505)
* Analyzer: Fix bug in python3.8 due to pandas api change (#504)
* Bug - Fix bug to get metric from cmd when error happens (#506)
* Monitor - Collect realtime GPU power when benchmarking (#507)
* Add num_workers argument in model benchmark (#511)
* Remove unreachable condition when write host list (#512)
* Update cuda11.8 image to cuda12.1 based on nvcr23.03 (#513)
* Doc - Fix wrong unit of cpu-memory-bw-latency in doc (#515)
* Docs - Upgrade version and release note (#508)

Co-authored-by: guoshzhao <guzhao@microsoft.com>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
Co-authored-by: Yuting Jiang <yutingjiang@microsoft.com>

Update cuda11.8 image to cuda12.1 based on nvcr23.03

3445584

Update cuda11.8 image to cuda12.1 based on nvcr23.03 and related versions in the image:

* cuda 11.8 -> 12.1
* nccl 2.15.5 -> 2.17.1
* ofed: 5.2 -> 5.8
* hpcx: 2.8 -> 2.14
* mlc: 3.9a -> 3.10

abuccts added the containers SuperBench Containers label Apr 12, 2023

abuccts requested a review from a team as a code owner April 12, 2023 03:10

cp5555 self-requested a review April 12, 2023 03:52

cp5555 approved these changes Apr 12, 2023

abuccts added 4 commits April 12, 2023 12:44

Revert ofed version

b648bcf

Merge branch 'release/0.8' into xiongyf/upgrade-versions

7fdfe94

Update

fff27f8

Merge branch 'xiongyf/upgrade-versions' of https://github.com/microsoft/superbenchmark into xiongyf/upgrade-versions

c491de6

abuccts enabled auto-merge (squash) April 12, 2023 07:49

guoshzhao reviewed Apr 12, 2023

dockerfile/cuda12.1.dockerfile

guoshzhao approved these changes Apr 12, 2023

abuccts merged commit 17c01d8 into release/0.8 Apr 12, 2023

abuccts deleted the xiongyf/upgrade-versions branch April 12, 2023 08:01

cp5555 changed the title ~~Update cuda11.8 image to cuda12.1 based on nvcr23.03~~ Update to cuda12.1, nccl 2.17.1, hpcx 2.14, and mlc 3.10 Apr 12, 2023

cp5555 mentioned this pull request Apr 12, 2023

V0.8.0 Release Plan #500

Closed

23 tasks

abuccts mentioned this pull request Apr 14, 2023

Release - SuperBench v0.8.0 #517

Merged
