Opened on Mar 28, 2023
Release Manager
Endgame
- Code freeze: March 28th, 2023
- Bug Bash date: March 29th, 2023
- Release date: April 7th, 2023
Main Features
SuperBench Improvement
- Support SuperBench Executor running on Windows (Executor - Support SuperBench Executor running on Windows #475)
- Remove fixed rccl version in rocm5.1.x docker file (Dockerfile: Remove fixed rccl version in rocm5.1.x docker file #476)
- Upgrade networkx version to fix installation compatibility issue (CI/CD - Upgrade networkx version to fix installation compatibility issue #478)
- Pin setuptools version to v65.7.0 (Pin setuptools version to v65.7.0 #483)
- Limit ansible_runner version for Python3.6 (Limit ansible_runner version for Python3.6 #485)
- Support cgroup V2 when reading system metrics in Monitor (Monitor - Support cgroup V2 when read system metrics. #491, Monitor - Fix the cgroup version checking logic. #502)
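cgroup v2 replaces v1's per-controller hierarchies with a single unified mount, so a monitor has to pick the right metric file per version. A minimal sketch of that check, assuming the standard `/sys/fs/cgroup` layout (the actual logic in #491/#502 may differ; function names here are illustrative):

```python
import os


def detect_cgroup_version(cgroup_root='/sys/fs/cgroup'):
    """Return 2 if the unified cgroup v2 hierarchy is mounted, else 1.

    cgroup v2 exposes a single `cgroup.controllers` file at the mount
    root; cgroup v1 mounts one subtree per controller instead.
    """
    if os.path.isfile(os.path.join(cgroup_root, 'cgroup.controllers')):
        return 2
    return 1


def read_memory_usage(cgroup_root='/sys/fs/cgroup'):
    """Read current memory usage in bytes from the version-specific file."""
    relpath = ('memory.current' if detect_cgroup_version(cgroup_root) == 2
               else os.path.join('memory', 'memory.usage_in_bytes'))
    with open(os.path.join(cgroup_root, relpath)) as f:
        return int(f.read().strip())
```

The same dispatch pattern extends to CPU and I/O statistics, which also moved to differently named files between the two cgroup versions.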
- Fix analyzer bug in Python 3.8 due to pandas API change (Analyzer: Fix bug in python3.8 due to pandas api change #504)
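The PR title does not spell out which pandas API changed. One widely known change from that era, shown purely as an illustration, is the deprecation of `DataFrame.append` in favor of `pandas.concat`; this sketch uses the portable style:

```python
import pandas as pd

# Hypothetical analyzer rows; the real SuperBench analyzer schema differs.
rows = [{'metric': 'gpu_power', 'value': 250.0},
        {'metric': 'cpu_util', 'value': 37.5}]

# Old style (deprecated, removed in pandas 2.0):
#   df = df.append(row, ignore_index=True)
# Portable style that works across pandas versions:
df = pd.concat([pd.DataFrame([r]) for r in rows], ignore_index=True)
```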
- Collect real-time GPU power in Monitor (Monitor - Collect realtime GPU power when benchmarking. #507)
- Remove unreachable condition when writing host list (Remove unreachable condition when write host list #512)
- Update to cuda12.1, nccl 2.17.1, hpcx 2.14, and mlc 3.10 (Update to cuda12.1, nccl 2.17.1, hpcx 2.14, and mlc 3.10 #513)
- Fix wrong unit of cpu-memory-bw-latency in doc (Doc - Fix wrong unit of cpu-memory-bw-latency in doc #515)
Micro-benchmark Improvement
- Add STREAM benchmark for sustainable memory bandwidth and the corresponding computation rate (Adding Stream Benchmark #473)
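STREAM estimates sustainable memory bandwidth from simple vector kernels; its "triad" kernel computes `a[i] = b[i] + scalar * c[i]`. A NumPy sketch of the triad accounting, for illustration only (the benchmark SuperBench wraps is the native C STREAM code, not this):

```python
import time

import numpy as np


def stream_triad(n=10_000_000, scalar=3.0):
    """STREAM 'triad' kernel: a[i] = b[i] + scalar * c[i].

    Each pass touches three arrays of 8-byte doubles (two reads, one
    write), so bytes_moved = 3 * n * 8. Returns (bandwidth_gbps, a).
    """
    b = np.full(n, 2.0)
    c = np.full(n, 1.0)
    a = np.empty(n)
    start = time.perf_counter()
    np.multiply(c, scalar, out=a)  # a = scalar * c
    np.add(a, b, out=a)            # a = b + scalar * c
    elapsed = time.perf_counter() - start
    return 3 * n * 8 / 1e9 / elapsed, a
```

Real STREAM repeats the kernel many times and reports the best pass; a single timed pass, as here, is the minimal form of the idea.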
- Add HPL (High-Performance Linpack) benchmark (Adding HPL benchmark #482)
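HPL measures floating-point throughput by solving a dense linear system `Ax = b`, crediting the conventional flop count of 2/3·n³ + 2·n². A toy NumPy illustration of that accounting (not the real HPL, which runs a distributed blocked LU factorization):

```python
import time

import numpy as np


def linpack_gflops(n=1000, seed=0):
    """Toy Linpack-style run: solve dense A x = b, report GFLOP/s.

    Uses the standard HPL flop count 2/3*n^3 + 2*n^2 and returns
    (gflops, scaled_residual) so correctness can be checked too.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    start = time.perf_counter()
    x = np.linalg.solve(A, b)
    elapsed = time.perf_counter() - start
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    residual = (np.linalg.norm(A @ x - b)
                / (np.linalg.norm(A) * np.linalg.norm(x)))
    return flops / elapsed / 1e9, residual
```

Like HPL itself, the sketch reports a residual alongside the rate: a fast solve that returns a wrong answer does not count.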
- Support flexible warmup and non-random data initialization in cublas-benchmark (Benchmarks: Revision - Support flexible warmup and non-random data initialization in cublas-benchmark #479)
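"Flexible warmup" here means the number of untimed warmup iterations is configurable rather than fixed; warmup lets caches, lazy initialization, and clocks settle before measurement starts. A generic sketch of the pattern (parameter names `num_warmup`/`num_iters` are illustrative, not the cublas-benchmark flags):

```python
import time


def benchmark(fn, num_warmup=8, num_iters=50):
    """Time `fn` with warmup iterations excluded from the measurement.

    Runs `fn` num_warmup times untimed, then num_iters times timed,
    and returns the mean latency in seconds per timed iteration.
    """
    for _ in range(num_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(num_iters):
        fn()
    return (time.perf_counter() - start) / num_iters
```

Non-random data initialization is the companion change: filling input buffers with a fixed pattern instead of random values makes runs bit-for-bit reproducible across invocations.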
- Support error tolerance in micro-benchmark for CuDNN function (Benchmarks: Support error tolerance in micro-benchmark for CuDNN function #490, Benchmarks - Fix bug to get metric from cmd when error happens in cudnn #506)
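Error tolerance lets the benchmark accept small numerical deviations between the CuDNN result and a reference instead of demanding exact equality. A generic mixed absolute/relative check in the `allclose` style (the `rtol`/`atol` defaults here are illustrative, not the values #490 uses):

```python
def within_tolerance(actual, expected, rtol=1e-3, atol=1e-5):
    """Elementwise check: |actual - expected| <= atol + rtol * |expected|.

    Combines an absolute floor (atol) for values near zero with a
    relative bound (rtol) for large values, like numpy.allclose.
    """
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))
```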
- Add distributed inference benchmark (Benchmarks - Add distributed inference benchmark #493 and Revise Code - Fix wrong torch usage in communication wrapper for Distributed Inference Benchmark #505)
- Support tensor core precisions (e.g., FP8) and batch/shape range in cublaslt gemm (Benchmarks - Support tensor core precisions in cublaslt gemm #492, Benchmark - Support batch/shape range in cublaslt gemm #494, and Benchmark - Fix matrix size overflow issue in cuBLASLt GEMM #503)
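Supporting a batch/shape range means the benchmark sweeps a set of GEMM problem sizes rather than a single (m, n, k). A sketch of how such a sweep might be enumerated (function names and the doubling step are hypothetical, not the cublaslt-gemm implementation):

```python
from itertools import product


def shape_range(start, stop, step=lambda x: x * 2):
    """Yield a geometric sweep: start, start*2, ... up to stop inclusive."""
    value = start
    while value <= stop:
        yield value
        value = step(value)


def gemm_cases(m_range, n_range, k_range, batches=(1,)):
    """Cartesian product of (batch, m, n, k) GEMM problem sizes."""
    return [dict(batch=b, m=m, n=n, k=k)
            for b, m, n, k in product(batches, m_range, n_range, k_range)]
```

Each generated case would then be run at every requested precision (FP16, BF16, FP8, ...), so the total run count is shapes × batches × precisions.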
Model Benchmark Improvement
- Fix torch.dist init issue with multiple models (Benchmark - Fix torch.dist init issue with multiple models #495)
- Support TE FP8 in BERT/GPT2 models (Benchmarks - Support TE FP8 in BERT/GPT2 models #496, Benchmark - Update TE FP8 model conversion #499)
- Add num_workers configurable in model benchmark (Add num_workers argument in model benchmark #511)
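The new argument mirrors the `num_workers` knob of PyTorch's `DataLoader`, which controls how many worker processes feed batches to the model. A minimal argparse sketch of exposing such a flag (the default value and parser wiring here are illustrative, not SuperBench's actual code):

```python
import argparse


def build_parser():
    """Sketch of a model-benchmark CLI exposing --num_workers."""
    parser = argparse.ArgumentParser(description='model benchmark')
    parser.add_argument('--num_workers', type=int, default=8,
                        help='number of data-loader worker processes')
    return parser
```

Making this configurable matters because the optimal worker count depends on CPU core count and input-pipeline cost; a fixed value can starve or oversubscribe the data loader.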