Release Manager
Endgame
- Code freeze: Aug. 30, 2024
- Bug Bash date: Sept. 02, 2024
- Release date: Oct. 09, 2024
Main Features
SuperBench Improvement
- Add CUDA 12.4 dockerfile (Dockerfile - Add CUDA 12.4 dockerfile #619)
- Improve documentation (Docs - fix typos #628, Docs - Add BibTeX in README and repo #632)
- Update omegaconf version to 2.3.0 (Update omegaconf version to 2.3.0 #631)
- Fix MSCCL build error in the CUDA 12.4 docker build pipeline (CI/CD - Fix MSCCL build error in CUDA12.4 docker build pipeline #633)
- Update Docker exec command for persistent HPCX environment (Bug Fix - Update Docker Exec Command for Persistent HPCX Environment #635)
- Use types-setuptools to replace types-pkg_resources (Use types-setuptools as types-pkg_resources is Yanked #637)
- Fix failing test and pandas warning in data diagnosis (Bug Fix: Data Diagnosis - Fix bug of failure test and warning of pandas in data diagnosis #638)
- Limit protobuf version to 3.20.x (Limit protobuf version to be 3.20.x #645)
- Update HPC-X link in the CUDA 11.1 dockerfile to fix CI (Dockerfile - Update hpcx link in cuda11.1 dockerfile to fix CI #648)
- Upgrade NCCL version and install UCX to fix the CUDA 12.4 dockerfile (Dockerfile - upgrade nccl version and install ucx to fix bug in cuda 12.4 docker file #646)
- Add ROCm 6.2 dockerfile (Dockerfile - Add ROCm6.2 dockerfile #647)
- Use identical metric names in result-summary.md and micro-benchmarks.md (Docs - Fix metrics name in user tutorial #651)
- Support Azure H100 NDv5 and AMD MI300 configurations (Benchmarks - Add configurations for NDv5 and AMD MI300 #652)
- Remove the pytest (<=7.4.4) and protobuf version constraints (see the dependency-pin sketch after this list)
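
Several of the items above are dependency-pin changes. The fragment below is a minimal sketch of how their net effect could look in packaging metadata; the package layout, extras names, and bounds are illustrative assumptions, not the repository's actual setup.py.

```python
# Hypothetical setup.py fragment sketching the dependency changes listed above.
# Layout, extras names, and bounds are assumptions for illustration only.
from setuptools import setup

setup(
    name='superbench',
    install_requires=[
        'omegaconf==2.3.0',      # pinned per #631
        'protobuf',              # 3.20.x limit from #645, later lifted by the last item above
    ],
    extras_require={
        'dev': [
            'types-setuptools',  # replaces the yanked types-pkg_resources (#637)
            'pytest',            # upper bound (<=7.4.4) removed in this release
        ],
    },
)
```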
Micro-benchmark Improvement
- Add hipblasLt tuning to dist-inference cpp implementation (Benchmarks: Revise Code - Add hipblasLt tuning to dist-inference cpp implementation #616)
- Add support for NVIDIA L4/L40/L40s GPUs in gemm-flops (Benchmarks: Micro benchmarks - add support for NVIDIA L4/L40/L40s GPUs in gemm-flops #634); see the launch sketch after this list
- Upgrade mlc to v3.11 (Dockerfile - Upgrade mlc to v3.11 #620)
- Support cuDNN backend API in cudnn-function
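
For the gemm-flops item (#634), the sketch below launches the micro-benchmark through the BenchmarkRegistry pattern used by the example scripts in the repository; the platform value and the omission of extra parameters are assumptions, and the new L4/L40/L40s support is expected to come from GPU detection rather than an explicit flag.

```python
# Hedged sketch: launch gemm-flops via SuperBench's BenchmarkRegistry, following
# the pattern of the scripts under superbench/examples/benchmarks. Platform choice
# and the lack of extra parameters are assumptions for illustration.
from superbench.benchmarks import BenchmarkRegistry, Platform

context = BenchmarkRegistry.create_benchmark_context('gemm-flops', platform=Platform.CUDA)
benchmark = BenchmarkRegistry.launch_benchmark(context)
if benchmark:
    # On an L4/L40/L40s GPU (#634), the result should include FLOPS metrics for
    # the precisions that GPU supports.
    print(benchmark.name, benchmark.return_code, benchmark.result)
```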
Model Benchmark Improvement
- Support VGG, LSTM, and GPT-2 small in TensorRT Inference Backend
- Support VGG, LSTM, and GPT-2 small in ORT Inference Backend
- Support more TensorRT parameters (related to: TensorRT parameter passing can be enhanced #366); a hedged parameter-passing sketch follows this list
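
For the TensorRT items, the sketch below passes extra parameters to the tensorrt-inference benchmark through the same registry API; the `--pytorch_models` and `--precision` flag names and the `vgg16` model identifier are assumptions about the benchmark's argument parser, not details confirmed by this release note.

```python
# Hedged sketch: pass extra parameters to the tensorrt-inference benchmark via
# BenchmarkRegistry. The '--pytorch_models' and '--precision' flags and the
# 'vgg16' model name are assumptions for illustration.
from superbench.benchmarks import BenchmarkRegistry, Platform

context = BenchmarkRegistry.create_benchmark_context(
    'tensorrt-inference',
    platform=Platform.CUDA,
    parameters='--pytorch_models vgg16 --precision fp16',
)
benchmark = BenchmarkRegistry.launch_benchmark(context)
if benchmark:
    print(benchmark.name, benchmark.return_code)
```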