[Inference] FP8 gemm auto-tune (#9094)

* fp8 cutlass gemm tune
* git ignore third_party
* check csrc/readme.md
PaddlePaddle · Sep 11, 2024 · 3675ea2
1 parent 73a3db9
commit 3675ea2

Showing 23 changed files with 1,850 additions and 1,033 deletions.
diff --git a/.gitignore b/.gitignore
@@ -126,6 +126,6 @@ FETCH_HEAD
 ./ppdiffusers/ppdiffusers/version.py
 
 # third party
-csrc/gpu/cutlass_kernels/cutlass
+csrc/third_party/
 dataset/
 output/
diff --git a/csrc/README.md b/csrc/README.md
@@ -10,6 +10,12 @@ pip install -r requirements.txt
 
 ## Compile the CUDA operators
 
+Generate the FP8 CUTLASS operators (compilation takes a long time)
+```shell
+python generate_code_gemm_fused_kernels.py
+```
+
+Compile
 ```shell
 python setup_cuda.py install
 ```
@@ -20,9 +26,14 @@ python setup_cuda.py install
 2. Fetch the code:
     git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git
 
-3. Place the downloaded `cutlass` directory under `csrc/gpu/cutlass_kernels/cutlass`
+3. Place the downloaded `cutlass` directory under `csrc/third_party/cutlass`
 
 4. Recompile the CUDA operators
 ```shell
 python setup_cuda.py install
 ```
+
+### FP8 GEMM auto-tuning
+```shell
+sh tune_fp8_gemm.sh
+```
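
Taken together, the README changes above suggest a build order: place CUTLASS under `csrc/third_party/cutlass`, generate the FP8 kernels, compile, then tune. The following is a minimal dry-run sketch of that order, not part of this commit. The `run` and `fp8_build` helpers and the `DRY_RUN` variable are hypothetical, and the sketch assumes the README commands are run from the `csrc/` directory in the order the README lists them.

```shell
#!/bin/sh
# Dry-run sketch of the README build order. Nothing here is part of the PR.
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper; $1 is the repository root (assumed layout: $1/csrc).
fp8_build() {
    repo_root="$1"
    # CUTLASS must already be cloned into the location this commit introduces.
    if [ ! -d "$repo_root/csrc/third_party/cutlass" ]; then
        echo "missing csrc/third_party/cutlass" >&2
        return 1
    fi
    (
        # Assumption: the README commands are run from csrc/.
        cd "$repo_root/csrc"
        run python generate_code_gemm_fused_kernels.py  # generate FP8 kernels
        run python setup_cuda.py install                # compile the operators
        run sh tune_fp8_gemm.sh                         # auto-tune FP8 GEMM
    )
}
```

Calling `fp8_build /path/to/repo` prints the three commands without executing them. Setting `DRY_RUN=0` runs them instead, failing early if the CUTLASS checkout is not in the expected directory.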