[Inference] FP8 gemm auto-tune (#9094)
* fp8 cutlass gemm tune

* git ignore third_party

* check csrc/readme.md
ckl117 authored Sep 11, 2024
1 parent 73a3db9 commit 3675ea2
Showing 23 changed files with 1,850 additions and 1,033 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -126,6 +126,6 @@ FETCH_HEAD
./ppdiffusers/ppdiffusers/version.py

# third party
-csrc/gpu/cutlass_kernels/cutlass
+csrc/third_party/
dataset/
output/
13 changes: 12 additions & 1 deletion csrc/README.md
@@ -10,6 +10,12 @@ pip install -r requirements.txt

## Compile CUDA operators

Generate the FP8 cutlass operators (compilation takes a long time)
```shell
python generate_code_gemm_fused_kernels.py
```

Compile
```shell
python setup_cuda.py install
```
@@ -20,9 +26,14 @@ python setup_cuda.py install
2. Fetch the code:
git clone -b v3.5.0 --single-branch https://github.com/NVIDIA/cutlass.git

-3. Place the downloaded `cutlass` directory at `csrc/gpu/cutlass_kernels/cutlass`
+3. Place the downloaded `cutlass` directory at `csrc/third_party/cutlass`

4. Recompile the CUDA operators
```shell
python setup_cuda.py install
```

### FP8 GEMM auto-tuning
```shell
sh tune_fp8_gemm.sh
```
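
The README steps touched by this diff can be strung together as a rough sketch. It is not part of the commit. It assumes the script is run from the `csrc` directory, so the new CUTLASS location `csrc/third_party/cutlass` appears as `third_party/cutlass`, and it assumes the order fetch, generate, compile, tune. The `run` helper, the `DRY_RUN` variable, and the `build_steps` function are hypothetical names for illustration only. By default the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch only: lists the README's build steps in order.
# DRY_RUN, run, and build_steps are hypothetical names, not part of the repository.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: the working directory is csrc/, so this is csrc/third_party/cutlass.
CUTLASS_DIR="third_party/cutlass"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_steps() {
    # Fetch CUTLASS v3.5.0 into the new third_party location if it is absent.
    if [ ! -d "$CUTLASS_DIR" ]; then
        run git clone -b v3.5.0 --single-branch \
            https://github.com/NVIDIA/cutlass.git "$CUTLASS_DIR"
    fi
    # Generate the FP8 cutlass operators (the README notes this is slow).
    run python generate_code_gemm_fused_kernels.py
    # Compile the CUDA operators.
    run python setup_cuda.py install
    # Run the FP8 GEMM auto-tuning script.
    run sh tune_fp8_gemm.sh
}

build_steps
```

Setting `DRY_RUN=0` would execute the commands instead of printing them. Whether the generation step needs CUTLASS to be present beforehand is not stated in the README, so the ordering here is a guess.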