Commit

[Inference] FP8 dual gemm auto-tune and support compile parallelization (#9151)

* fp8

* check

* check

* check

* check

* cutlass fp8

* fp8 check

* check

* ffn1 tune

* delete

* check

* change file path

* top_p_sampling_reject.cu
ckl117 authored Sep 20, 2024
1 parent 25a5b4f commit c4f7acf
Showing 22 changed files with 1,000 additions and 1,010 deletions.
csrc/README.md (6 changes: 4 additions & 2 deletions)
@@ -10,9 +10,11 @@ pip install -r requirements.txt
 
 ## Compile CUDA operators
 
-Generate the FP8 cutlass operators (compilation takes a long time)
+Generate the FP8 cutlass operators
 ```shell
-python generate_code_gemm_fused_kernels.py
+python utils/auto_gen_fp8_fp8_gemm_fused_kernels.py
+
+python utils/auto_gen_fp8_fp8_dual_gemm_fused_kernels.py
 ```
 
 Compile

This file was deleted.

This file was deleted.
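The updated csrc/README.md replaces the single `generate_code_gemm_fused_kernels.py` step with two generator scripts under `utils/`. Below is a minimal sketch for running both in order from Python. It assumes the scripts are invoked from the `csrc/` directory, as the README's relative paths suggest. `GENERATORS`, `generator_commands` and `run_generators` are hypothetical helpers for illustration and are not part of the repository.

```python
import subprocess
import sys

# Generator scripts named in the updated csrc/README.md.
# Assumption: these paths are relative to the csrc/ directory.
GENERATORS = [
    "utils/auto_gen_fp8_fp8_gemm_fused_kernels.py",
    "utils/auto_gen_fp8_fp8_dual_gemm_fused_kernels.py",
]


def generator_commands():
    """Return one command per generator, run with the current Python interpreter."""
    return [[sys.executable, script] for script in GENERATORS]


def run_generators(csrc_dir="csrc"):
    """Run each generator in order with csrc_dir as the working directory.

    Hypothetical helper: check=True raises subprocess.CalledProcessError
    at the first script that exits non-zero, so the dual GEMM generator
    is not run if the plain GEMM generator fails.
    """
    for cmd in generator_commands():
        subprocess.run(cmd, cwd=csrc_dir, check=True)
```

For example, calling `run_generators("csrc")` from the repository root runs the FP8 GEMM generator first and the FP8 dual GEMM generator second; the README's compile step follows afterwards.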
