Commit d5bf9bc
Add BF16 support to custom PA (#133)
* tightened atol for custom PA; enable supported head size, block sizes in testing
* update num_blocks and num_iters in benchmark PA to realistic settings
* move to generic b16 type
* bf16 first port
* enabled all bf16 tests, set atol for bf16
* enable custom PA for bf16 as well as block size 32 and head size 64
* fix cast to zero in custom PA reduce
* py linter fixes
* clang format fixes
* div round up clang-format
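The bf16 bullets above set a separate `atol` for bf16 than for fp16. A minimal sketch of why a looser bound is needed, written in plain Python and independent of vLLM: bfloat16 keeps 7 explicit mantissa bits where float16 keeps 10, so its rounding error is about 8x larger. `to_bf16`, `to_fp16` and `rel_err` are hypothetical illustration helpers, not vLLM or PyTorch APIs. The tolerances actually chosen in the tests are not reproduced here.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Assumption for this sketch: x is first rounded to float32, then the low
    16 bits of the float32 encoding are dropped with rounding, since bfloat16
    is the top half of a float32 (sign, 8 exponent bits, 7 mantissa bits).
    NaN handling is omitted for brevity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1  # lowest kept bit, used to break ties toward even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE-754 half-precision value via struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def rel_err(x: float, rounder) -> float:
    """Relative error introduced by rounding x with the given rounder."""
    return abs(rounder(x) - x) / abs(x)


# Evenly spaced sample values in [1, 2).
samples = [1.0 + k / 997 for k in range(997)]
max_bf16 = max(rel_err(x, to_bf16) for x in samples)
max_fp16 = max(rel_err(x, to_fp16) for x in samples)

# bf16 error is bounded by 2**-8, fp16 error by 2**-11.
print(f"max relative error bf16: {max_bf16:.6f}  (bound {2**-8:.6f})")
print(f"max relative error fp16: {max_fp16:.6f}  (bound {2**-11:.6f})")
```

Because attention outputs are sums of many rounded products, the same bf16 kernel can legitimately differ from a reference by several bf16 ulps, which is why a single tolerance tuned for fp16 would be too strict for bf16.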
---------
Co-authored-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>

1 parent 636ff01
4 files changed (+271, -157 lines) in:
- benchmarks/kernels
- csrc/custom/paged_attention
- tests/kernels
- vllm/attention/ops
Diff excerpt (changed-line contents not captured): one hunk replaces line 12 (context lines 9-15), another replaces line 179 (context lines 176-182).