Commit 273c949
Faster Custom Paged Attention kernels (#372)
* integrate new cpa kernel, update tests and benchmark
* added comments to mfma4 kernel
* further comments for mfma16 kernel
* clang-format
* Lint
* add flag for logits rtz conversion and disable by default
* lint
* [Bugfix]: Fix paged attention unit tests of #372 (#389)
* [Bugfix]: fix paged attention tests based on the updated kernels in `csrc/attention/paged_attention_v1.cu`, `csrc/attention/paged_attention_v2.cu` and `csrc/rocm/attention.cu`.
* improve code documentation.
* lint
---------
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
---------
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Joe Shajrawi <17753158+shajrawi@users.noreply.github.com>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
1 parent 7a292f9 commit 273c949
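The commit message mentions a flag for "logits rtz conversion" but does not say how it is implemented. The sketch below assumes that "rtz" means round-toward-zero and that the logits are narrowed from float32 to a 16-bit format such as bfloat16. It is host-side C++ for illustration only, not the kernel's code, and every function name in it is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bit pattern as uint32_t without undefined behavior.
static inline uint32_t float_bits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return u;
}

// Round-toward-zero (RTZ): drop the low 16 bits of the float32 pattern.
// Truncating a sign-magnitude value never increases its magnitude.
static inline uint16_t float_to_bf16_rtz(float f) {
    return static_cast<uint16_t>(float_bits(f) >> 16);
}

// Round-to-nearest-even (RNE), shown for comparison.
// NaN handling is omitted to keep the sketch short.
static inline uint16_t float_to_bf16_rne(float f) {
    uint32_t u = float_bits(f);
    // Add 0x7FFF, plus 1 when the kept LSB is odd, so ties go to even.
    uint32_t rounding_bias = 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<uint16_t>((u + rounding_bias) >> 16);
}

// Widen a bfloat16 bit pattern back to float32 (exact).
static inline float bf16_to_float(uint16_t h) {
    uint32_t u = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// A value just below the next bfloat16 step: RTZ keeps the lower
// neighbor, RNE moves up to the nearer one.
static inline void demo_rounding_difference() {
    const uint32_t bits = 0x3F80FFFFu;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    assert(float_to_bf16_rtz(x) == 0x3F80);
    assert(float_to_bf16_rne(x) == 0x3F81);
}
```

Truncation costs only a shift, while round-to-nearest-even needs an extra add but has a smaller average error. That trade-off is one plausible reason for putting the conversion behind a flag that is disabled by default, though the commit itself does not state the motivation.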
File tree
3 files changed, +1016 -402 lines changed
- benchmarks/kernels
- csrc/rocm
- tests/kernels