Commit e15f7c9
perf: fix the performance issue of `append_paged_kv_cache` (#588)
The performance of `append_paged_kv_cache` is terrible for small batch
sizes. This is a known issue that has gone unfixed for a long time, and
this PR fixes it. This PR also adds support for non-contiguous append
keys/values (which could be sliced from a fused qkv matrix).
We first call a Triton kernel to convert `append_indptr` to
`batch_indices` and `positions` (which is similar to [CSR2COO
conversion](https://docs.nvidia.com/cuda/cusparse/#cusparse-t-csr2coo)
for sparse matrices). After the conversion, we can parallelize over
individual appended elements instead of over batch entries, so the
amount of parallel work no longer depends on the batch size.
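
The conversion described above can be sketched in plain Python. This is an illustrative reference only, not the Triton kernel from this PR: the function name `indptr_to_batch_indices_positions` is hypothetical, and `positions` here is assumed to be the offset of each token within its own request's appended segment (the real kernel may add an offset for tokens already in the cache).

```python
def indptr_to_batch_indices_positions(append_indptr):
    """CSR-style indptr -> COO-style (batch_indices, positions).

    Hypothetical helper for illustration. append_indptr[i] and
    append_indptr[i + 1] delimit the appended tokens of request i.
    """
    batch_indices = []
    positions = []
    for batch_idx in range(len(append_indptr) - 1):
        start = append_indptr[batch_idx]
        end = append_indptr[batch_idx + 1]
        for offset in range(end - start):
            # One entry per appended token: which request it belongs to,
            # and its offset inside that request's appended segment.
            batch_indices.append(batch_idx)
            positions.append(offset)
    return batch_indices, positions


# Three requests appending 3, 1 and 2 tokens respectively.
batch_indices, positions = indptr_to_batch_indices_positions([0, 3, 4, 6])
print(batch_indices)  # [0, 0, 0, 1, 2, 2]
print(positions)      # [0, 1, 2, 0, 0, 1]
```

Once every appended token carries its own `(batch_index, position)` pair, each token can be handled by an independent thread, so a batch such as `[4993, 1, 1, 1, 1, 1, 1, 1]` is no longer dominated by the single long request.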
It's also worth trying Triton for the second kernel,
`AppendPagedKVCacheKernel`; I think the performance should be
fine. I'll leave it for future work.
Some todo items:
1. Add torch.compile support.

After this PR (reference numbers can be found in #583):
```bash
model: l1b seqlens: [1, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.006ms all_layers: 0.094ms throughput: 5.563GB/s
model: l1b seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.014ms all_layers: 0.216ms throughput: 1514.280GB/s
model: l1b seqlens: [5000] single_layer: 0.014ms all_layers: 0.216ms throughput: 1517.017GB/s
model: l1b seqlens: [625, 625, 625, 625, 625, 625, 625, 625] single_layer: 0.014ms all_layers: 0.217ms throughput: 1510.863GB/s
---
model: l3b seqlens: [1, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.006ms all_layers: 0.165ms throughput: 11.123GB/s
model: l3b seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.021ms all_layers: 0.580ms throughput: 1975.732GB/s
model: l3b seqlens: [5000] single_layer: 0.021ms all_layers: 0.586ms throughput: 1958.078GB/s
model: l3b seqlens: [625, 625, 625, 625, 625, 625, 625, 625] single_layer: 0.021ms all_layers: 0.581ms throughput: 1973.174GB/s
---
model: l8b seqlens: [1, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.006ms all_layers: 0.185ms throughput: 11.321GB/s
model: l8b seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.021ms all_layers: 0.661ms throughput: 1982.815GB/s
model: l8b seqlens: [5000] single_layer: 0.021ms all_layers: 0.662ms throughput: 1980.227GB/s
model: l8b seqlens: [625, 625, 625, 625, 625, 625, 625, 625] single_layer: 0.021ms all_layers: 0.667ms throughput: 1964.861GB/s
---
model: l70b-tp8 seqlens: [1, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.006ms all_layers: 0.457ms throughput: 1.434GB/s
model: l70b-tp8 seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] single_layer: 0.009ms all_layers: 0.710ms throughput: 576.866GB/s
model: l70b-tp8 seqlens: [5000] single_layer: 0.009ms all_layers: 0.685ms throughput: 598.366GB/s
model: l70b-tp8 seqlens: [625, 625, 625, 625, 625, 625, 625, 625] single_layer: 0.009ms all_layers: 0.690ms throughput: 593.453GB/s
```
cc @abcdabcd987

1 parent 1328693, commit e15f7c9 (#588)
File tree
9 files changed, +285 -93 lines changed
- benchmarks
- include/flashinfer
- python
- csrc_aot
- csrc
- flashinfer
- src
- tests