Commit 08c0bf8

typo
1 parent 5ddcd7f commit 08c0bf8

1 file changed: +2 -2 lines changed

examples/attention_sink/README.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ We compare with an optimized version of the official Triton implementation at [h
 The only change from vanilla FlashAttention is that `sinks` should be taken into consideration in the softmax, which requires an extra rescaling at the epilogue stage.

 ### Backward
-Based on a detailed mathematical derivation, interestingly, the backward computation of `dQ`, `dK`, `dv` is almost identical to that in vanilla FlashAttention, except that the specific meaning of `lse` differs. We only need to compute `dsinks`, which is given by:
+Based on a detailed mathematical derivation, interestingly, the backward computation of `dQ`, `dK`, `dv` is almost identical to that in vanilla FlashAttention, except that the specific meaning of `lse` differs. We additionally need to compute `dsinks`, which is given by:

 $$
 dsink_h = -\sum_{b}\sum_{q} P_{b, h, q} \Delta_{b, h, q}
@@ -29,7 +29,7 @@ where $P_{b, h, q}$ is the proportion of $sink_h$ in the softmax in the $b$-th b
 - batch_size=1, heads=64, kv_heads=8 (the setting of GPT-OSS-120B)
 - Full attention is adopted.

-| SEQ_LEN | headdim | Triton TFLOPS | TileLang TFLOPS | Speedup |
+| SEQ_LEN | headdim | Triton TFLOPs | TileLang TFLOPs | Speedup |
 |---------|---------|---------------|----------------------|---------|
 | 2048 | 64 | 231.55 | **277.07** | 1.20x |
 | 2048 | 128 | 313.55 | **393.98** | 1.26x |
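
For reference, here is a minimal sketch of the sink-aware softmax the first hunk describes: the per-head `sink` only adds probability mass to the softmax denominator, so compared with vanilla FlashAttention only the final rescaling (and the stored `lse`) changes. This is a NumPy sketch under assumed shapes and names (`q`, `k`, `v`, `sink`), not the TileLang or Triton kernel API.

```python
# Reference (non-fused) attention with a per-head sink -- illustrative sketch only.
import numpy as np

def attention_with_sink(q, k, v, sink, scale):
    """q, k, v: [seq_len, head_dim] for one (batch, head); sink: scalar logit for this head."""
    s = (q @ k.T) * scale                      # attention scores, [S, S]
    m = np.maximum(s.max(axis=-1), sink)       # row max also covers the sink logit
    p = np.exp(s - m[:, None])                 # unnormalized probabilities
    l = p.sum(axis=-1) + np.exp(sink - m)      # denominator gains exp(sink - m)
    o = (p / l[:, None]) @ v                   # epilogue rescaling uses the sink-aware l
    lse = m + np.log(l)                        # lse now includes the sink contribution
    return o, lse
```

The accumulation `p @ v` is unchanged; only the denominator `l`, and therefore the epilogue rescaling and the meaning of `lse`, differ from vanilla FlashAttention.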
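
Similarly, a sketch of the `dsinks` formula from the first hunk, assuming `lse` is stored as `m + log(l)` with the sink term included (so $P_{b, h, q} = \exp(sink_h - lse_{b, h, q})$) and $\Delta_{b, h, q}$ is the usual FlashAttention-backward row sum of $dO \odot O$. Tensor names and the [B, H, S, D] layout are assumptions for illustration, not the kernel's interface.

```python
# Reference computation of dsink_h = -sum_b sum_q P_{b,h,q} * Delta_{b,h,q} -- illustrative sketch only.
import numpy as np

def dsinks_reference(o, do, lse, sinks):
    """o, do: [B, H, S, D]; lse: [B, H, S] (sink-aware); sinks: [H]."""
    delta = (o * do).sum(axis=-1)                # Delta_{b,h,q} = rowsum(dO * O), as in vanilla FA backward
    p_sink = np.exp(sinks[None, :, None] - lse)  # P_{b,h,q}: proportion of sink_h in query q's softmax
    return -(p_sink * delta).sum(axis=(0, 2))    # dsink_h, one scalar gradient per head, shape [H]
```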
