Optimize flash varlen paged KV cache addressing by yysheng26 · Pull Request #3564 · flagos-ai/FlagGems

yysheng26 · 2026-05-28T08:55:53Z

PR Category

Benchmark

Type of Change

Performance Optimization

Description

Optimize paged KV cache address calculation in flash_attn_varlen_func.

This PR adds a contiguous KV cache fastpath and lets k_page_stride participate in Triton kernel specialization. For contiguous paged KV cache, the kernel uses the simpler row-id based offset formula. For non-contiguous paged KV cache, specializing k_page_stride avoids the slow generic runtime-stride address path.
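The two addressing paths described above can be compared with a minimal pure-Python sketch. It is not the kernel source: the function names, the example sizes (block size 16, 8 KV heads, head dim 128) and the padded layout are assumptions for illustration. Only k_page_stride and the row-id idea come from the description.

```python
# Hypothetical names throughout; this mirrors the idea, not the kernel source.

def offset_generic(page_idx, row_in_page, k_page_stride, k_row_stride):
    """Page-stride path: works for any layout, needs the page stride as an input."""
    return page_idx * k_page_stride + row_in_page * k_row_stride


def offset_contiguous(page_idx, row_in_page, block_size, k_row_stride):
    """Row-id path: valid only when k_page_stride == block_size * k_row_stride."""
    row_id = page_idx * block_size + row_in_page
    return row_id * k_row_stride


block_size, k_row_stride = 16, 8 * 128  # 8 KV heads x head dim 128 (assumed)
contiguous_page_stride = block_size * k_row_stride

# Contiguous cache: both formulas agree for every (page, row) pair.
agree = all(
    offset_generic(p, r, contiguous_page_stride, k_row_stride)
    == offset_contiguous(p, r, block_size, k_row_stride)
    for p in range(4)
    for r in range(block_size)
)

# Padded cache (assumed): pages sit further apart, so the row-id formula is wrong.
padded_page_stride = 2 * contiguous_page_stride
mismatch = (
    offset_generic(1, 0, padded_page_stride, k_row_stride)
    != offset_contiguous(1, 0, block_size, k_row_stride)
)
print(agree, mismatch)
```

The sketch shows why the row-id formula is safe only as a fastpath: it drops the page stride entirely, so it is correct only when pages are packed back to back.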

A non-contiguous KV cache benchmark is also added to cover the case where k.stride(0) != block_size * k.stride(-3).
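As a rough illustration of when that condition holds, the sketch below computes element strides in plain Python and applies the check from the description. The fused K/V layout is an assumed example of how a non-contiguous K view can arise; it is not taken from the PR, and the helper names are hypothetical.

```python
def contiguous_strides(shape):
    """Row-major element strides for a dense tensor of the given shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def is_contiguous_kvcache(strides, block_size):
    """The condition from the description: stride(0) == block_size * stride(-3)."""
    return strides[0] == block_size * strides[-3]


num_blocks, block_size, num_heads, head_dim = 32, 16, 8, 128

# Dense K cache of shape [num_blocks, block_size, num_heads, head_dim].
dense = contiguous_strides((num_blocks, block_size, num_heads, head_dim))

# Assumed layout: K and V share one buffer of shape
# [num_blocks, 2, block_size, num_heads, head_dim]; K is the slice [:, 0].
# Slicing drops dimension 1 but keeps the remaining strides unchanged.
fused = contiguous_strides((num_blocks, 2, block_size, num_heads, head_dim))
k_view = (fused[0],) + fused[2:]

print(is_contiguous_kvcache(dense, block_size))
print(is_contiguous_kvcache(k_view, block_size))
```

In the fused case the page stride is twice block_size * stride(-3), which is the kind of input the non-contiguous benchmark is meant to exercise.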

Issue

Associated with PR #3410.

Progress

Change is properly reviewed (1 reviewer required, 2 recommended).
Change responds to an issue.
Change is fully covered by a UT.

Performance

Benchmark command:

pytest benchmark/test_flash_attn_varlen_func.py -v -s

Long paged KV case, fp16:

Contiguous: 0.797 ms
Non-contiguous before: ~1.62 ms
Non-contiguous after: ~0.80 ms

The non-contiguous long case drops from ~1.62 ms to ~0.80 ms, roughly a 2x speedup, and is now close to the contiguous fastpath performance.

0x45f · 2026-05-28T09:10:20Z

        ).to(tl.int64)
    else:
        page_block_index = tl.load(page_table_ptr + virtual_page_index).to(tl.int64)
+    if IS_CONTIGUOUS_KVCACHE:


After we remove k_page_stride from the do_not_specialize list, do we still need this if IS_CONTIGUOUS_KVCACHE branch?

Optimize flash varlen paged KV cache addressing

805c0a4

yysheng26 requested review from 0x45f, bin913, douxetpur, huangyiqun and w1120029931-bit as code owners May 28, 2026 08:55

github-actions Bot added benchmark ops/aten tests size/Small labels May 28, 2026

0x45f self-assigned this May 28, 2026

0x45f reviewed May 28, 2026

yysheng26 wants to merge 1 commit into flagos-ai:master from yysheng26:optimize/flash-varlen-k-page-stride
