
[CK_TILE] Tune fmha fwd splitkv codgen #110

Conversation


@poyenc poyenc commented Dec 17, 2024

  1. Add instances to enable vector load on hdim_q/hdim_v
  2. Use a larger tile size (kM0) for chunked prefill (group mode + paged KV cache)
  3. Update the num_splits heuristic (determine the number of workgroups based on the prefill/decode phase)

See the CK PR for more info.
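The updated heuristic follows the FlashAttention-style occupancy rule visible in the review fragments below: use a single split when batch × nhead × m-blocks already fills roughly 80% of the SMs, otherwise pick the smallest split count whose wave efficiency is close to the best achievable. A minimal self-contained sketch; the function name, the 0.85 acceptance threshold, and the exact loop bounds are assumptions, not the PR's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of a FlashAttention-style num_splits heuristic
// (hypothetical; the PR's real function is num_splits_heuristic_ck).
int num_splits_heuristic(int batch_nhead_mblocks, int num_SMs, int num_n_blocks, int max_splits)
{
    // Enough parallelism to fill ~80% of the SMs already: splitting the
    // KV sequence would only add split-reduction overhead.
    if(batch_nhead_mblocks >= 0.8f * num_SMs) { return 1; }

    max_splits = std::min({max_splits, num_SMs, num_n_blocks});

    std::vector<float> efficiency;
    efficiency.reserve(max_splits);
    float max_efficiency = 0.f;
    for(int num_splits = 1; num_splits <= max_splits; ++num_splits)
    {
        // Fraction of the last wave of workgroups that is actually full.
        float n_waves = float(batch_nhead_mblocks * num_splits) / num_SMs;
        float eff     = n_waves / std::ceil(n_waves);
        efficiency.push_back(eff);
        max_efficiency = std::max(max_efficiency, eff);
    }
    // Smallest num_splits within 85% of the best efficiency wins.
    for(int num_splits = 1; num_splits <= max_splits; ++num_splits)
    {
        if(efficiency[num_splits - 1] >= 0.85f * max_efficiency) { return num_splits; }
    }
    return 1;
}
```

The fragments below call the real function as `num_splits_heuristic_ck(batch * nhead * num_m_blocks, props.multiProcessorCount * 2, num_n_blocks, 128)`, i.e. with twice the SM count as the occupancy target.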

@poyenc poyenc requested a review from rocking5566 December 17, 2024 19:24
@poyenc poyenc self-assigned this Dec 17, 2024
@poyenc poyenc changed the title Tune fmha fwd splitkv codgen [CK_TILE] Tune fmha fwd splitkv codgen Dec 17, 2024
{
    int device;
    auto status = hipGetDevice(&device);
    if(status != hipSuccess)
    {
Collaborator:

Follow the FA coding style:
if(status != hipSuccess) { return num_splits; }


hipDeviceProp_t props{};
status = hipGetDeviceProperties(&props, device);
if(status != hipSuccess)
{
    return num_splits;
Collaborator:

if(status != hipSuccess) { return num_splits; }

// get kM0 for prefill phase
if(is_prefill)
{
    return 128;
Collaborator:

coding style

};

for(auto [hdim, m0] : hdim_to_m0)
{
Collaborator:

coding style

{
    if(hdim_q <= hdim && hdim_v <= hdim)
    {
        return m0;
Collaborator:

coding style
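The fragments above select the M-tile size: prefill always takes the larger 128-row tile (change 2 in the PR description), while decode looks kM0 up from a head-dimension table. A self-contained sketch with hypothetical table entries — the PR's actual `hdim_to_m0` values are not shown in this excerpt:

```cpp
#include <array>
#include <utility>

// Hypothetical hdim -> kM0 table; placeholder values for illustration only.
int get_kM0(int hdim_q, int hdim_v, bool is_prefill)
{
    // Prefill uses the larger 128-row tile unconditionally.
    if(is_prefill) { return 128; }

    constexpr std::array<std::pair<int, int>, 3> hdim_to_m0{{
        {64, 32}, {128, 64}, {256, 64}}};

    // First entry large enough for both hdim_q and hdim_v wins.
    for(auto [hdim, m0] : hdim_to_m0)
    {
        if(hdim_q <= hdim && hdim_v <= hdim) { return m0; }
    }
    return 64; // fallback for unexpected head dimensions
}
```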


if(num_splits < 1 && p_drop == 0.0f)
    return num_splits_heuristic_ck(
        batch * nhead * num_m_blocks, props.multiProcessorCount * 2, num_n_blocks, 128);
{
Collaborator:

coding style

if(batch_nhead_mblocks >= 0.8f * num_SMs) { return 1; }
max_splits = std::min({max_splits, num_SMs, num_n_blocks});
if(batch_nhead_mblocks >= 0.8f * num_SMs)
{
Collaborator:

coding style

std::array<float, num_splits_array.size()> efficiency;

for(size_t idx = 0; idx < num_splits_array.size() && num_splits_array[idx] <= max_splits; ++idx)
{
Collaborator:

coding style

float eff = n_blocks / std::ceil(n_blocks);

if(eff > max_efficiency)
{
Collaborator:

max_seqlen_q

// printf("num_splits chosen = %d\n", num_splits);
return num_splits;
for(size_t idx = 0; idx < num_splits_array.size() && num_splits_array[idx] <= max_splits; ++idx)
{
@rocking5566 rocking5566 (Collaborator) commented Dec 17, 2024

coding style

@poyenc poyenc marked this pull request as draft December 19, 2024 08:18
@poyenc poyenc force-pushed the ck_tile/vllm-layout-varlen-add-splitkv-instance branch from c44f7a7 to ddcc375 on December 23, 2024 15:32
@poyenc poyenc closed this Dec 29, 2024