
[CK_TILE] Tune fmha fwd splitkv codgen #110

Conversation


@poyenc poyenc commented Dec 17, 2024

  1. Add instances to enable vector load on hdim_q/hdim_v
  2. Use a larger tile size (kM0) for chunked prefill (group mode + paged KV cache)
  3. Update the num_splits heuristic (determine the number of workgroups based on the prefill/decode phase)

See the CK PR for more info.
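The updated heuristic follows the FlashAttention-style occupancy rule visible in the review fragments below: use a single split when batch × nhead × m-blocks already fills roughly 80% of the SMs, otherwise pick the smallest split count whose wave efficiency is close to the best achievable. A minimal self-contained sketch; the function name, the 0.85 acceptance threshold, and the exact loop bounds are assumptions, not the PR's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of a FlashAttention-style num_splits heuristic
// (hypothetical; the PR's real function is num_splits_heuristic_ck).
int num_splits_heuristic(int batch_nhead_mblocks, int num_SMs, int num_n_blocks, int max_splits)
{
    // Enough parallelism to fill ~80% of the SMs already: splitting the
    // KV sequence would only add split-reduction overhead.
    if(batch_nhead_mblocks >= 0.8f * num_SMs) { return 1; }

    max_splits = std::min({max_splits, num_SMs, num_n_blocks});

    std::vector<float> efficiency;
    efficiency.reserve(max_splits);
    float max_efficiency = 0.f;
    for(int num_splits = 1; num_splits <= max_splits; ++num_splits)
    {
        // Fraction of the last wave of workgroups that is actually full.
        float n_waves = float(batch_nhead_mblocks * num_splits) / num_SMs;
        float eff     = n_waves / std::ceil(n_waves);
        efficiency.push_back(eff);
        max_efficiency = std::max(max_efficiency, eff);
    }
    // Smallest num_splits within 85% of the best efficiency wins.
    for(int num_splits = 1; num_splits <= max_splits; ++num_splits)
    {
        if(efficiency[num_splits - 1] >= 0.85f * max_efficiency) { return num_splits; }
    }
    return 1;
}
```

The fragments below call the real function as `num_splits_heuristic_ck(batch * nhead * num_m_blocks, props.multiProcessorCount * 2, num_n_blocks, 128)`, i.e. with twice the SM count as the occupancy target.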

@poyenc poyenc requested a review from rocking5566 December 17, 2024 19:24
@poyenc poyenc self-assigned this Dec 17, 2024
@poyenc poyenc changed the title Tune fmha fwd splitkv codgen [CK_TILE] Tune fmha fwd splitkv codgen Dec 17, 2024
{
    int device;
    auto status = hipGetDevice(&device);
    if(status != hipSuccess)
    {
Collaborator:

Follow the FA coding style:
if(status != hipSuccess) { return num_splits; }


hipDeviceProp_t props{};
status = hipGetDeviceProperties(&props, device);
if(status != hipSuccess)
{
    return num_splits;
Collaborator:

if(status != hipSuccess) { return num_splits; }

// get kM0 for prefill phase
if(is_prefill)
{
    return 128;
Collaborator:

coding style

};

for(auto [hdim, m0] : hdim_to_m0)
{
Collaborator:

coding style

{
    if(hdim_q <= hdim && hdim_v <= hdim)
    {
        return m0;
Collaborator:

coding style
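The fragments above select the M-tile size: prefill always takes the larger 128-row tile (change 2 in the PR description), while decode looks kM0 up from a head-dimension table. A self-contained sketch with hypothetical table entries — the PR's actual `hdim_to_m0` values are not shown in this excerpt:

```cpp
#include <array>
#include <utility>

// Hypothetical hdim -> kM0 table; placeholder values for illustration only.
int get_kM0(int hdim_q, int hdim_v, bool is_prefill)
{
    // Prefill uses the larger 128-row tile unconditionally.
    if(is_prefill) { return 128; }

    constexpr std::array<std::pair<int, int>, 3> hdim_to_m0{{
        {64, 32}, {128, 64}, {256, 64}}};

    // First entry large enough for both hdim_q and hdim_v wins.
    for(auto [hdim, m0] : hdim_to_m0)
    {
        if(hdim_q <= hdim && hdim_v <= hdim) { return m0; }
    }
    return 64; // fallback for unexpected head dimensions
}
```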


if(num_splits < 1 && p_drop == 0.0f)
    return num_splits_heuristic_ck(
        batch * nhead * num_m_blocks, props.multiProcessorCount * 2, num_n_blocks, 128);
{
Collaborator:

coding style

if(batch_nhead_mblocks >= 0.8f * num_SMs) { return 1; }
max_splits = std::min({max_splits, num_SMs, num_n_blocks});
if(batch_nhead_mblocks >= 0.8f * num_SMs)
{
Collaborator:

coding style

std::array<float, num_splits_array.size()> efficiency;

for(size_t idx = 0; idx < num_splits_array.size() && num_splits_array[idx] <= max_splits; ++idx)
{
Collaborator:

coding style

float eff = n_blocks / std::ceil(n_blocks);

if(eff > max_efficiency)
{
Collaborator:

max_seqlen_q

// printf("num_splits chosen = %d\n", num_splits);
return num_splits;
for(size_t idx = 0; idx < num_splits_array.size() && num_splits_array[idx] <= max_splits; ++idx)
{
@rocking5566 rocking5566 (Collaborator) commented Dec 17, 2024

coding style

@poyenc poyenc marked this pull request as draft December 19, 2024 08:18
@poyenc poyenc force-pushed the ck_tile/vllm-layout-varlen-add-splitkv-instance branch from c44f7a7 to ddcc375 on December 23, 2024 15:32
@poyenc poyenc closed this Dec 29, 2024