Skip to content

[Misc]: a question about chunked-prefill in flash-attn backends #4863

Closed as not planned
@HarryWu99

Description

Anything you want to discuss about vllm.

if prefill_meta := attn_metadata.prefill_metadata:

I noticed that in flash-attn backends. forward_prefix and forward_decode seem to be executed serially. Does forward_decode wait for forward_prefix to finish before running? Can this take advantage of the performance provided by chunked-prefill? I mean the tokens of prefill are in the same batch as the tokens of decode.

if prefill_meta := attn_metadata.prefill_metadata:
    output[:num_prefill_tokens] = PagedAttention.forward_prefix(...)

if decode_meta := attn_metadata.decode_metadata:
    output[num_prefill_tokens:] = PagedAttention.forward_decode(...)

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions