[V1] Implement Cascade Attention #11635

Merged
merged 23 commits into from
Jan 1, 2025
minor
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
WoosukKwon committed Jan 1, 2025
commit 42efe0d1deb5e1b59130504ad55ca9531dbd4af4
3 changes: 2 additions & 1 deletion vllm/v1/core/kv_cache_manager.py
@@ -310,7 +310,8 @@ def get_num_common_prefix_blocks(
 request: Any request in the RUNNING state, used to identify the
 common prefix blocks.
 num_running_requests: The total number of requests in the RUNNING
-state.
+state. This can be different from the number of scheduled
+requests in the current step.

 Returns:
 int: The number of common prefix blocks.
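
To illustrate why the docstring distinguishes running requests from scheduled ones, here is a minimal sketch. It assumes each KV cache block carries a reference count of the requests holding it, and that a block belongs to the common prefix only when every RUNNING request references it. `Block`, `ref_cnt`, and `count_common_prefix_blocks` are hypothetical names chosen for this example and are not taken from the diff above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a KV cache block."""
    block_id: int
    ref_cnt: int  # assumed: number of requests currently holding this block


def count_common_prefix_blocks(request_blocks: List[Block],
                               num_running_requests: int) -> int:
    """Count the leading blocks shared by every running request.

    Assumption: a block is shared by all running requests exactly when
    its reference count equals the number of running requests.
    """
    num_common = 0
    for block in request_blocks:
        if block.ref_cnt != num_running_requests:
            # The common prefix is contiguous, so stop at the first miss.
            break
        num_common += 1
    return num_common


# Three requests are RUNNING; the first two blocks are held by all three.
blocks = [Block(0, 3), Block(1, 3), Block(2, 1)]
num_common_with_running = count_common_prefix_blocks(blocks, 3)
# If only two of the three requests are scheduled this step, comparing
# against the scheduled count would not match any block's reference count.
num_common_with_scheduled = count_common_prefix_blocks(blocks, 2)
```

Under these assumptions, passing the scheduled count instead of the running count changes the result whenever some running request is not scheduled in the current step, which is the case the added docstring sentence calls out.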