[Bug]: Async KVConnectors can break shared prefix check #23130

@njhill

🐛 Describe the bug

KVCacheManager.get_num_common_prefix_blocks is used to identify the longest common prefix of requests in the current batch (actually all RUNNING requests), which in turn is used to determine whether to use cascade attention.

The logic currently assumes that each ref count on a block corresponds to a running request. However, with async KV offloading, ref counts can still be held after a request has completed and is no longer in the running state.

This means the logic could incorrectly identify a common prefix that isn't actually shared by all batch constituents.
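To make the failure mode concrete, here is a minimal, self-contained sketch. It is a simplified model, not vLLM's actual code: `Block`, `num_common_prefix_blocks`, `true_common_prefix_blocks`, and `run_demo` are hypothetical names, and the check is assumed to work by comparing each block's ref count against the number of running requests.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Toy stand-in for a KV cache block (hypothetical, for illustration)."""
    block_id: int
    ref_cnt: int = 0


def num_common_prefix_blocks(blocks, num_running_requests):
    """Assumed ref-count heuristic: a leading block counts as shared by every
    running request if its ref count equals the number of running requests."""
    count = 0
    for block in blocks:
        if block.ref_cnt != num_running_requests:
            break
        count += 1
    return count


def true_common_prefix_blocks(running):
    """Ground truth: compare the block lists of the running requests directly."""
    count = 0
    for column in zip(*running.values()):
        if any(block is not column[0] for block in column):
            break
        count += 1
    return count


def run_demo():
    shared, a_own, d_own = Block(0), Block(1), Block(2)

    shared.ref_cnt += 1  # request A (running)
    shared.ref_cnt += 1  # request C (finished; ref still held for async offload)
    a_own.ref_cnt += 1   # request A (running)
    d_own.ref_cnt += 1   # request D (running, does not share A's prefix)

    # Only A and D are running; C has completed but its ref is still counted.
    running = {"A": [shared, a_own], "D": [d_own]}

    reported = num_common_prefix_blocks(running["A"], len(running))
    actual = true_common_prefix_blocks(running)
    return reported, actual


if __name__ == "__main__":
    reported, actual = run_demo()
    print(f"reported common prefix blocks: {reported}, actual: {actual}")
```

In this model the heuristic reports one common prefix block because the ref held on behalf of the finished request C makes the count match the number of running requests, even though running request D does not reference that block at all.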

With forthcoming changes to the KVConnector API, there may be other situations in which connectors hold their own references to blocks.

Labels: bug (Something isn't working)