
[RFC]: Prompt logprobs + APC compatibility #13414

Closed
@afeldman-nm


Motivation.

Porting logprobs support to V1 was key for feature completeness, and automatic prefix caching (APC) is an important performance optimization. #9880 adds sample and prompt logprobs support; however, prompt logprobs currently require the server to be started with --no-enable-prefix-caching. Otherwise, a request with prompt_logprobs=true fails with the message "Prefix caching with prompt logprobs not yet supported on VLLM V1."

The challenge in combining prompt logprobs with APC is recovering the top-k prompt logprobs after an APC cache hit. The existing APC implementation does not cache prompt logprobs: on a cache hit, the cached blocks are treated as already "computed", so the model never produces logits for those positions and no prompt logprobs are available for them.
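To make the gap concrete, here is a minimal pure-Python sketch, assuming a toy setting in which prefill yields one row of logits per prompt position that is actually run through the model, and the prompt logprob for token i comes from the row at position i - 1 (so the first prompt token has none). A `None` row stands for a position served from the prefix cache. `log_softmax` and `prompt_logprobs_from_logits` are hypothetical helpers for illustration, not vLLM code.

```python
import math
from typing import Dict, List, Optional


def log_softmax(row: List[float]) -> List[float]:
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def prompt_logprobs_from_logits(
    prompt: List[int],
    logits: List[Optional[List[float]]],
    k: int,
) -> List[Optional[Dict[int, float]]]:
    """Hypothetical helper: top-k prompt logprobs per prompt token.

    Entry i uses the logits row at position i - 1. A None row means that
    position was a prefix-cache hit and never ran through the model, so
    there is nothing to report for the following token.
    """
    out: List[Optional[Dict[int, float]]] = [None]  # first token has no prompt logprob
    for i in range(1, len(prompt)):
        row = logits[i - 1]
        if row is None:
            out.append(None)  # cache hit: no logits were computed here
            continue
        lp = log_softmax(row)
        top = sorted(range(len(lp)), key=lambda t: lp[t], reverse=True)[:k]
        entry = {t: lp[t] for t in top}
        entry[prompt[i]] = lp[prompt[i]]  # always include the actual prompt token
        out.append(entry)
    return out


# A 4-token prompt whose first 2 tokens hit the prefix cache.
cached_logits = [None, None, [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
result = prompt_logprobs_from_logits([0, 1, 2, 3], cached_logits, k=2)
```

With a two-token cache hit on a four-token prompt, only the last prompt token gets logprobs; the entries for tokens 1 and 2 are lost because their logits rows were never computed.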

Proposed Choices for Implementation

  1. Use the APC-cached KVs to recompute prompt logprobs when a request with prompt_logprobs=true triggers an APC cache hit. This requires the model code and model_executor code to support re-running prefill over the cached positions using the cached KVs.
  2. Cache prompt logprobs in the APC. The problem with this option is that a request that hits the cache may ask for more top-k prompt logprobs than the request that filled the cache, in which case recomputation is necessary anyway.
  3. Bypass APC for requests with prompt_logprobs=true. Such requests cannot benefit from the prefix cache. This is the simplest solution but incurs a performance penalty.
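Options 2 and 3 can be compared by how many leading prompt tokens the scheduler may treat as already computed. The following is a minimal sketch, assuming a toy cache that keys each full block by the whole prefix ending at it and records how many top-k prompt logprobs were stored with that block (None if the filling request did not ask for any). `BLOCK_SIZE`, `block_keys`, `cached_prefix_len` and the `"bypass"` / `"cache_logprobs"` policy names are hypothetical, not vLLM's scheduler or KV cache manager API. Option 1 is not modeled, since its change lives in model execution rather than in how much of the prefix is reused.

```python
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # hypothetical block size, in tokens

# Toy prefix cache: block key -> number of top-k prompt logprobs stored with
# that block (None if the filling request did not request prompt logprobs).
Cache = Dict[Tuple[int, ...], Optional[int]]


def block_keys(prompt: List[int]) -> List[Tuple[int, ...]]:
    # Key each full block by the entire prefix ending at it, so a block can
    # only hit when every earlier block matches too (like chained hashes).
    return [tuple(prompt[:end]) for end in range(BLOCK_SIZE, len(prompt) + 1, BLOCK_SIZE)]


def cached_prefix_len(prompt: List[int], cache: Cache, k: Optional[int], policy: str) -> int:
    """Leading prompt tokens that may be skipped; k is the requested number
    of prompt logprobs, or None if the request did not ask for any."""
    if policy not in ("bypass", "cache_logprobs"):
        raise ValueError(f"unknown policy: {policy}")
    if k is not None and policy == "bypass":
        return 0  # option 3: never reuse the prefix when prompt logprobs are requested
    n = 0
    for key in block_keys(prompt):
        if key not in cache:
            break
        stored_k = cache[key]
        if k is not None and (stored_k is None or stored_k < k):
            break  # option 2: too few stored logprobs, so recompute from here on
        n += BLOCK_SIZE
    return n


prompt = list(range(10))
cache: Cache = {tuple(range(4)): 5, tuple(range(8)): 2}
```

In this toy model, option 3 throws away the whole hit for any request with prompt logprobs, while option 2 keeps only the leading blocks whose stored top-k is large enough; the first block with too few stored logprobs forces recomputation of everything after it, which is the limitation noted in option 2.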

Feedback Period.

One week from 2/17

CC List.

@robertgshaw2-redhat @WoosukKwon @njhill

Any Other Things.

No response
