Commit
[Bugfix] Include encoder prompt length in non-stream API usage response (vllm-project#8861)

Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
Pernekhan authored and sumitd2 committed Nov 14, 2024
1 parent 8abaf98 commit 79ad521
Showing 1 changed file with 2 additions and 0 deletions: vllm/entrypoints/openai/serving_chat.py
@@ -726,6 +726,8 @@ async def chat_completion_full_generator(
 
         assert final_res.prompt_token_ids is not None
         num_prompt_tokens = len(final_res.prompt_token_ids)
+        if final_res.encoder_prompt_token_ids is not None:
+            num_prompt_tokens += len(final_res.encoder_prompt_token_ids)
         num_generated_tokens = sum(
             len(output.token_ids) for output in final_res.outputs)
         usage = UsageInfo(
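For context: encoder-decoder models tokenize the encoder prompt separately from the decoder prompt, and before this fix only the decoder-side count was reported in the usage block of a non-streaming response. Below is a minimal, self-contained sketch of the counting logic the patch introduces; the dataclasses are simplified stand-ins for vLLM's RequestOutput and CompletionOutput, not the real classes:

```python
# Standalone sketch of the patched counting logic. The dataclasses below are
# simplified stand-ins for vLLM's RequestOutput/CompletionOutput (assumption),
# kept only to make the example runnable.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompletionOutput:
    token_ids: List[int]


@dataclass
class RequestOutput:
    prompt_token_ids: List[int]
    outputs: List[CompletionOutput]
    # Populated only for encoder-decoder models.
    encoder_prompt_token_ids: Optional[List[int]] = None


def count_prompt_tokens(final_res: RequestOutput) -> int:
    # Decoder prompt tokens are always counted.
    num_prompt_tokens = len(final_res.prompt_token_ids)
    # The fix: encoder-decoder models carry a second, separately tokenized
    # encoder prompt, which was previously missing from the usage count.
    if final_res.encoder_prompt_token_ids is not None:
        num_prompt_tokens += len(final_res.encoder_prompt_token_ids)
    return num_prompt_tokens


# Decoder-only request: the count is unchanged by the patch.
assert count_prompt_tokens(
    RequestOutput(prompt_token_ids=[1, 2, 3],
                  outputs=[CompletionOutput(token_ids=[9, 9])])) == 3

# Encoder-decoder request: the 4 encoder prompt tokens are now included.
assert count_prompt_tokens(
    RequestOutput(prompt_token_ids=[1, 2, 3],
                  outputs=[CompletionOutput(token_ids=[9, 9])],
                  encoder_prompt_token_ids=[4, 5, 6, 7])) == 7
```

The None guard keeps decoder-only models unaffected, since they never populate encoder_prompt_token_ids.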
