
[Bugfix] fix missing last itl in openai completions benchmark (vllm-p…
mcalman authored and robertgshaw2-neuralmagic committed Jul 1, 2024
1 parent f281c2e commit b89416e
Showing 1 changed file with 5 additions and 6 deletions.
benchmarks/backend_request_func.py (5 additions, 6 deletions)
@@ -265,6 +265,9 @@ async def async_request_openai_completions(
                     else:
                         data = json.loads(chunk)
 
+                        # NOTE: Some completion API might have a last
+                        # usage summary response without a token so we
+                        # want to check a token was generated
                         if data["choices"][0]["text"]:
                             timestamp = time.perf_counter()
                             # First token
@@ -273,12 +276,8 @@ async def async_request_openai_completions(
                                 output.ttft = ttft
 
                             # Decoding phase
-                            # NOTE: Some completion API might have a last
-                            # usage summary response without a token so we
-                            # do not want to include as inter-token-latency
-                            elif data.get("usage", None) is None:
-                                output.itl.append(timestamp -
-                                                  most_recent_timestamp)
+                            output.itl.append(timestamp -
+                                              most_recent_timestamp)
 
                             most_recent_timestamp = timestamp
                             generated_text += data["choices"][0]["text"]
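For context, here is a minimal, self-contained sketch (not vLLM's benchmark code; the stream contents are hypothetical) of why the old guard lost the final inter-token latency. Some OpenAI-compatible servers attach a usage summary to the last token chunk, which the old `elif data.get("usage", None) is None` check then excluded from the ITL list; gating only on the token text keeps that last ITL while still skipping a trailing usage-only chunk. The sketch uses an explicit branch for the old guard rather than mirroring the patch line-for-line; with 3 tokens there are 2 inter-token gaps, and only the fixed guard reports both.

import time

# Hypothetical stream: some OpenAI-compatible servers attach "usage" to the
# last token chunk and may also send a trailing usage-only chunk with no text.
stream = [
    {"choices": [{"text": "Hello"}]},                                 # first token
    {"choices": [{"text": " world"}]},                                # middle token
    {"choices": [{"text": "!"}], "usage": {"completion_tokens": 3}},  # last token + usage
    {"choices": [{"text": ""}], "usage": {"completion_tokens": 3}},   # usage-only chunk
]

def count_itls(use_old_guard: bool) -> int:
    st = time.perf_counter()
    most_recent_timestamp = st
    ttft = 0.0
    itl = []
    for data in stream:
        time.sleep(0.005)  # stand-in for network delay between chunks
        if data["choices"][0]["text"]:
            timestamp = time.perf_counter()
            if ttft == 0.0:
                ttft = timestamp - st  # first token: time to first token
            elif use_old_guard and data.get("usage", None) is not None:
                pass  # old guard: usage present, so this ITL is dropped (the bug)
            else:
                itl.append(timestamp - most_recent_timestamp)
            most_recent_timestamp = timestamp
    return len(itl)

print("ITLs with old guard:  ", count_itls(True))   # 1 -> last ITL missing
print("ITLs with fixed guard:", count_itls(False))  # 2 -> all ITLs captured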
