Skip to content
This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Benchmarking : Fix server response aggregation #51

Merged

Conversation

varun-sundar-rabindranath

Summary:
In Streaming mode, the vllm server returns responses as soon as a token is available. However, it doesn't do it in parts, instead, each response is already an aggregate of all the previous responses. Therefore, it is sufficient to record just the last response.

Test:
Manual testing

Copy link
Member

@andy-neuma andy-neuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool.

Comment on lines +46 to +49
def trim_suffix(text: str, suffix_str: str) -> str:
assert len(text) >= len(suffix_str)
assert text[-1 * len(suffix_str):] == suffix_str
return text[:-1 * len(suffix_str)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why does this require that the suffix exists and fail if it doesn't? i would simplify it to this if the assertion isn't required

Suggested change
def trim_suffix(text: str, suffix_str: str) -> str:
assert len(text) >= len(suffix_str)
assert text[-1 * len(suffix_str):] == suffix_str
return text[:-1 * len(suffix_str)]
def trim_suffix(text: str, suffix_str: str) -> str:
if text.endswith(suffix_str):
return text[:-len(suffix_str)]
return text

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is out of the timing code and I think is good to keep around so we know we aren't trimming any actual data.

@varun-sundar-rabindranath varun-sundar-rabindranath merged commit 48748d9 into main Feb 26, 2024
2 checks passed
@varun-sundar-rabindranath varun-sundar-rabindranath deleted the varun/fix-benchmarking-backend-requests-script branch February 26, 2024 14:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants