Add script for benchmarking serving throughput #145

Merged · 43 commits · Jun 15, 2023
Changes from 1 commit
473c5b8
Minor fix
WoosukKwon Jun 10, 2023
a644a9b
Minor
WoosukKwon Jun 10, 2023
67ed51c
Minor
WoosukKwon Jun 10, 2023
83acd5e
Minor
WoosukKwon Jun 10, 2023
4957281
Add log-requests option to AsyncLLMServer
WoosukKwon Jun 10, 2023
c6b38d2
[WIP] Add benchmark_serving.py
WoosukKwon Jun 10, 2023
5210de0
Minor
WoosukKwon Jun 10, 2023
d4df348
Delete unused files
WoosukKwon Jun 10, 2023
fab12d6
Minor
WoosukKwon Jun 10, 2023
3ddadf4
Add docstring
WoosukKwon Jun 10, 2023
4269b11
Bugfix
WoosukKwon Jun 10, 2023
af8974d
Minor
WoosukKwon Jun 10, 2023
f8dee6e
Minor
WoosukKwon Jun 10, 2023
d181f10
Add script to launch HF server
WoosukKwon Jun 10, 2023
fc02a02
Add HF backend
WoosukKwon Jun 10, 2023
99d9ce3
Minor
WoosukKwon Jun 10, 2023
bc9ec63
Bugfix
WoosukKwon Jun 10, 2023
9477f2f
Filter out long prompts
WoosukKwon Jun 10, 2023
51a5332
Minor fix
WoosukKwon Jun 10, 2023
6b0d77b
Merge branch 'main' into benchmark-llama
WoosukKwon Jun 10, 2023
00d158d
Repeat failed requests
WoosukKwon Jun 10, 2023
0c55c40
Stream=False
WoosukKwon Jun 10, 2023
bcb8e16
Minor
WoosukKwon Jun 10, 2023
6a7baaa
Prune short sequences
WoosukKwon Jun 10, 2023
071b4aa
Add 1 hour timeout
WoosukKwon Jun 10, 2023
983cf97
Increase timeout
WoosukKwon Jun 10, 2023
b55b1ee
Add shortcut
WoosukKwon Jun 11, 2023
c45a2dd
Simplify
WoosukKwon Jun 11, 2023
66f8c60
Merge branch 'opt' into benchmark-llama
WoosukKwon Jun 11, 2023
a1b513e
n -> best_of
WoosukKwon Jun 11, 2023
72d6a63
Minor
WoosukKwon Jun 11, 2023
44bc461
Add latency stats
WoosukKwon Jun 11, 2023
6990fc5
Increase max_best_of in HF server
WoosukKwon Jun 11, 2023
2c610bd
Merge branch 'main' into benchmark-llama
WoosukKwon Jun 11, 2023
5687f10
hf -> tgi
WoosukKwon Jun 13, 2023
672fbbd
Add HF backend
WoosukKwon Jun 13, 2023
60bccc4
Fix batching
WoosukKwon Jun 13, 2023
b7fcade
Fix a bug & Add tqdm
WoosukKwon Jun 13, 2023
6accbfd
Minor
WoosukKwon Jun 14, 2023
c7360d1
Fix
WoosukKwon Jun 15, 2023
bf1bae6
Comment
WoosukKwon Jun 15, 2023
7bebe29
Add docstring
WoosukKwon Jun 15, 2023
5c1b852
Comment
WoosukKwon Jun 15, 2023
Fix
WoosukKwon committed Jun 15, 2023
commit c7360d13db1f5b7b5eabbc9a011f685841ed18a9
7 changes: 2 additions & 5 deletions benchmarks/benchmark_serving.py
@@ -88,11 +88,8 @@ async def get_request(
     request_rate: float,
 ) -> AsyncGenerator[Tuple[str, int, int], None]:
     input_requests = iter(input_requests)
-    while True:
-        try:
-            yield next(input_requests)
-        except StopIteration:
-            return
+    for request in input_requests:
+        yield request
 
     if request_rate == float("inf"):
         # If the request rate is infinity, then we don't need to wait.
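The change above replaces a manual `while True` / `next()` / `StopIteration` drain with a plain `for` loop, which is the idiomatic way to iterate inside an async generator. A minimal, self-contained sketch of the resulting `get_request` pattern is below; the Poisson-style pacing (exponential gaps with mean `1 / request_rate`) is an assumption about how the benchmark spaces finite-rate requests, and the request tuples in `main` are made-up sample data, not values from the PR.

```python
import asyncio
import random
from typing import AsyncGenerator, List, Tuple

async def get_request(
    input_requests: List[Tuple[str, int, int]],
    request_rate: float,
) -> AsyncGenerator[Tuple[str, int, int], None]:
    """Yield (prompt, prompt_len, output_len) tuples, pacing arrivals."""
    for request in input_requests:
        yield request
        if request_rate == float("inf"):
            # Infinite rate: issue all requests back to back, no waiting.
            continue
        # Assumed pacing: exponential inter-arrival gaps, i.e. a Poisson
        # arrival process with mean interval 1 / request_rate seconds.
        await asyncio.sleep(random.expovariate(request_rate))

async def main() -> List[Tuple[str, int, int]]:
    # Hypothetical sample requests: (prompt, prompt_len, output_len).
    requests = [("hello", 1, 8), ("world", 1, 8)]
    received = []
    async for req in get_request(requests, request_rate=float("inf")):
        received.append(req)
    return received

print(asyncio.run(main()))
```

With `request_rate=float("inf")` the generator drains immediately, so all requests come back in order; with a finite rate each caller would see randomized delays between yields.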