adding online benchmarking scripts #55

Merged: tstescoTT merged 42 commits into main from tstesco/online-benchmark on Dec 31, 2024

Conversation

@tstescoTT (Contributor) commented on Dec 12, 2024:

change log

  • add utils/prompt_client.py::PromptClient as a vLLM client that handles authentication and health checks (see the sketch after this list)
  • address trace capture: pre-capture prefill + decode traces in the vLLM run script so TTFT on the first completions is not unexpectedly high or stalled #56
  • improve prompt generation and handling with utils/prompt_configs.py and utils/batch_processor.py
  • remove explicitly setting the stop token in the prompt client; it causes issues with instruct models that are correctly configured with an instruct tokenizer
  • add trace capturing ahead of performance measurement in benchmarking scripts
  • add online benchmarking script using vllm/benchmarks/benchmark_serving.py
  • add vLLM benchmarking patch at benchmarking/benchmark_serving.patch to handle best_of, which is unsupported in the current Tenstorrent vLLM fork
  • add benchmarking/prompt_client_online_benchmark.py to measure performance with different batch handling
  • update benchmarking docs
  • update prompt CLI and util docs
  • update mock model to be faster and to not send stop tokens unexpectedly
  • add benchmarking, evals, and tests to Docker image vllm-tt-metal-llama3-70b/vllm.llama3.src.Dockerfile
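
For context, here is a minimal, hypothetical sketch of what a PromptClient-style helper could look like. It assumes only vLLM's OpenAI-compatible endpoints (GET /health, POST /v1/completions) and a bearer token read from the environment; the class and method names are illustrative, not the actual utils/prompt_client.py API.

```python
# Hypothetical sketch only -- not the actual utils/prompt_client.py implementation.
import os
import time
from typing import Optional

import requests


class PromptClient:
    def __init__(self, base_url: str = "http://localhost:8000", api_key: Optional[str] = None):
        self.base_url = base_url.rstrip("/")
        # Auth token; the environment variable name here is illustrative.
        self.api_key = api_key or os.environ.get("AUTHORIZATION", "")

    def _headers(self) -> dict:
        return {"Authorization": f"Bearer {self.api_key}"} if self.api_key else {}

    def wait_for_healthy(self, timeout_s: float = 600.0) -> bool:
        """Poll the vLLM /health endpoint until the server reports ready."""
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            try:
                r = requests.get(f"{self.base_url}/health", headers=self._headers(), timeout=5)
                if r.status_code == 200:
                    return True
            except requests.RequestException:
                pass
            time.sleep(2.0)
        return False

    def completion(self, prompt: str, model: str, max_tokens: int = 128) -> dict:
        """Send one completion request to the OpenAI-compatible endpoint."""
        payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens, "stream": False}
        r = requests.post(
            f"{self.base_url}/v1/completions",
            json=payload,
            headers=self._headers(),
            timeout=600,
        )
        r.raise_for_status()
        return r.json()
```

Per the trace pre-capture items above, the benchmarking scripts capture prefill and decode traces before performance measurement so the first timed completions do not pay the trace-capture cost.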

Contributor:
Can you provide a description of what this Class is trying to achieve?

@tstescoTT (Contributor, Author):

BatchProcessor runs multiple concurrent requests against the backend inference server (vLLM in this case). It adds support for capping the number of in-flight requests independently of the backend batch_size. This is mostly for testing continuous batching and sequence lengths, but it can also be used as an alternative benchmarking method, as in benchmarking/prompt_client_online_benchmark.py: by not exceeding the backend's concurrent-user capacity, it measures TTFT as a user would experience it, rather than including time requests spend queued on the server before the model starts processing them.
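
As a rough sketch of the concurrency-capping pattern described above (not the actual utils/batch_processor.py code), a thread pool can bound the number of in-flight requests; prompt_client.completion here refers to the hypothetical client method sketched in the PR description.

```python
# Illustrative sketch only -- not the actual utils/batch_processor.py code.
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_prompts(prompt_client, prompts, model, max_concurrent=32):
    """Send prompts with at most max_concurrent requests in flight,
    independent of the backend's own batch_size."""
    results = []
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(prompt_client.completion, p, model): p for p in prompts}
        for fut in as_completed(futures):
            # Pair each response with the prompt that produced it.
            results.append((futures[fut], fut.result()))
    return results
```

Keeping max_concurrent at or below the backend's concurrent-user capacity is what lets TTFT be measured as a user would see it, rather than including server-side queueing time.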

Contributor:

A suggestion is to add that description to the file / Class. That's a good explanation for the user.

…seq_lengths and output_seq_lengths directly args to test_api_call_threaded_full_queue() to allow for varied isl and osl within batch
…d utils/batch_processor.py with configs in utils/prompt_configs.py and utils/prompt_generation.py for prompt generation
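
As a purely illustrative aside on the commit above: passing per-prompt sequence lengths lets a single batch mix input and output lengths. The function below is hypothetical and is not the actual test_api_call_threaded_full_queue() signature.

```python
# Hypothetical illustration of varied isl/osl within one batch.
def build_requests(prompts, input_seq_lengths, output_seq_lengths):
    assert len(prompts) == len(input_seq_lengths) == len(output_seq_lengths)
    # Each request carries its own input length (isl) and output length (osl).
    return [
        {"prompt": p, "input_seq_len": isl, "max_tokens": osl}
        for p, isl, osl in zip(prompts, input_seq_lengths, output_seq_lengths)
    ]
```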
@tstescoTT force-pushed the tstesco/online-benchmark branch from af1325f to d9e163c on December 20, 2024 03:25
@milank94 (Contributor) left a comment:
Looks great. Pending one suggestion to add a description under BatchProcessor.

…ly provide incremental output saving for debugging, default to not saving output for benchmarking
@tstescoTT merged commit fe563af into main on Dec 31, 2024 (1 check passed)
@tstescoTT deleted the tstesco/online-benchmark branch on January 15, 2025 01:13