adding online benchmarking scripts #55
Conversation
Force-pushed from b66bc5a to af1325f
utils/batch_processor.py
Outdated
Can you provide a description of what this class is trying to achieve?
BatchProcessor runs multiple concurrent requests against the backend inference server (vLLM in this case). It adds support for sending requests with a configurable maximum number of concurrent requests, independent of the backend batch_size. It is mostly for testing continuous batching and sequence lengths, but it can also be used as an alternative benchmarking method, as in benchmarking/prompt_client_online_benchmark.py: by staying within the backend's concurrent-user capacity, requests are not queued on the backend server before the model starts processing, so TTFT is measured as experienced by users.
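The client-side concurrency cap described here can be sketched as follows. This is a hypothetical illustration of the idea, not the actual utils/batch_processor.py implementation; the names BatchProcessor, max_concurrent, and fake_request are assumptions for the example.

```python
# Minimal sketch: cap the number of in-flight requests on the client
# side, independent of the server's batch size, so the backend never
# queues more than max_concurrent requests ahead of model processing.
import threading
from concurrent.futures import ThreadPoolExecutor


class BatchProcessor:
    def __init__(self, max_concurrent: int):
        # The semaphore limits how many requests are in flight at once.
        self._sem = threading.Semaphore(max_concurrent)
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent)

    def _send(self, request_fn, prompt):
        # Block here until a concurrency slot frees up, then send.
        with self._sem:
            return request_fn(prompt)

    def run(self, request_fn, prompts):
        # Submit all prompts; results come back in input order.
        futures = [self._pool.submit(self._send, request_fn, p) for p in prompts]
        return [f.result() for f in futures]


# Stand-in for the real HTTP call to the vLLM server:
def fake_request(prompt):
    return f"response:{prompt}"


bp = BatchProcessor(max_concurrent=2)
print(bp.run(fake_request, ["a", "b", "c"]))
```

With at most max_concurrent requests outstanding, the time-to-first-token measured per request reflects what a real user would see, rather than time spent waiting in the backend's own queue.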
A suggestion is to add that description to the file/class docstring. That's a good explanation for the user.
…seq_lengths and output_seq_lengths directly args to test_api_call_threaded_full_queue() to allow for varied isl and osl within batch
…d utils/batch_processor.py with configs in utils/prompt_configs.py and utils/prompt_generation.py for prompt generation
…llm online benchmark script
…/prompt_client_online_benchmark.py
…len to be as configured
…ect traces for performance testing
…dd support in client for ITL and TPOT
Force-pushed from af1325f to d9e163c
Looks great. Pending one suggestion to add a description under BatchProcessor.
…ly provide incremental output saving for debugging, default to not saving output for benchmarking
change log: add benchmarking, evals, and tests to Docker image vllm-tt-metal-llama3-70b/vllm.llama3.src.Dockerfile