Closed
Description
The number of concurrent requests in throughput mode is always one less than Settings.max_concurrency.
Example
export GUIDELLM__MAX_CONCURRENCY="2"
guidellm --target http://localhost:8000/v1 \
--model meta-llama/Llama-3.2-3B \
--data-type emulated \
--data prompt_tokens=512,generated_tokens=2048 \
--rate-type throughput \
--max-seconds 300
Observing from the server side, the number of requests in the queue is 1, whereas 2 is expected with GUIDELLM__MAX_CONCURRENCY="2".
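For comparison, here is a minimal sketch of the behavior I would expect from a concurrency limit. It is not guidellm's code: it assumes a plain asyncio.Semaphore limiter, and the names measure_peak_concurrency and fake_request are hypothetical. With a limit of N, the peak number of in-flight requests should reach N, not N - 1.

```python
import asyncio


async def measure_peak_concurrency(limit: int, total_requests: int) -> int:
    """Run fake requests under a semaphore and return the peak number in flight.

    Hypothetical helper for illustration only; not part of guidellm.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a real HTTP call
            in_flight -= 1

    # Schedule all requests at once; the semaphore caps how many run together.
    await asyncio.gather(*(fake_request() for _ in range(total_requests)))
    return peak


if __name__ == "__main__":
    # With limit=2 the peak should be 2, which is what I expected to see
    # on the server for GUIDELLM__MAX_CONCURRENCY="2".
    print(asyncio.run(measure_peak_concurrency(limit=2, total_requests=10)))
```

A similar in-flight counter on the server side (or in a mock server) would be one way to confirm whether the limit or the limit minus one is actually reached.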