GUIDELLM__MAX_CONCURRENCY is off by 1 #70

Closed
@sjmonson

Description

The number of concurrent requests in throughput mode is always one less than Settings.max_concurrency.

Example

export GUIDELLM__MAX_CONCURRENCY="2"

guidellm --target http://localhost:8000/v1 \
         --model meta-llama/Llama-3.2-3B \
         --data-type emulated \
         --data prompt_tokens=512,generated_tokens=2048 \
         --rate-type throughput \
         --max-seconds 300

Observe from the server side that the number of queued requests is 1, not the expected 2.
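For illustration, this kind of off-by-one typically comes from an admission check that compares against the limit minus one. The sketch below is hypothetical (peak_concurrency and admit are made-up names, not guidellm code); it simulates a scheduler admitting requests under two candidate checks and shows how the buggy comparison leaves one slot permanently unused:

```python
def peak_concurrency(max_concurrency, admit, n_requests=10):
    """Simulate a scheduler: each tick, admit requests while `admit` allows,
    then complete one in-flight request. Returns the peak in-flight count."""
    in_flight = 0
    peak = 0
    started = 0
    while started < n_requests:
        # Admit as many requests as the check permits.
        while admit(in_flight, max_concurrency) and started < n_requests:
            in_flight += 1
            started += 1
            peak = max(peak, in_flight)
        if in_flight:
            in_flight -= 1  # one request finishes
    return peak

# Correct admission: runs up to max_concurrency requests at once.
print(peak_concurrency(2, lambda n, m: n < m))      # 2
# Suspected off-by-one: caps effective concurrency at max_concurrency - 1.
print(peak_concurrency(2, lambda n, m: n < m - 1))  # 1
```

With GUIDELLM__MAX_CONCURRENCY=2, the second variant reproduces the observed behavior: only one request is ever in flight at the server.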
