
[Rate Type] Concurrencies #47

Closed
@philschmid


Hello,

I am trying to integrate guidellm into a benchmark suite, where we run different load tests based on user concurrency. We define user concurrency as a number of "users" that each send requests one after another: send a request -> wait for the response -> send the next request.
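As a rough illustration of the closed-loop "user concurrency" model described above, here is a minimal Python sketch using asyncio. It is not guidellm code: `fake_request` is a hypothetical stand-in that sleeps for a fixed latency instead of calling the server, and `run_user_concurrency` is an illustrative helper. The point it shows is that with N users, at most N requests are ever in flight.

```python
import asyncio


# Hypothetical stand-in for one request to the server; a real benchmark
# would await an HTTP call to the OpenAI-compatible endpoint here.
async def fake_request(latency: float) -> None:
    await asyncio.sleep(latency)


async def run_user_concurrency(users: int, requests_per_user: int, latency: float) -> dict:
    """Closed-loop load: each simulated user sends a request, waits for
    the response, and only then sends its next request."""
    in_flight = 0
    peak_in_flight = 0
    completed = 0

    async def user() -> None:
        nonlocal in_flight, peak_in_flight, completed
        for _ in range(requests_per_user):
            in_flight += 1
            peak_in_flight = max(peak_in_flight, in_flight)
            await fake_request(latency)  # block this user until the "response"
            in_flight -= 1
            completed += 1

    # All users run concurrently; each one is strictly sequential.
    await asyncio.gather(*(user() for _ in range(users)))
    return {"completed": completed, "peak_in_flight": peak_in_flight}


if __name__ == "__main__":
    stats = asyncio.run(run_user_concurrency(users=4, requests_per_user=3, latency=0.01))
    print(stats)  # {'completed': 12, 'peak_in_flight': 4}
```

A sweep like the one requested below would simply repeat this for each concurrency level (1, 2, 10, 50) and record latency and throughput per level.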

I first assumed that is what "constant" and "rate" do, but they send far more requests, because the rate is defined as requests sent per second. Is there a way to customize the "user concurrency"? I assume that a concurrency of 1 corresponds to the "synchronous" rate type. But it would be great if I could do something like:

```shell
guidellm --target "http://localhost:8080/v1" \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct" \
  --data-type emulated \
  --data "prompt_tokens=550,generated_tokens=250" \
  --max-seconds 60 \
  --rate-type concurrent \
  --rate 1 --rate 2 --rate 10 --rate 50 \
  --output-path r.json
```
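For contrast, a rate-based (open-loop) test issues requests on a fixed schedule whether or not earlier responses have returned, so the number of requests in flight settles at roughly rate multiplied by latency (Little's law). The sketch below assumes a fixed per-request latency and evenly spaced sends; it is a simplified model, not guidellm's scheduler, and `open_loop_peak_in_flight` is a hypothetical helper.

```python
def open_loop_peak_in_flight(rate: float, latency: float, duration: float) -> tuple[int, int]:
    """Open-loop load: request i is sent at time i / rate regardless of
    earlier responses. Assumes every request takes exactly `latency` seconds.
    Returns (requests sent, peak number of requests in flight)."""
    sends = [i / rate for i in range(int(rate * duration))]
    events = []
    for t in sends:
        events.append((t, 1))             # request starts
        events.append((t + latency, -1))  # response arrives
    # At equal timestamps, process completions (-1) before new starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    in_flight = peak = 0
    for _, delta in events:
        in_flight += delta
        peak = max(peak, in_flight)
    return len(sends), peak


if __name__ == "__main__":
    # 2 requests/s with a 5 s response time for 60 s
    print(open_loop_peak_in_flight(rate=2, latency=5, duration=60))  # (120, 10)
```

Under these assumptions, a "rate" of 2 keeps about 10 requests in flight at once, whereas 2 sequential users would never exceed 2, which is why the two settings are not interchangeable.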

Labels: enhancement (New feature or request)
