Purpose:
The purpose of this ticket is to support batch inference requests in GuideLLM. We will need to add a batch-size parameter that lets the user dictate the batch size of inference requests.
We will want to add non-streaming batched inferencing like this PR from llm-load-test: openshift-psap/llm-load-test#66
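A minimal sketch of how the flag could be exposed, assuming the CLI is built with click; the command name and wiring here are illustrative placeholders, not the actual GuideLLM entry point:

```python
import click


@click.command()
@click.option(
    "--batch-size",
    type=int,
    default=1,
    help="Number of prompts to send per non-streaming inference request.",
)
def benchmark(batch_size: int) -> None:
    # Hypothetical entry point; in GuideLLM the flag would be wired into the
    # existing benchmark command and passed down to the request logic.
    click.echo(f"Sending requests in batches of {batch_size}")


if __name__ == "__main__":
    benchmark()
```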
Acceptance Criteria:
- Add a batch-size parameter for the user to dictate the batch size of inference requests
- This should be non-streaming batching
- Once set, send requests in batches as dictated by the parameter
Example:
--batch-size=16, --batch-size=32
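Below is a rough sketch of the non-streaming batching itself, assuming an OpenAI-compatible completions endpoint that accepts a list of prompts per request; the function names, URL, and model name are placeholders, not existing GuideLLM code:

```python
import httpx


def chunk(prompts: list[str], batch_size: int) -> list[list[str]]:
    # Split the prompt list into consecutive batches of at most batch_size.
    return [prompts[i : i + batch_size] for i in range(0, len(prompts), batch_size)]


def send_batched_requests(
    prompts: list[str],
    batch_size: int,
    base_url: str = "http://localhost:8000/v1",
    model: str = "example-model",
) -> list[dict]:
    # Send each batch as a single non-streaming request; the completions
    # endpoint is assumed to accept a list of prompts in one call.
    responses = []
    with httpx.Client(base_url=base_url, timeout=None) as client:
        for batch in chunk(prompts, batch_size):
            resp = client.post(
                "/completions",
                json={
                    "model": model,
                    "prompt": batch,   # the whole batch goes in one request
                    "max_tokens": 128,
                    "stream": False,   # non-streaming, per the acceptance criteria
                },
            )
            resp.raise_for_status()
            responses.append(resp.json())
    return responses
```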