Batched Inference to Improve GPU Utilisation #493

Description

@lachlancahill

Is your feature request related to a problem? Please describe.
When using this library in a loop, I get poor GPU utilisation running zephyr-7b. A sketch of the pattern follows below.
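
To illustrate, here is a minimal sketch of the sequential loop using a plain `transformers` pipeline (the model name, prompts, and `device=0` CUDA assumption are all placeholders, not the exact code in question):

```python
from transformers import pipeline

# Placeholder model and prompts; device=0 assumes a CUDA GPU.
pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", device=0)
prompts = [f"Summarise document {i}:" for i in range(32)]

# One prompt per call: every forward pass runs at batch size 1,
# so the GPU sits mostly idle between sequences.
results = [pipe(p, max_new_tokens=128) for p in prompts]
```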

Describe the solution you'd like
It would be fantastic to be able to pass a list of prompts to a method of the Transformers class and specify a batch size, as you can with a Hugging Face pipeline (see the sketch below). Batching significantly improves throughput and GPU utilisation.
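
For comparison, this is the batched pattern the Hugging Face pipeline API already supports; a minimal self-contained sketch with the same placeholder model and prompts as above:

```python
from transformers import pipeline

# Placeholder model and prompts; device=0 assumes a CUDA GPU.
pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", device=0)
pipe.tokenizer.pad_token_id = pipe.tokenizer.eos_token_id  # padding is required for batching
prompts = [f"Summarise document {i}:" for i in range(32)]

# All prompts in one call with an explicit batch_size: each forward
# pass now processes 8 sequences at once instead of 1.
outputs = pipe(prompts, batch_size=8, max_new_tokens=128)
```

A list-of-prompts plus batch-size signature on this library's Transformers class would presumably unlock the same throughput gain.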

Additional context
GPU utilisation for reference:
[screenshot: GPU utilisation graph]
