Description
I have run Llama-3.1-8B inference on an A100 device with a fixed 1-hour run. Comparing batch size 1 and batch size 4 shows:
b1:
```
INFO:Llama-8B-SUT:Samples run: 2004
INFO:Llama-8B-SUT:	BatchMaker time: 1.430511474609375e-06
INFO:Llama-8B-SUT:	Inference time: 1.5663466453552246
INFO:Llama-8B-SUT:	Postprocess time: 3.147125244140625e-05
INFO:Llama-8B-SUT:	==== Total time: 1.5663795471191406
```
b4:
```
INFO:Llama-8B-SUT:Samples run: 7348
INFO:Llama-8B-SUT:	BatchMaker time: 2.384185791015625e-06
INFO:Llama-8B-SUT:	Inference time: 1.713407278060913
INFO:Llama-8B-SUT:	Postprocess time: 9.489059448242188e-05
INFO:Llama-8B-SUT:	==== Total time: 1.7135045528411865
```
So, given a fixed time budget, b4 runs more samples, which makes sense. However, I wonder why the per-batch inference time of b4 is larger than that of b1; I expected to see less inference time. Any idea about that? Or am I misunderstanding something?
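For what it's worth, the numbers quoted above already suggest the distinction between per-batch latency and throughput: the inference time shown appears to be the time per batch, so a batch of 4 taking slightly longer than a batch of 1 still processes samples roughly 3.7x faster. A minimal sketch of that arithmetic, using the logged values (and assuming "Inference time" is seconds per batch):

```python
# Values copied from the logs above; assumed to be seconds per batch.
b1_batch_size, b1_inference_time = 1, 1.5663466453552246
b4_batch_size, b4_inference_time = 4, 1.713407278060913

# Throughput = samples processed per second of inference.
b1_throughput = b1_batch_size / b1_inference_time
b4_throughput = b4_batch_size / b4_inference_time

print(f"b1 throughput: {b1_throughput:.3f} samples/s")
print(f"b4 throughput: {b4_throughput:.3f} samples/s")
print(f"speedup: {b4_throughput / b1_throughput:.2f}x")

# The speedup roughly matches the ratio of samples completed in the
# fixed 1-hour run (7348 / 2004 ≈ 3.67), which is consistent with the
# per-batch time growing only slightly while batch size quadruples.
print(f"samples ratio: {7348 / 2004:.2f}x")
```

Under this reading, a larger per-batch time at b4 is expected: each forward pass does more work, but far less than 4x more, because the GPU amortizes weight loads and kernel launches across the batch.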