Description
I have run Llama-3.1-8B inference on an A100 device with a fixed 1-hour run. Comparing batch size 1 and batch size 4 shows:
b1:
```
INFO:Llama-8B-SUT:Samples run: 2004
INFO:Llama-8B-SUT:	BatchMaker time: 1.430511474609375e-06
INFO:Llama-8B-SUT:	Inference time: 1.5663466453552246
INFO:Llama-8B-SUT:	Postprocess time: 3.147125244140625e-05
INFO:Llama-8B-SUT:	==== Total time: 1.5663795471191406
```
b4:
```
INFO:Llama-8B-SUT:Samples run: 7348
INFO:Llama-8B-SUT:	BatchMaker time: 2.384185791015625e-06
INFO:Llama-8B-SUT:	Inference time: 1.713407278060913
INFO:Llama-8B-SUT:	Postprocess time: 9.489059448242188e-05
INFO:Llama-8B-SUT:	==== Total time: 1.7135045528411865
```
So, given a fixed time budget, b4 runs more samples, which makes sense. However, I wonder why the per-batch inference time of b4 is larger than that of b1; I expected to see less inference time. Any idea about that? Or am I misunderstanding something?
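For what it's worth, the numbers quoted above already suggest the distinction between per-batch latency and throughput: the inference time shown appears to be the time per batch, so a batch of 4 taking slightly longer than a batch of 1 still processes samples roughly 3.7x faster. A minimal sketch of that arithmetic, using the logged values (and assuming "Inference time" is seconds per batch):

```python
# Values copied from the logs above; assumed to be seconds per batch.
b1_batch_size, b1_inference_time = 1, 1.5663466453552246
b4_batch_size, b4_inference_time = 4, 1.713407278060913

# Throughput = samples processed per second of inference.
b1_throughput = b1_batch_size / b1_inference_time
b4_throughput = b4_batch_size / b4_inference_time

print(f"b1 throughput: {b1_throughput:.3f} samples/s")
print(f"b4 throughput: {b4_throughput:.3f} samples/s")
print(f"speedup: {b4_throughput / b1_throughput:.2f}x")

# The speedup roughly matches the ratio of samples completed in the
# fixed 1-hour run (7348 / 2004 ≈ 3.67), which is consistent with the
# per-batch time growing only slightly while batch size quadruples.
print(f"samples ratio: {7348 / 2004:.2f}x")
```

Under this reading, a larger per-batch time at b4 is expected: each forward pass does more work, but far less than 4x more, because the GPU amortizes weight loads and kernel launches across the batch.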