-
Thanks for your help!
-
Hello, good question!
The batch size is the logical batch at the application level, while the ubatch size is the physical micro-batch actually submitted to the device, so `batch_size >= ubatch_size` always holds. You can find some references here:
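To make the relationship concrete, here is a small conceptual sketch (not llama.cpp's actual code; the default values of 2048/512 are illustrative): a logical batch of prompt tokens is split into physical micro-batches that the device processes one at a time.

```python
# Conceptual sketch only: illustrates how a logical batch (batch_size)
# is chopped into physical micro-batches (ubatch_size) for the device.
def split_into_ubatches(tokens, batch_size=2048, ubatch_size=512):
    assert ubatch_size <= batch_size, "ubatch_size must not exceed batch_size"
    batch = tokens[:batch_size]  # one logical batch of tokens
    # Split the logical batch into device-sized chunks.
    return [batch[i:i + ubatch_size] for i in range(0, len(batch), ubatch_size)]

prompt = list(range(1300))  # 1300 dummy token ids
ubatches = split_into_ubatches(prompt)
print([len(u) for u in ubatches])  # → [512, 512, 276]
```

So a 1300-token prompt fits in one logical batch but is executed as three device passes; that is why the two knobs can affect performance independently.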
-
@phymbert
-
Excuse me, I'm using 4x Tesla T4 GPUs for computation and ran experiments testing various KV cache type settings. I found that the `--batch-size` setting has almost no impact on time to first token or on inference time.

My question: in which use scenarios should I adjust `--batch-size` or `--ubatch-size`?

Test setup: 4x Tesla T4; batch-size changes had negligible impact.

I would appreciate guidance on the specific scenarios where these parameters provide meaningful performance improvements.
The default values are here:
https://github.com/ggerganov/llama.cpp/blob/557410b8f06380560155ac7fcb8316d71ddc9837/common/common.h#L57
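For reference, both values can be overridden on the command line; this is a sketch of a typical `llama-server` invocation using the `-b`/`-ub` flags (the model path and the specific values shown are placeholders, not recommendations):

```shell
# Override the logical batch size (-b) and the physical micro-batch
# size (-ub); the tools reject configurations where ub would exceed b.
./llama-server -m ./model.gguf -b 2048 -ub 512
```

Larger `-b` mainly helps when many prompt tokens (or many parallel requests) can be queued at once, while `-ub` trades peak device memory against per-pass throughput, which is consistent with single-request tests showing little sensitivity to `-b` alone.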