Is there a way to benchmark models at long context sizes? I am trying to use `llama-bench`, but I cannot figure out whether it is even possible. I have tried:

What I want to do is compare how fast my model can generate, say, 128 tokens with a short 128-token prompt, versus how long it takes to generate those same 128 tokens with 15k tokens in the context. I have not been able to figure out how to do that. Does anyone know if this is even possible? Thanks!
You can get these numbers with the `llama-batched-bench` tool, although it does not compute uncertainties like the `llama-bench` tool does.
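
For the comparison described in the question, an invocation along these lines should work (a sketch based on the batched-bench README; the model path is a placeholder, and exact flag names may vary between llama.cpp versions):

```sh
# Measure 128-token generation after a 128-token prompt vs. a ~15k-token prompt.
# -npp: prompt lengths to test, -ntg: tokens to generate per sequence,
# -npl: number of parallel sequences (1 = single stream, like llama-bench).
# -c must be large enough to hold the longest prompt plus the generated tokens.
./llama-batched-bench -m ./model.gguf -c 16384 \
    -npp 128,15360 -ntg 128 -npl 1
```

The tool prints one result row per (prompt length, generation length, parallel count) combination, with separate prompt-processing and text-generation speeds in tokens per second, so comparing the text-generation speed of the two rows shows how much generation slows down with the long context.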