Closed
Description
What happened?
As mentioned in #273, I've seen this behavior occur with llama-server (sorry, I never really noted which configurations or models it occurs with). I can usually mitigate it by canceling and then restarting generation until TG performance goes back to the expected value. The chart below shows this behavior captured in a benchmark.
Also, I'm fairly certain I've never encountered this bug in batched-bench, only in server and sweep-bench, both of which manipulate the KV cache more than batched-bench does.
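For anyone trying to catch this in their own runs, here is a rough sketch of how the drop could be flagged from per-window timings. It is not part of llama-server or sweep-bench. The helper names (`tg_rate`, `find_degraded`), the 0.7 threshold, and the sample numbers are all assumptions made up for illustration, not measurements from this issue.

```python
from statistics import median


def tg_rate(n_tokens: int, seconds: float) -> float:
    """Tokens generated per second for one measurement window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_tokens / seconds


def find_degraded(rates, threshold=0.7):
    """Return indices of windows whose rate is below threshold * median rate.

    The median is used as the 'expected' TG value so that a few slow
    windows do not drag the baseline down. The threshold is an assumption.
    """
    if not rates:
        return []
    baseline = median(rates)
    return [i for i, r in enumerate(rates) if r < threshold * baseline]


# Hypothetical (tokens, seconds) samples per window; not taken from the issue.
samples = [(128, 4.0), (128, 4.1), (128, 9.5), (128, 4.0)]
rates = [tg_rate(n, s) for n, s in samples]
degraded = find_degraded(rates)
print(degraded)
```

With the made-up samples above, only the third window (index 2) falls below 70% of the median rate, so that is the point where canceling and restarting generation would apply.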
Name and Version
The graph capturing this behavior was produced on commit 3d6e25c.
What operating system are you seeing the problem on?
Linux
Relevant log output
