Description
Hey.
I am writing an article comparing and contrasting my desktop PC to my laptop.
It runs fine on the desktop and gets decent throughput.
Desktop
My laptop isn't as beefy.
Laptop
Running 'llm_benchmark run' in a Python virtual environment on my laptop is taking a very long time just to execute the first prompt against the mistral:7b model. It has been running for well over two hours.
The program did pull the 7 LLMs it required.
Looking at performance on my laptop, I see the following in Task Manager:
Windows Task Manager
CPU Utilisation: 80%
CPU Speed: 4.64 GHz
Memory in Use: 10.4 GB
Memory Available: 5.2 GB
Disk Space
Total Disk Space: 474 GB
Disk Space Available: 236 GB
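For context on those memory numbers, here is a rough back-of-the-envelope check of whether a 4-bit-quantised 7B model fits in the RAM shown as available. The bits-per-weight figure and the overhead allowance are assumptions for illustration, not measured values:

```python
# Rough check: does a 4-bit-quantised 7B model fit in the 5.2 GB that
# Task Manager reports as available? (Assumption: roughly 4.5 bits per
# weight for a q4-style quantisation, plus a hypothetical 1 GB allowance
# for KV cache and runtime buffers.)
params = 7_000_000_000
bits_per_weight = 4.5          # assumed, not measured
weights_gb = params * bits_per_weight / 8 / 1024**3
overhead_gb = 1.0              # hypothetical allowance

print(round(weights_gb, 1))                  # ~3.7 GB for weights alone
print(weights_gb + overhead_gb < 5.2)        # True: fits, but with little headroom
```

If the fit is that tight, other memory pressure on the laptop could push the model into swapping, which would explain a dramatic slowdown.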
Any pointers to make this run? I am convinced it can't be the ollama install, as I can run "Write a step-by-step guide on how to bake a chocolate cake from scratch" against ollama running llama3:8b and it completes in a little under 3 minutes (that's a rough guesstimate from scrolling back in the logs).
Running the same prompt from the 'ollama run mistral:7b' CLI, it completes even faster.
Why does it not complete from llm_benchmark?
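One way to isolate whether the slowdown is in ollama itself or in the benchmark harness is to time a single generation against the local ollama HTTP API directly. This is only a sketch: it assumes ollama's default endpoint at http://localhost:11434 and its /api/generate route, and the timing fields it reads from the response (`eval_count`, `eval_duration`) are what current ollama versions return:

```python
import json
import time
import urllib.request

# Payload for a single non-streaming generation; the prompt is the one
# from the report above.
PAYLOAD = {
    "model": "mistral:7b",
    "prompt": "Write a step-by-step guide on how to bake a chocolate cake from scratch",
    "stream": False,  # return one JSON object when generation finishes
}

def time_generate(url="http://localhost:11434/api/generate"):
    """POST the payload to a running ollama server and report timings.

    Requires a live ollama instance on the assumed default port.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.monotonic() - start
    # eval_duration is reported in nanoseconds
    tokens_per_s = body["eval_count"] / (body["eval_duration"] / 1e9)
    print(f"wall time: {elapsed:.1f}s, generation: {tokens_per_s:.1f} tok/s")

# Show the request that would be sent (calling time_generate() needs
# a running server):
print(json.dumps(PAYLOAD))
```

If this completes in minutes while 'llm_benchmark run' still hangs for hours on the same model, the problem is in the harness rather than in ollama or the hardware.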
I have attached the server and app logs from my laptop to the issue
app.log
server.log