
server bench: fix bench not waiting for model load #7284

Merged

Conversation

JohannesGaessler
Collaborator

While working on #6828 I noticed that when using a large static n-gram cache, the benchmark would report 0 iterations for the first 8 minutes and then 30 iterations for the last 2 minutes. What seems to be happening is that bench.py doesn't correctly wait for the server to be ready, so the clock starts ticking while the n-gram cache is still being loaded. From what I can tell, loading the model from disk can have the same issue if it resides e.g. on an HDD.

This PR makes bench.py wait for an HTTP 200 response (SERVER_STATE_READY) from the health endpoint to determine whether the server is actually ready. I'm not sure if there is a better way to implement this than what I did; I'm definitely open to suggestions.
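For illustration, here is a minimal sketch of the kind of readiness polling described above. It assumes a server exposing a `/health` endpoint that returns HTTP 200 once loading is complete; the function name, parameters, and timeouts are illustrative, not the exact bench.py code:

```python
import time

import requests


def wait_for_server_ready(host: str, port: int, timeout_s: float = 600.0) -> bool:
    """Poll the /health endpoint until the server reports ready (HTTP 200)."""
    url = f"http://{host}:{port}/health"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            # The server returns 200 (SERVER_STATE_READY) only once the model
            # (and e.g. a static n-gram cache) has finished loading.
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.exceptions.RequestException:
            pass  # server is not accepting connections yet
        time.sleep(0.5)
    return False
```

The point is that the benchmark clock should only start once this returns True; otherwise model-load time gets counted against the measurement window, producing the "0 iterations for 8 minutes" pattern described above.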

@mofosyne added labels on May 14, 2024: examples, python (python script changes), Review Complexity : Low (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix)
@JohannesGaessler JohannesGaessler merged commit 583fd6b into ggerganov:master May 15, 2024
24 checks passed
@ggerganov
Owner

It looks like this change causes the server Benchmark that we run on the self-hosted runner to fail like this:

https://github.com/ggerganov/llama.cpp/actions/runs/9094073377/job/24998422481

I tried to revert it and now the benchmark passes:

https://github.com/ggerganov/llama.cpp/actions/runs/9112533114

I'm not sure why it is causing the error - any ideas how to fix?

@phymbert
Collaborator

Yes, the problem is here:

if is_server_listening("0.0.0.0", 9090):

Port 9090 is Prometheus, so when Prometheus is not started this check does not work as expected. It would probably be easier to revert and to split the Prometheus check from the llama.cpp server checks in a separate PR?
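A minimal sketch of the separation suggested here, assuming hypothetical names and defaults (`is_server_listening` appears in the script; `check_services` and the server port are illustrative):

```python
import socket


def is_server_listening(host: str, port: int) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def check_services(server_port: int = 8080, prometheus_port: int = 9090) -> None:
    # Check the llama.cpp server and Prometheus independently, so that a
    # missing Prometheus instance cannot make the server look unready.
    if not is_server_listening("0.0.0.0", server_port):
        raise RuntimeError("llama.cpp server is not listening")
    if not is_server_listening("0.0.0.0", prometheus_port):
        print("warning: Prometheus is not listening; metrics will be skipped")
```

The design point is that only the llama.cpp server check should gate the benchmark, while the Prometheus check degrades to a warning.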

phymbert added a commit that referenced this pull request on May 16, 2024
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request on May 17, 2024