Update server.cpp example with correct startup sequence #6739

mann1x · 2024-04-18T10:09:17Z

The HTTP listener start and the health API endpoint are moved before the model loading starts, so the server can correctly report that it is loading the model.

ngxson

Normally, all API endpoints must be registered before the server starts listening (svr->listen_after_bind() in this case), so I'm not convinced by this change, which registers /health before all other endpoints and registers the rest after the model is loaded. During the model load, all other endpoints will return a 404 Not Found error, which is not correct (it should be 503 Service Unavailable).

Furthermore, this change requires the main thread to call svr to register new endpoints after svr has been spawned into a new thread. This makes the use of svr not thread-safe.

In any case, it would be more logical to call ctx_server.load_model(params) only after all endpoints are registered. Additionally, we can add a middleware that returns 503 if the model is not yet loaded.
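A minimal sketch of the ordering suggested above, written without the HTTP library so it stands alone. Every name in it (`ModelState`, `StartupSketch`, `register_routes`, the `/completion` route) is hypothetical and not taken from server.cpp, and the 500 status for a failed load is an assumption. The point it illustrates: the route table is filled once before any request is served, and the only thing that changes afterwards is an atomic state flag, so nothing has to be re-registered from another thread.

```cpp
#include <atomic>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the names used in server.cpp.
enum class ModelState { Loading, Ready, Failed };

class StartupSketch {
public:
    // Step 1: fill the route table once, before any request is served.
    void register_routes() {
        routes_["/health"]     = [this] { return status_for_state(); };
        routes_["/completion"] = [this] { return status_for_state(); };
    }

    // Step 2 (in a real server): start listening on a separate thread here.

    // Step 3: load the model, then publish the result through the atomic flag.
    void load_model(const std::function<bool()> & loader) {
        state_.store(loader() ? ModelState::Ready : ModelState::Failed);
    }

    // Status a request would get right now. Unknown paths are 404 both before
    // and after loading, because the route table never changes.
    int handle(const std::string & path) const {
        auto it = routes_.find(path);
        return it == routes_.end() ? 404 : it->second();
    }

private:
    int status_for_state() const {
        switch (state_.load()) {
            case ModelState::Ready:   return 200;
            case ModelState::Loading: return 503; // Service Unavailable
            case ModelState::Failed:  return 500; // assumed code for a failed load
        }
        return 500; // defensive default for an unexpected state value
    }

    std::atomic<ModelState> state_{ModelState::Loading};
    std::map<std::string, std::function<int()>> routes_;
};
```

With this shape, a request for a registered route that arrives during loading gets 503 rather than 404, and the same handler starts returning 200 once `load_model` has stored `Ready`.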

mann1x · 2024-04-18T10:28:05Z

> In any case, it would be more logical to call ctx_server.load_model(params) only after all endpoints are registered. Additionally, we can add a middleware that returns 503 if the model is not yet loaded.

Binding them before doesn't work; the model must be loaded first.
They can be bound afterwards, no issues.
There's really no reason to use the other endpoints while the server reports that the model is still being loaded.
But indeed, I hadn't thought about 404 not being the right answer.
I made this for ollama, which doesn't use any other endpoint.

I will amend it by registering the other endpoints with a static 503 answer before listening, then re-registering them once the model is loaded.

mann1x · 2024-04-18T10:33:15Z

> Furthermore, this change requires the main thread to call svr to register new endpoints after svr has been spawned into a new thread. This makes the use of svr not thread-safe.

You are right, I didn't check this.
I will try to make it work without re-registering the endpoints at all.

ngxson · 2024-04-18T10:51:15Z

> Binding them before doesn't work; the model must be loaded first.

The reference to ctx_server does not change before or after the model is loaded. If you see errors, that may be because something accesses ctx_server before the model is loaded (for example, checking the chat template). You should move all such accesses below the code block where all endpoints are registered and the HTTP server is listening.

Another thing to add: inside svr->set_pre_routing_handler, there should be a middleware that checks whether the request targets an endpoint other than /health. If the model is not loaded, the middleware must return a 503 error.
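A minimal sketch of that check, written as a plain function so it compiles without the HTTP library. The names `LoadState`, `GateDecision` and `gate_request` are hypothetical, and the 500 status for a failed load is an assumption rather than something settled in this thread. The `handled` flag stands in for whatever the real pre-routing handler returns to say "response already written, skip routing".

```cpp
#include <string>

// Hypothetical names for illustration; not taken from server.cpp.
enum class LoadState { Loading, Ready, Failed };

// Outcome of the pre-routing check: either let the request reach its normal
// route handler, or answer it immediately with `status`.
struct GateDecision {
    bool handled; // true: respond now and skip routing
    int  status;  // meaningful only when handled is true
};

GateDecision gate_request(LoadState state, const std::string & path) {
    // /health is never gated: its own handler reports the loading state.
    if (path == "/health") {
        return {false, 0};
    }
    switch (state) {
        case LoadState::Ready:
            return {false, 0};  // continue to normal routing
        case LoadState::Loading:
            return {true, 503}; // Service Unavailable while the model loads
        case LoadState::Failed:
            return {true, 500}; // assumed status for a failed load
    }
    return {true, 500}; // defensive default for an unexpected state value
}
```

For example, `gate_request(LoadState::Loading, "/completion")` short-circuits with 503, while the same call for `/health` lets the request through to its handler.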

mann1x · 2024-04-18T16:16:29Z

@ngxson

Can you please check it again and let me know?

The empty_json_model() is just a leftover of course.

What I'm wondering is if the middleware should report error 500 also for the static pages.

Thanks!

phymbert · 2024-04-18T18:41:21Z

> What I'm wondering is if the middleware should report error 500 also for the static pages.

If we can return the static pages while the model is loading, this is fine.

examples/server/server.cpp

https://github.com/ggerganov/llama.cpp/pull/6739/files/b9613ef11a748ba3b8961fd0504aca89134d705e

mann1x · 2024-04-19T17:05:10Z

@ngxson
Seems to work, let me know. Thanks a lot!

Update server.cpp example with correct startup sequence

d9157cd

The HTTP listener start and the health API endpoint are moved before the model loading starts, so the server can correctly report that it is loading the model.

phymbert requested a review from ngxson April 18, 2024 10:13

phymbert added the server/webui label Apr 18, 2024

ngxson requested changes Apr 18, 2024

mann1x added 2 commits April 18, 2024 18:11

Merge branch 'ggerganov:master' into mannix-server-startup

4de4670

Moved endpoints registration before listener and fixes

52a4d59

- Moved endpoints registration before HTTP listener starts
- Endpoints are returning the correct error when the model is loading or failed to load
- Server is exiting if failed to bind the port

Removed leftover

b9613ef

ngxson requested changes Apr 19, 2024

Update server.cpp after code review

61b483d

ngxson requested changes Apr 19, 2024

examples/server/server.cpp

mann1x and others added 2 commits April 20, 2024 09:05

Fixes unhandled status ready with default: switch

942f023

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

Merge branch 'ggerganov:master' into mannix-server-startup

ca0409f

mofosyne added the Review Complexity : Medium and examples labels May 9, 2024

mofosyne marked this pull request as draft June 9, 2024 05:36
