Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update server.cpp example with correct startup sequence #6739

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mann1x
Copy link

@mann1x mann1x commented Apr 18, 2024

The HTTP listener start and the health API endpoint are moved before the model loading starts, hence the server can correctly report is loading the model

The HTTP listener start and the health API endpoint are moved before the model loading starts, hence the server can correctly report is loading the model
Copy link
Collaborator

@ngxson ngxson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, all API endpoints must be registered before server start listening (svr->listen_after_bind() in this case), so I'm not convinced by this change, which registers /health before all other endpoints and the rest being registered after model loaded. During the model load, all other endpoints will throw 404 not found error which is not correct (it should be 503 Service Unavailable)

Furthermore, this change requires main thread to call svr to register new endpoints after it is spawned into new thread. This will make svr not thread-safe.

In any cases, it will be more logical to call ctx_server.load_model(params) only after all endpoints are registered. Additionally, we can add a middleware to throw 503 if the model is not yet loaded.

@mann1x
Copy link
Author

mann1x commented Apr 18, 2024

In any cases, it will be more logical to call ctx_server.load_model(params) only after all endpoints are registered. Additionally, we can add a middleware to throw 503 if the model is not yet loaded.

Binding them before doesn't work, the model must be loaded.
They can be binded afterwards, no issues.
There's really no reason to use the other endpoints until the server reports that the model is still being loaded.
But indeed I haven't thought about 404 not being the right answer.
Made this for ollama which doesn't use any other endpoint.

I will amend it registering the other endpoints with a static 503 answer before listening and re-registering them later once the model is loaded.

@mann1x
Copy link
Author

mann1x commented Apr 18, 2024

Furthermore, this change requires main thread to call svr to register new endpoints after it is spawned into new thread. This will make svr not thread-safe.

You are right I didn't check this.
Will try to make it work without re-registering the endpoints at all.

This comment was marked as off-topic.

@ngxson
Copy link
Collaborator

ngxson commented Apr 18, 2024

Binding them before doesn't work, the model must be loaded.

The reference to ctx_server is never changed before/after model is loaded. If you see errors, that maybe because something accesses to ctx_server before model is loaded (for example, checking chat template). You should move them all to below code block where all endpoints are registered and HTTP is listening.

Another thing to add is inside svr->set_pre_routing_handler, there should be a middleware to check if we're accessing endpoints other than /health. If model is not loaded, the middleware must return 503 error.

mann1x added 2 commits April 18, 2024 18:11
- Moved endpoints registration before HTTP listener starts
- Endpoints are returning the correct error when the model is loading or failed to load
- Server is exiting if failed to bind the port
@mann1x
Copy link
Author

mann1x commented Apr 18, 2024

@ngxson

Can you please check it again and let me know?

The empty_json_model() is just a leftover of course.

What I'm wondering is if the middleware should report error 500 also for the static pages.

Thanks!

@phymbert
Copy link
Collaborator

What I'm wondering is if the middleware should report error 500 also for the static pages

If we can return the static pages while the model is loading, this is fine.

@mann1x
Copy link
Author

mann1x commented Apr 19, 2024

@ngxson
Seems to work, let me know. Thanks a lot!

@mofosyne mofosyne added Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level examples labels May 9, 2024
@mofosyne mofosyne marked this pull request as draft June 9, 2024 05:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level server/webui
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants