Server: reorganize some http logic #5939
Conversation
// process all prompts
json responses = json::array();
for (auto & prompt : prompts) {
    // TODO @ngxson : maybe support multitask for this endpoint?
@ggerganov Quick question: should we use multitask for multiple input prompts in the embedding endpoint?
Yes, probably a good idea in order to occupy fewer slots during processing
I didn't really like the change to the server logs; they seem too verbose now that they're in JSON format. There should be an option to display them in a more readable way when debugging in the terminal; even finding the entry endpoint is frustrating.
{"function":"initialize","level":"INFO","line":426,"msg":"initializing slots","n_slots":1,"tid":"5612","timestamp":1709914159}
{"function":"initialize","level":"INFO","line":438,"msg":"new slot","n_ctx_slot":512,"slot_id":0,"tid":"5612","timestamp":1709914159}
{"function":"main","level":"INFO","line":3040,"msg":"model loaded","tid":"5612","timestamp":1709914159}
{"function":"main","hostname":"127.0.0.1","level":"INFO","line":3509,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","tid":"5612","timestamp":1709914159}
{"function":"update_slots","level":"INFO","line":1574,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"5612","timestamp":1709914159}
{"function":"log_server_request","level":"INFO","line":2737,"method":"GET","msg":"request","params":{},"path":"/","remote_addr":"127.0.0.1","remote_port":52626,"status":200,"tid":"12236","timestamp":1709914168}
{"function":"log_server_request","level":"INFO","line":2737,"method":"GET","msg":"request","params":{},"path":"/index.js","remote_addr":"127.0.0.1","remote_port":52626,"status":200,"tid":"12236","timestamp":1709914168}
{"function":"log_server_request","level":"INFO","line":2737,"method":"GET","msg":"request","params":{},"path":"/completion.js","remote_addr":"127.0.0.1","remote_port":52627,"status":200,"tid":"22724","timestamp":1709914168}
{"function":"log_server_request","level":"INFO","line":2737,"method":"GET","msg":"request","params":{},"path":"/json-schema-to-grammar.mjs","remote_addr":"127.0.0.1","remote_port":52628,"status":200,"tid":"22540","timestamp":1709914168}
{"function":"launch_slot_with_data","level":"INFO","line":819,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"5612","timestamp":1709914171}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1812,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":112,"slot_id":0,"task_id":0,"tid":"5612","timestamp":1709914171}
{"function":"update_slots","level":"INFO","line":1836,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":0,"tid":"5612","timestamp":1709914171}
{"function":"print_timings","level":"INFO","line":260,"msg":"prompt eval time = 28783.38 ms / 112 tokens ( 256.99 ms per token, 3.89 tokens per second)","n_prompt_tokens_processed":112,"n_tokens_second":3.891134127323884,"slot_id":0,"t_prompt_processing":28783.382,"t_token":256.9944821428572,"task_id":0,"tid":"5612","timestamp":1709914203}
{"function":"print_timings","level":"INFO","line":274,"msg":"generation eval time = 3490.47 ms / 12 runs ( 290.87 ms per token, 3.44 tokens per second)","n_decoded":12,"n_tokens_second":3.4379334123867027,"slot_id":0,"t_token":290.87241666666665,"t_token_generation":3490.469,"task_id":0,"tid":"5612","timestamp":1709914203}
{"function":"print_timings","level":"INFO","line":283,"msg":" total time = 32273.85 ms","slot_id":0,"t_prompt_processing":28783.382,"t_token_generation":3490.469,"t_total":32273.851000000002,"task_id":0,"tid":"5612","timestamp":1709914203}
{"function":"update_slots","level":"INFO","line":1644,"msg":"slot released","n_cache_tokens":124,"n_ctx":512,"n_past":123,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"5612","timestamp":1709914203,"truncated":false}
{"function":"log_server_request","level":"INFO","line":2737,"method":"POST","msg":"request","params":{},"path":"/completion","remote_addr":"127.0.0.1","remote_port":52628,"status":200,"tid":"22540","timestamp":1709914203} |
@FSSRepo There's a log format option: https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md?plain=1#L60
Nice! The code is getting better and better to work with.
No worries. The test feature is called "security", but it is quite ambitious at this stage.
@ngxson Maybe you should merge it as it is to avoid resolving conflicts, and then implement the embeddings multitask in a separate PR.
@ggerganov OK, I'll merge it now without waiting for the CI. I also ran the tests locally and they pass, so we can be confident that the CI will pass.
* refactor static file handler
* use set_pre_routing_handler for validate_api_key
* merge embedding handlers
* correct http verb for endpoints
* fix embedding response
* fix test case CORS Options
* fix code style
Motivation
From the beginning, server routes have been registered inline, for example:
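A rough sketch of what this inline style looks like, assuming the cpp-httplib `Server` API that the server is built on; the route, payload fields, and handler body here are illustrative, not the actual server code:

```cpp
#include "httplib.h"
#include "json.hpp"

using json = nlohmann::json;

int main() {
    httplib::Server svr;

    // Route and handler declared together, inline (illustrative only).
    svr.Post("/completion", [](const httplib::Request & req, httplib::Response & res) {
        const json body   = json::parse(req.body);               // request payload
        const auto prompt = body.value("prompt", std::string()); // hypothetical field
        const json out    = {{"content", "echo: " + prompt}};    // placeholder response
        res.set_content(out.dump(), "application/json");
    });

    svr.listen("127.0.0.1", 8080);
}
```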
This is OK when we have a small number of routes. But as the server code grows in complexity, we started having more routes, and even multiple routes that share the same handler (like `/chat/completions` and `/v1/chat/completions`).

I propose in this PR to separate the place where we define the handler function from the place where we register the route, inspired by modern web backend frameworks that keep controllers in one place and routes in another. This will make it easier to keep track of the routes as their number grows; a rough sketch of the idea follows.
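For contrast, a minimal sketch of the separated style under the same cpp-httplib assumption; the handler names and bodies are hypothetical, only the route paths come from the discussion above:

```cpp
#include "httplib.h"
#include "json.hpp"

using json = nlohmann::json;

int main() {
    httplib::Server svr;

    // Handlers ("controllers") defined in one place...
    const auto handle_chat_completions = [](const httplib::Request &, httplib::Response & res) {
        const json out = {{"choices", json::array()}};   // placeholder body
        res.set_content(out.dump(), "application/json");
    };
    const auto handle_embeddings = [](const httplib::Request &, httplib::Response & res) {
        const json out = {{"embedding", json::array()}}; // placeholder body
        res.set_content(out.dump(), "application/json");
    };

    // ...routes registered in another, so aliased paths share one handler.
    svr.Post("/chat/completions",    handle_chat_completions);
    svr.Post("/v1/chat/completions", handle_chat_completions);
    svr.Post("/embedding",           handle_embeddings);
    svr.Post("/v1/embeddings",       handle_embeddings);

    svr.listen("127.0.0.1", 8080);
}
```

One nice side effect of this layout is that adding or renaming an alias is a one-line change in the routing section, without touching the handler code.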
Here is a list of routes that we currently support:
In this PR
* Turn `validate_api_key` into a real middleware (using `set_pre_routing_handler`); see the sketch after this list
* Merge the `embedding` and `/v1/embeddings` handlers
* De-duplicate some code in `chunked_content_provider` ==> do it later, to avoid conflicts
* Move `json::parse` to a middleware ==> will see later in another PR; I don't know yet how to do it, or whether it will be worth the effort
* Also fix a bug when the API key has fewer than 4 characters
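To illustrate the middleware point, here is a hedged sketch of an API-key check done as a pre-routing handler with cpp-httplib's `set_pre_routing_handler`; the header scheme, key value, and `/health` route are assumptions for the example, not the PR's actual implementation:

```cpp
#include "httplib.h"

int main() {
    httplib::Server svr;
    const std::string api_key = "sk-test"; // illustrative only

    // Pre-routing middleware: runs before any route handler is dispatched.
    svr.set_pre_routing_handler([&](const httplib::Request & req, httplib::Response & res) {
        const std::string auth = req.get_header_value("Authorization");
        if (auth == "Bearer " + api_key) {
            return httplib::Server::HandlerResponse::Unhandled; // let routing continue
        }
        res.status = 401;
        res.set_content("{\"error\":\"invalid api key\"}", "application/json");
        return httplib::Server::HandlerResponse::Handled;       // stop the request here
    });

    svr.Get("/health", [](const httplib::Request &, httplib::Response & res) {
        res.set_content("{\"status\":\"ok\"}", "application/json");
    });

    svr.listen("127.0.0.1", 8080);
}
```

Returning `Unhandled` lets normal routing continue, while `Handled` short-circuits the request; that is what makes the pre-routing handler behave like middleware rather than an extra check duplicated in every endpoint.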