
Commit d99d952

llama-router: replace implicit arg injection with explicit placeholders
Remove automatic --model/--port/--host appending in favor of $path, $port, $host placeholders in spawn commands. All parameters now visible in configuration for full transparency and flexibility
1 parent 1a014b2 commit d99d952

4 files changed, +55 −15 lines changed

tools/router/ARCHITECTURE.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -148,6 +148,13 @@ Spawn commands support both absolute/relative paths and PATH-based binaries:
 
 The router only validates file existence for commands containing `/` or `\\` path separators, allowing seamless use of system-installed binaries.
 
+### Spawn Command Placeholders
+
+The router expands placeholders in spawn commands:
+- `$path` → The model file path from `path` field
+- `$port` → Dynamically assigned port (increments from `base_port`)
+- `$host` → Always expands to `127.0.0.1` for security
+
 ### Model-Scoped Route Stripping
 
 Routes like `/<model>/health` are router-side aliases for convenience. Before proxying to the backend, the router strips the model prefix:
```

tools/router/README.md

Lines changed: 35 additions & 8 deletions

````diff
@@ -196,7 +196,15 @@ Override with `--config`:
     "notify_model_swap": false
   },
   "default_spawn": {
-    "command": ["llama-server", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"],
+    "command": [
+      "llama-server",
+      "-m", "$path",
+      "--port", "$port",
+      "--host", "$host",
+      "--jinja",
+      "--ctx-size", "4096",
+      "--n-gpu-layers", "99"
+    ],
     "proxy_endpoints": ["/v1/", "/health", "/slots", "/props"],
     "health_endpoint": "/health"
   },
@@ -233,16 +241,31 @@ The `default_spawn` block defines how llama-server instances are launched:
 
 ```json
 {
-  "command": ["llama-server", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"],
+  "command": [
+    "llama-server",
+    "-m", "$path",
+    "--port", "$port",
+    "--host", "$host",
+    "--jinja",
+    "--ctx-size", "4096",
+    "--n-gpu-layers", "99"
+  ],
   "proxy_endpoints": ["/v1/", "/health", "/slots", "/props"],
   "health_endpoint": "/health"
 }
 ```
 
-The router automatically appends these arguments:
-- `--model <path>` - The model file path
-- `--port <port>` - Dynamically assigned port
-- `--host 127.0.0.1` - Localhost binding for security
+### Spawn Command Placeholders
+
+The router supports placeholders in spawn commands for dynamic value injection:
+
+| Placeholder | Description | Example expansion |
+|-------------|-------------|-------------------|
+| `$path` | Model file path from configuration | `/home/user/.cache/llama.cpp/model.gguf` |
+| `$port` | Dynamically assigned port | `50000`, `50001`, etc. |
+| `$host` | Bind address (always `127.0.0.1`) | `127.0.0.1` |
+
+This makes all spawn parameters explicit and visible in the configuration.
 
 ### Optimizing for Your Hardware
 
@@ -253,6 +276,9 @@ The `default_spawn` is where you tune performance for your specific hardware. **
   "default_spawn": {
     "command": [
       "llama-server",
+      "-m", "$path",
+      "--port", "$port",
+      "--host", "$host",
       "-ngl", "999",
       "-ctk", "q8_0",
       "-ctv", "q8_0",
@@ -277,8 +303,6 @@ The `default_spawn` is where you tune performance for your specific hardware. **
 - `-kvu`: Use single unified KV buffer for all sequences (also `--kv-unified`)
 - `--jinja`: Enable Jinja template support
 
-**Note:** The router automatically appends `--model`, `--port`, and `--host` - do not include these in your command.
-
 Change `default_spawn`, reload the router, and all `auto` models instantly use the new configuration.
 
 ### Per-Model Spawn Override
@@ -293,6 +317,9 @@ Individual models can override the default spawn configuration:
   "spawn": {
     "command": [
       "llama-server",
+      "-m", "$path",
+      "--port", "$port",
+      "--host", "$host",
       "--jinja",
       "--ctx-size", "8192",
       "--n-gpu-layers", "99",
````

tools/router/router-app.cpp

Lines changed: 12 additions & 6 deletions

```diff
@@ -82,12 +82,18 @@ bool RouterApp::ensure_running(const std::string & model_name, std::string & err
     const SpawnConfig spawn_cfg = resolve_spawn_config(cfg);
 
     std::vector<std::string> command = spawn_cfg.command;
-    command.push_back("--model");
-    command.push_back(expand_user_path(cfg.path));
-    command.push_back("--port");
-    command.push_back(std::to_string(port));
-    command.push_back("--host");
-    command.push_back("127.0.0.1");
+    const std::string model_path = expand_user_path(cfg.path);
+
+    // Replace all placeholders
+    for (auto & arg : command) {
+        if (arg == "$path") {
+            arg = model_path;
+        } else if (arg == "$port") {
+            arg = std::to_string(port);
+        } else if (arg == "$host") {
+            arg = "127.0.0.1";
+        }
+    }
 
     LOG_INF("Starting %s on port %d\n", model_name.c_str(), port);
```

tools/router/router-config.cpp

Lines changed: 1 addition & 1 deletion

```diff
@@ -107,7 +107,7 @@ static json serialize_spawn_config(const SpawnConfig & spawn) {
 const SpawnConfig & get_default_spawn() {
     static const SpawnConfig spawn = [] {
         SpawnConfig default_spawn = {
-            /*command =*/ {"llama-server", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"},
+            /*command =*/ {"llama-server", "-m", "$path", "--port", "$port", "--host", "$host", "--jinja", "--ctx-size", "4096", "--n-gpu-layers", "99"},
             /*proxy_endpoints =*/ {"/v1/", "/health", "/slots", "/props"},
             /*health_endpoint =*/ "/health",
         };
```
