Implement new scheduler by mostlygeek · Pull Request #823 · mostlygeek/llama-swap

mostlygeek · 2026-06-07T05:21:28Z

introduce internal/router/scheduler to decouple routing, swapping and queuing into interface contracts.
introduce a new routing configuration section that supersedes matrix and group while maintaining backwards compatibility
add FIFO scheduler with prioritized queuing
add internal/router/design.md as developer documentation on implementing new schedulers and routers

Fixes #797

Extract the request-scheduling logic out of baseRouter into a new internal/router/scheduler package so the scheduling strategy can be swapped independently of the process machinery and the eviction policy. FIFO is the first and only implementation and reproduces the previous behaviour exactly. baseRouter keeps the channels, run loop, ServeHTTP, process lifecycle (doSwap), and shutdown teardown, and now implements scheduler.Effects so the scheduler can call back for every side-effect. newBaseRouter takes a scheduler.Factory (defaulting to FIFO, wired up by Group and Matrix) instead of a SwapPlanner. - move SwapPlanner into the scheduler package; groupPlanner and matrixPlanner satisfy it structurally and are captured in the factory closure each concrete router passes to newBaseRouter - move the run-loop state (active/inFlight/queued) and the decision tree (OnRequest/OnSwapDone/OnServeDone/OnUnload/OnShutdown) into scheduler.FIFO - alias router.ErrNoLocalModelFound to scheduler.ErrModelNotFound so SendError keeps matching the error the scheduler grants - migrate the FIFO scheduling tests to the scheduler package as direct, synchronous unit tests against a fake Effects; trim base_test.go down to baseRouter mechanism only (run loop, grant/ServeHTTP, Unload, Shutdown)

- Add the first draft of developer docs for creating new routers and schedulers. Refactor the scheduler interfaces to eliminate the extra map[string]*process.Process in the matrixPlanner and groupPlanner. This makes the scheduler the single owner of "what's running".

The new routing.router and routing.schedule is the new home for router and scheduler settings. Backwards compatibility of old configuration files are maintained with a new config error if both legacy matrix/group are defined along routing.router settings

Wire config.FifoConfig into the FIFO scheduler so queued requests are serviced by descending model priority instead of strict arrival order. - NewFIFO accepts a config.FifoConfig and stores it on the scheduler - enqueue does a stable priority insert: higher-priority models float ahead, equal-priority requests keep FIFO arrival order - group and matrix routers pass conf.Routing.Scheduler.Settings.Fifo - add TestFIFO_PriorityQueueOrder

coderabbitai · 2026-06-07T05:21:35Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f9858098-8c91-46fd-b09f-036fa3b2491d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

✅ Review completed - (🔄 Check again to review again)

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch schedule-797

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config-schema.json`:
- Around line 560-590: The JSON schema for the "router" object allows
combinations that the runtime loader LoadConfigFromReader rejects (e.g., both
settings.groups and settings.matrix present, or use="matrix" without
settings.matrix), so update the "router" schema to express the same constraints
as the runtime: make settings mutually exclusive (only one of groupsConfig or
matrixConfig allowed) and add conditional requirements based on "use" (if "use"
is "group" then require settings.groups; if "use" is "matrix" then require
settings.matrix). Implement this using JSON Schema constructs (oneOf or
dependent schemas / if-then-else) referencing the existing router.use,
router.settings, groupsConfig and matrixConfig so schema validation matches
LoadConfigFromReader behavior.

In `@config.example.yaml`:
- Around line 380-515: The example incorrectly defines both router.groups and
router.matrix together (routing.router.settings.groups and
routing.router.settings.matrix), which the config loader rejects; fix by
providing a single routing strategy per example—either remove the entire matrix
block when demonstrating groups (keep groups and related entries like
"group1","group2","forever") or remove the groups section and keep only the
matrix vars/evict_costs/sets (vars g,q,m..., evict_costs, sets such as
standard/with_rerank/creative/full) so the sample uses one valid configuration
approach.
- Around line 370-378: The FIFO priority example uses undefined model IDs
(A/B/C/D) which will fail validation for
routing.scheduler.settings.fifo.priority; replace those entries with actual
model IDs present in your configuration (or use the documented example IDs from
your models section, e.g., the real model keys such as "gpt-4", "gpt-4o",
"gpt-3.5-turbo" or other defined names), or leave the priority map empty to show
the default; update the priority dictionary key names to match the exact model
IDs used elsewhere so the loader validation succeeds.

In `@internal/router/design.md`:
- Around line 83-101: The fenced code block starting at the diagram (containing
"ServeHTTP", "HandlerReq", "baseRouter.run()", "Scheduler.On*(...)", etc.) lacks
a language tag causing MD040; add an explicit language identifier (e.g.,
```text) to the opening fence so the block is fenced as ```text ... ``` to
satisfy markdownlint and keep the diagram formatting unchanged.

In `@internal/router/scheduler/fifo.go`:
- Around line 343-356: The collidesWith function (parameters target, evict,
active) currently only checks target↔evict collisions but misses overlapping
evict sets between different active swaps (evict ∩ sw.evict), allowing two swaps
to evict the same model concurrently; update collidesWith to also detect any
intersection between the caller's evict slice and each active swap's sw.evict
(e.g., by checking if any element of evict exists in sw.evict or adding a helper
like slicesOverlap(evict, sw.evict)) and return true when such overlap is found,
keeping the existing checks for id==target, containsString(evict, id) and
containsString(sw.evict, target).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d4c8053d-576b-43a8-ad12-853c55fcf6b4

📥 Commits

Reviewing files that changed from the base of the PR and between 46cea36 and 79e68ee.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (27)

.github/workflows/config-schema.yml
Makefile
config-schema.json
config.example.yaml
go.mod
internal/config/config.go
internal/config/config_posix_test.go
internal/config/config_schema_test.go
internal/config/config_test.go
internal/config/config_windows_test.go
internal/config/matrix.go
internal/config/matrix_test.go
internal/perf/monitor.go
internal/router/base.go
internal/router/base_test.go
internal/router/design.md
internal/router/group.go
internal/router/group_test.go
internal/router/helpers_test.go
internal/router/matrix.go
internal/router/matrix_test.go
internal/router/router.go
internal/router/scheduler/fifo.go
internal/router/scheduler/fifo_test.go
internal/router/scheduler/scheduler.go
internal/server/server.go
internal/server/server_test.go

coderabbitai · 2026-06-07T05:53:45Z

+  router:
+    use: group # default, valid: group, matrix
+    settings:
+      # groups: a dictionary of group settings
+      # - optional, default: empty dictionary
+      # - provides advanced controls over model swapping behaviour
+      # - using groups some models can be kept loaded indefinitely, while others are swapped out
+      # - model IDs must be defined in the Models section
+      # - a model can only be a member of one group
+      # - group behaviour is controlled via the `swap`, `exclusive` and `persistent` fields
+      # - see issue #109 for details
+      #
+      # NOTE: the example below uses model names that are not defined above for demonstration purposes
+      groups:
+        # group1 works the same as the default behaviour of llama-swap where only one model is allowed
+        # to run a time across the whole llama-swap instance
+        "group1":
+          # swap: controls the model swapping behaviour in within the group
+          # - optional, default: true
+          # - true : only one model is allowed to run at a time
+          # - false: all models can run together, no swapping
+          swap: true
+
+          # exclusive: controls how the group affects other groups
+          # - optional, default: true
+          # - true: causes all other groups to unload when this group runs a model
+          # - false: does not affect other groups
+          exclusive: true
+
+          # members references the models defined above
+          # required
+          members:
+            - "llama"
+            - "qwen-unlisted"
+
+        # Example:
+        # - in group2 all models can run at the same time
+        # - when a different group is loaded it causes all running models in this group to unload
+        "group2":
+          swap: false
+
+          # exclusive: false does not unload other groups when a model in group2 is requested
+          # - the models in group2 will be loaded but will not unload any other groups
+          exclusive: false
+          members:
+            - "docker-llama"
+            - "modelA"
+            - "modelB"
+
+        # Example:
+        # - a persistent group, prevents other groups from unloading it
+        "forever":
+          # persistent: prevents over groups from unloading the models in this group
+          # - optional, default: false
+          # - does not affect individual model behaviour
+          persistent: true
+
+          # set swap/exclusive to false to prevent swapping inside the group
+          # and the unloading of other groups
+          swap: false
+          exclusive: false
+          members:
+            - "forever-modelA"
+            - "forever-modelB"
+            - "forever-modelc"
+
+      # The matrix declares valid combinations of models that can run concurrently.
+      # When a model is requested, the solver finds the cheapest way to make it
+      # available by evicting as few (and least costly) running models as possible.
+      #
+      # Solver behavior:
+      #   1. Request arrives for model X
+      #   2. If X is already running, forward immediately. Done.
+      #   3. Find all sets containing X
+      #   4. For each candidate set, compute cost: sum of evict_costs for
+      #      every running model NOT in that set
+      #   5. Pick lowest cost candidate. Ties broken by definition order.
+      #   6. Evict what needs to stop. Start X. Forward request.
+      #
+      # Subset semantics: a set [a, b, c] means any subset is valid.
+      # Only the requested model is started — others are not preloaded.
+      #
+      # A model not appearing in any set can only run alone.
+      #
+      matrix:
+        # vars: short names for models (alphanumeric, 1-8 chars)
+        # - required for sets and evict_costs settings
+        # - each entry is a short name to a real model ID. Do not use an alias
+        # - used to keep set DSL logic short and easier to read
+        # - sets and evict_costs only use identifiers defined in vars
+        vars:
+          g: gemma-model
+          q: qwen-model
+          m: mistral-model
+          v: voxtral-model
+          e: reranker-model
+          L: llama-70B
+          sd: stable-diffusion
+
+        # evict_costs: relative cost of losing a running model (default: 1)
+        evict_costs:
+          v: 50 # vllm backend, slow cold start
+          L: 30 # 70B weights, slow to load
+
+        # sets: named sets of concurrent model combinations
+        # Values are DSL strings with operators:
+        #   &     AND (models run together)
+        #   |     OR  (alternatives)
+        #   ()    grouping
+        #   +ref  inline another set's expression
+        #
+        # Expansion examples:
+        #   "L"                  → [L]
+        #   "a & b"              → [a, b]
+        #   "a | b"              → [a], [b]
+        #   "(a | b) & c"        → [a, c], [b, c]
+        #   "(a | b) & (c | d)"  → [a,c], [a,d], [b,c], [b,d]
+        #   "+llms & v"          → expands llms inline, then applies & v
+        sets:
+          # LLM + TTS: switching between g/q/m won't evict v
+          # expands to: [g,v], [q,v], [m,v]
+          standard: "(g | q | m) & v"
+
+          # LLM + TTS + reranker
+          # expands to: [g,v,e], [q,v,e]
+          with_rerank: "(g | q) & v & e"
+
+          # LLM + image generation, no TTS
+          # expands to: [g,sd], [q,sd]
+          creative: "(g | q) & sd"
+
+          # 70B model uses all GPUs, can only run alone
+          # expands to: [L]
+          full: "L"
+
 # peers: a dictionary of remote peers and models they provide


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Example mixes groups and matrix under one router config, which loader rejects.

Line 382-Line 515 defines both routing.router.settings.groups and routing.router.settings.matrix. The config loader returns an error for this combination, so this sample is not runnable as written.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@config.example.yaml` around lines 380 - 515, The example incorrectly defines both router.groups and router.matrix together (routing.router.settings.groups and routing.router.settings.matrix), which the config loader rejects; fix by providing a single routing strategy per example—either remove the entire matrix block when demonstrating groups (keep groups and related entries like "group1","group2","forever") or remove the groups section and keep only the matrix vars/evict_costs/sets (vars g,q,m..., evict_costs, sets such as standard/with_rerank/creative/full) so the sample uses one valid configuration approach.

coderabbitai · 2026-06-07T05:53:46Z

+func collidesWith(target string, evict []string, active map[string]*activeSwap) bool {
+	for id, sw := range active {
+		if id == target {
+			continue
+		}
+		if containsString(evict, id) {
+			return true
+		}
+		if containsString(sw.evict, target) {
+			return true
+		}
+	}
+	return false
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent parallel swaps that evict the same model.

Line 348 and Line 351 only guard target↔evict collisions. They miss evict-set overlap (evict ∩ sw.evict), so two active swaps can concurrently stop the same process.

Suggested patch

func collidesWith(target string, evict []string, active map[string]*activeSwap) bool { for id, sw := range active { if id == target { continue } if containsString(evict, id) { return true } if containsString(sw.evict, target) { return true } + if intersects(evict, sw.evict) { + return true + } } return false } +func intersects(a, b []string) bool { + for _, x := range a { + if containsString(b, x) { + return true + } + } + return false +} + func conflictsWithInFlight(evict []string, inFlight map[string]int) bool {

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func collidesWith(target string, evict []string, active map[string]*activeSwap) bool {

for id, sw := range active {

if id == target {

continue

}

if containsString(evict, id) {

return true

}

if containsString(sw.evict, target) {

return true

}

}

return false

}

func collidesWith(target string, evict []string, active map[string]*activeSwap) bool {

for id, sw := range active {

if id == target {

continue

}

if containsString(evict, id) {

return true

}

if containsString(sw.evict, target) {

return true

}

if intersects(evict, sw.evict) {

return true

}

}

return false

}

func intersects(a, b []string) bool {

for _, x := range a {

if containsString(b, x) {

return true

}

}

return false

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/router/scheduler/fifo.go` around lines 343 - 356, The collidesWith function (parameters target, evict, active) currently only checks target↔evict collisions but misses overlapping evict sets between different active swaps (evict ∩ sw.evict), allowing two swaps to evict the same model concurrently; update collidesWith to also detect any intersection between the caller's evict slice and each active swap's sw.evict (e.g., by checking if any element of evict exists in sw.evict or adding a helper like slicesOverlap(evict, sw.evict)) and return true when such overlap is found, keeping the existing checks for id==target, containsString(evict, id) and containsString(sw.evict, target).

mostlygeek added 11 commits June 6, 2026 21:01

internal/router: export SwapPlanner

e2be36f

internal/router: rename SwapPlanner to Swapper

5e6fe6d

internal/router: replace "planner" with "swapper" terminology

ead3b39

internal/router: improve accessibility of design.md

b7e4d3b

internal/router/schedule: improve fifo doc

2eab0bf

checkpoint: config refactor

dcd9712

tidy up Makefile and staticcheck

4ba1e3e

mostlygeek added 2 commits June 6, 2026 22:34

internal/config,config-schema: relax config-schema validation

2767d02

.github/workflows: use go test for schema check

79e68ee

mostlygeek mentioned this pull request Jun 7, 2026

Add priority-aware fair scheduling #811

Open

coderabbitai Bot reviewed Jun 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement new scheduler #823

Implement new scheduler #823
mostlygeek wants to merge 13 commits into
mainfrom
schedule-797

mostlygeek commented Jun 7, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading

Review skipped

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 7, 2026

Uh oh!

Uh oh!

coderabbitai Bot Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mostlygeek commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mostlygeek commented Jun 7, 2026 •

edited

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading