Skip to content

Implement new scheduler #823

Open
mostlygeek wants to merge 13 commits into
mainfrom
schedule-797
Open

Implement new scheduler #823
mostlygeek wants to merge 13 commits into
mainfrom
schedule-797

Conversation

@mostlygeek

@mostlygeek mostlygeek commented Jun 7, 2026

Copy link
Copy Markdown
Owner
  • introduce internal/router/scheduler to decouple routing, swapping and queuing into interface contracts.
  • introduce a new routing configuration section that supersedes matrix and group while maintaining backwards compatibility
  • add FIFO scheduler with prioritized queuing
  • add internal/router/design.md as developer documentation on implementing new schedulers and routers

Fixes #797

mostlygeek added 11 commits June 6, 2026 21:01
Extract the request-scheduling logic out of baseRouter into a new
internal/router/scheduler package so the scheduling strategy can be
swapped independently of the process machinery and the eviction policy.
FIFO is the first and only implementation and reproduces the previous
behaviour exactly.

baseRouter keeps the channels, run loop, ServeHTTP, process lifecycle
(doSwap), and shutdown teardown, and now implements scheduler.Effects so
the scheduler can call back for every side-effect. newBaseRouter takes a
scheduler.Factory (defaulting to FIFO, wired up by Group and Matrix)
instead of a SwapPlanner.

- move SwapPlanner into the scheduler package; groupPlanner and
  matrixPlanner satisfy it structurally and are captured in the factory
  closure each concrete router passes to newBaseRouter
- move the run-loop state (active/inFlight/queued) and the decision tree
  (OnRequest/OnSwapDone/OnServeDone/OnUnload/OnShutdown) into scheduler.FIFO
- alias router.ErrNoLocalModelFound to scheduler.ErrModelNotFound so
  SendError keeps matching the error the scheduler grants
- migrate the FIFO scheduling tests to the scheduler package as direct,
  synchronous unit tests against a fake Effects; trim base_test.go down to
  baseRouter mechanism only (run loop, grant/ServeHTTP, Unload, Shutdown)
- Add the first draft of developer docs for creating new routers and
schedulers.

Refactor the scheduler interfaces to eliminate the extra
map[string]*process.Process in the matrixPlanner and groupPlanner. This
makes the scheduler the single owner of "what's running".
The new routing.router and routing.schedule is the new home for
router and scheduler settings. Backwards compatibility of old
configuration files are maintained with a new config error if both legacy
matrix/group are defined along routing.router settings
Wire config.FifoConfig into the FIFO scheduler so queued requests are
serviced by descending model priority instead of strict arrival order.

- NewFIFO accepts a config.FifoConfig and stores it on the scheduler
- enqueue does a stable priority insert: higher-priority models float
  ahead, equal-priority requests keep FIFO arrival order
- group and matrix routers pass conf.Routing.Scheduler.Settings.Fifo
- add TestFIFO_PriorityQueueOrder
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f9858098-8c91-46fd-b09f-036fa3b2491d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch schedule-797

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config-schema.json`:
- Around line 560-590: The JSON schema for the "router" object allows
combinations that the runtime loader LoadConfigFromReader rejects (e.g., both
settings.groups and settings.matrix present, or use="matrix" without
settings.matrix), so update the "router" schema to express the same constraints
as the runtime: make settings mutually exclusive (only one of groupsConfig or
matrixConfig allowed) and add conditional requirements based on "use" (if "use"
is "group" then require settings.groups; if "use" is "matrix" then require
settings.matrix). Implement this using JSON Schema constructs (oneOf or
dependent schemas / if-then-else) referencing the existing router.use,
router.settings, groupsConfig and matrixConfig so schema validation matches
LoadConfigFromReader behavior.

In `@config.example.yaml`:
- Around line 380-515: The example incorrectly defines both router.groups and
router.matrix together (routing.router.settings.groups and
routing.router.settings.matrix), which the config loader rejects; fix by
providing a single routing strategy per example—either remove the entire matrix
block when demonstrating groups (keep groups and related entries like
"group1","group2","forever") or remove the groups section and keep only the
matrix vars/evict_costs/sets (vars g,q,m..., evict_costs, sets such as
standard/with_rerank/creative/full) so the sample uses one valid configuration
approach.
- Around line 370-378: The FIFO priority example uses undefined model IDs
(A/B/C/D) which will fail validation for
routing.scheduler.settings.fifo.priority; replace those entries with actual
model IDs present in your configuration (or use the documented example IDs from
your models section, e.g., the real model keys such as "gpt-4", "gpt-4o",
"gpt-3.5-turbo" or other defined names), or leave the priority map empty to show
the default; update the priority dictionary key names to match the exact model
IDs used elsewhere so the loader validation succeeds.

In `@internal/router/design.md`:
- Around line 83-101: The fenced code block starting at the diagram (containing
"ServeHTTP", "HandlerReq", "baseRouter.run()", "Scheduler.On*(...)", etc.) lacks
a language tag causing MD040; add an explicit language identifier (e.g.,
```text) to the opening fence so the block is fenced as ```text ... ``` to
satisfy markdownlint and keep the diagram formatting unchanged.

In `@internal/router/scheduler/fifo.go`:
- Around line 343-356: The collidesWith function (parameters target, evict,
active) currently only checks target↔evict collisions but misses overlapping
evict sets between different active swaps (evict ∩ sw.evict), allowing two swaps
to evict the same model concurrently; update collidesWith to also detect any
intersection between the caller's evict slice and each active swap's sw.evict
(e.g., by checking if any element of evict exists in sw.evict or adding a helper
like slicesOverlap(evict, sw.evict)) and return true when such overlap is found,
keeping the existing checks for id==target, containsString(evict, id) and
containsString(sw.evict, target).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d4c8053d-576b-43a8-ad12-853c55fcf6b4

📥 Commits

Reviewing files that changed from the base of the PR and between 46cea36 and 79e68ee.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (27)
  • .github/workflows/config-schema.yml
  • Makefile
  • config-schema.json
  • config.example.yaml
  • go.mod
  • internal/config/config.go
  • internal/config/config_posix_test.go
  • internal/config/config_schema_test.go
  • internal/config/config_test.go
  • internal/config/config_windows_test.go
  • internal/config/matrix.go
  • internal/config/matrix_test.go
  • internal/perf/monitor.go
  • internal/router/base.go
  • internal/router/base_test.go
  • internal/router/design.md
  • internal/router/group.go
  • internal/router/group_test.go
  • internal/router/helpers_test.go
  • internal/router/matrix.go
  • internal/router/matrix_test.go
  • internal/router/router.go
  • internal/router/scheduler/fifo.go
  • internal/router/scheduler/fifo_test.go
  • internal/router/scheduler/scheduler.go
  • internal/server/server.go
  • internal/server/server_test.go

Comment thread config-schema.json
Comment thread config.example.yaml
Comment thread config.example.yaml
Comment on lines +380 to 515
router:
use: group # default, valid: group, matrix
settings:
# groups: a dictionary of group settings
# - optional, default: empty dictionary
# - provides advanced controls over model swapping behaviour
# - using groups some models can be kept loaded indefinitely, while others are swapped out
# - model IDs must be defined in the Models section
# - a model can only be a member of one group
# - group behaviour is controlled via the `swap`, `exclusive` and `persistent` fields
# - see issue #109 for details
#
# NOTE: the example below uses model names that are not defined above for demonstration purposes
groups:
# group1 works the same as the default behaviour of llama-swap where only one model is allowed
# to run a time across the whole llama-swap instance
"group1":
# swap: controls the model swapping behaviour in within the group
# - optional, default: true
# - true : only one model is allowed to run at a time
# - false: all models can run together, no swapping
swap: true

# exclusive: controls how the group affects other groups
# - optional, default: true
# - true: causes all other groups to unload when this group runs a model
# - false: does not affect other groups
exclusive: true

# members references the models defined above
# required
members:
- "llama"
- "qwen-unlisted"

# Example:
# - in group2 all models can run at the same time
# - when a different group is loaded it causes all running models in this group to unload
"group2":
swap: false

# exclusive: false does not unload other groups when a model in group2 is requested
# - the models in group2 will be loaded but will not unload any other groups
exclusive: false
members:
- "docker-llama"
- "modelA"
- "modelB"

# Example:
# - a persistent group, prevents other groups from unloading it
"forever":
# persistent: prevents over groups from unloading the models in this group
# - optional, default: false
# - does not affect individual model behaviour
persistent: true

# set swap/exclusive to false to prevent swapping inside the group
# and the unloading of other groups
swap: false
exclusive: false
members:
- "forever-modelA"
- "forever-modelB"
- "forever-modelc"

# The matrix declares valid combinations of models that can run concurrently.
# When a model is requested, the solver finds the cheapest way to make it
# available by evicting as few (and least costly) running models as possible.
#
# Solver behavior:
# 1. Request arrives for model X
# 2. If X is already running, forward immediately. Done.
# 3. Find all sets containing X
# 4. For each candidate set, compute cost: sum of evict_costs for
# every running model NOT in that set
# 5. Pick lowest cost candidate. Ties broken by definition order.
# 6. Evict what needs to stop. Start X. Forward request.
#
# Subset semantics: a set [a, b, c] means any subset is valid.
# Only the requested model is started — others are not preloaded.
#
# A model not appearing in any set can only run alone.
#
matrix:
# vars: short names for models (alphanumeric, 1-8 chars)
# - required for sets and evict_costs settings
# - each entry is a short name to a real model ID. Do not use an alias
# - used to keep set DSL logic short and easier to read
# - sets and evict_costs only use identifiers defined in vars
vars:
g: gemma-model
q: qwen-model
m: mistral-model
v: voxtral-model
e: reranker-model
L: llama-70B
sd: stable-diffusion

# evict_costs: relative cost of losing a running model (default: 1)
evict_costs:
v: 50 # vllm backend, slow cold start
L: 30 # 70B weights, slow to load

# sets: named sets of concurrent model combinations
# Values are DSL strings with operators:
# & AND (models run together)
# | OR (alternatives)
# () grouping
# +ref inline another set's expression
#
# Expansion examples:
# "L" → [L]
# "a & b" → [a, b]
# "a | b" → [a], [b]
# "(a | b) & c" → [a, c], [b, c]
# "(a | b) & (c | d)" → [a,c], [a,d], [b,c], [b,d]
# "+llms & v" → expands llms inline, then applies & v
sets:
# LLM + TTS: switching between g/q/m won't evict v
# expands to: [g,v], [q,v], [m,v]
standard: "(g | q | m) & v"

# LLM + TTS + reranker
# expands to: [g,v,e], [q,v,e]
with_rerank: "(g | q) & v & e"

# LLM + image generation, no TTS
# expands to: [g,sd], [q,sd]
creative: "(g | q) & sd"

# 70B model uses all GPUs, can only run alone
# expands to: [L]
full: "L"

# peers: a dictionary of remote peers and models they provide

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Example mixes groups and matrix under one router config, which loader rejects.

Line 382-Line 515 defines both routing.router.settings.groups and routing.router.settings.matrix. The config loader returns an error for this combination, so this sample is not runnable as written.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config.example.yaml` around lines 380 - 515, The example incorrectly defines
both router.groups and router.matrix together (routing.router.settings.groups
and routing.router.settings.matrix), which the config loader rejects; fix by
providing a single routing strategy per example—either remove the entire matrix
block when demonstrating groups (keep groups and related entries like
"group1","group2","forever") or remove the groups section and keep only the
matrix vars/evict_costs/sets (vars g,q,m..., evict_costs, sets such as
standard/with_rerank/creative/full) so the sample uses one valid configuration
approach.

Comment thread internal/router/design.md
Comment on lines +343 to +356
func collidesWith(target string, evict []string, active map[string]*activeSwap) bool {
for id, sw := range active {
if id == target {
continue
}
if containsString(evict, id) {
return true
}
if containsString(sw.evict, target) {
return true
}
}
return false
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent parallel swaps that evict the same model.

Line 348 and Line 351 only guard target↔evict collisions. They miss evict-set overlap (evict ∩ sw.evict), so two active swaps can concurrently stop the same process.

Suggested patch
 func collidesWith(target string, evict []string, active map[string]*activeSwap) bool {
 	for id, sw := range active {
 		if id == target {
 			continue
 		}
 		if containsString(evict, id) {
 			return true
 		}
 		if containsString(sw.evict, target) {
 			return true
 		}
+		if intersects(evict, sw.evict) {
+			return true
+		}
 	}
 	return false
 }
 
+func intersects(a, b []string) bool {
+	for _, x := range a {
+		if containsString(b, x) {
+			return true
+		}
+	}
+	return false
+}
+
 func conflictsWithInFlight(evict []string, inFlight map[string]int) bool {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func collidesWith(target string, evict []string, active map[string]*activeSwap) bool {
for id, sw := range active {
if id == target {
continue
}
if containsString(evict, id) {
return true
}
if containsString(sw.evict, target) {
return true
}
}
return false
}
func collidesWith(target string, evict []string, active map[string]*activeSwap) bool {
for id, sw := range active {
if id == target {
continue
}
if containsString(evict, id) {
return true
}
if containsString(sw.evict, target) {
return true
}
if intersects(evict, sw.evict) {
return true
}
}
return false
}
func intersects(a, b []string) bool {
for _, x := range a {
if containsString(b, x) {
return true
}
}
return false
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/scheduler/fifo.go` around lines 343 - 356, The collidesWith
function (parameters target, evict, active) currently only checks target↔evict
collisions but misses overlapping evict sets between different active swaps
(evict ∩ sw.evict), allowing two swaps to evict the same model concurrently;
update collidesWith to also detect any intersection between the caller's evict
slice and each active swap's sw.evict (e.g., by checking if any element of evict
exists in sw.evict or adding a helper like slicesOverlap(evict, sw.evict)) and
return true when such overlap is found, keeping the existing checks for
id==target, containsString(evict, id) and containsString(sw.evict, target).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

internal/router: introduce different scheduling functionality

1 participant