Conversation

@elikoga (Member) commented on Nov 26, 2025

PL-134151

@elikoga changed the title from "Move to llama cpp inference" to "PL-134151 Move to llama cpp inference" on Dec 3, 2025
elikoga and others added 30 commits January 29, 2026 15:11
provide a CPU backend that checks regular memory usage for
development environments

track per-host and per-model usage for the different backends (RAM, ROCm)
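
The commit itself isn't shown here, but a minimal sketch of what such a RAM-gated backend could look like; the `CpuBackend` name, the `psutil`-based check, and the bookkeeping methods are all assumptions for illustration, not code from this PR:

```python
# Hypothetical sketch of a development-only CPU backend that gates model
# loads on free system RAM and tracks per-model usage.
import psutil  # third-party: pip install psutil


class CpuBackend:
    """Refuses model loads that would eat into a configured RAM headroom."""

    def __init__(self, headroom_bytes: int = 2 * 1024**3) -> None:
        self.headroom_bytes = headroom_bytes   # always keep this much RAM free
        self.model_usage: dict[str, int] = {}  # model name -> estimated bytes

    def can_load(self, estimated_bytes: int) -> bool:
        # psutil.virtual_memory().available is the RAM usable without swapping.
        free = psutil.virtual_memory().available
        return free - estimated_bytes >= self.headroom_bytes

    def register_load(self, model: str, estimated_bytes: int) -> None:
        self.model_usage[model] = estimated_bytes

    def register_unload(self, model: str) -> None:
        self.model_usage.pop(model, None)
```
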

introduce a task manager to unify starting and cleaning up ongoing
background tasks (see the sketch after the list below)
- also some asyncio cleanups
- more use of the task manager
- a small fix to ensure model health is correctly updated on the gateway when an inference server restarts
- improved logging
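
As a rough illustration of what such a task manager might look like in asyncio (a sketch assuming a plain-asyncio codebase; the class name and API here are invented, not this PR's):

```python
import asyncio


class TaskManager:
    """Owns named background tasks and cancels them all on shutdown."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def start(self, name: str, coro) -> asyncio.Task:
        task = asyncio.create_task(coro, name=name)
        self._tasks[name] = task
        # Forget finished tasks so the registry doesn't grow without bound.
        task.add_done_callback(lambda _t: self._tasks.pop(name, None))
        return task

    async def shutdown(self) -> None:
        tasks = list(self._tasks.values())
        for task in tasks:
            task.cancel()
        # Wait for the cancellations to complete; ignore the CancelledErrors.
        await asyncio.gather(*tasks, return_exceptions=True)
```
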
we might have been leaking processes before. I couldn't quite
prove why this fixes it, but I'm not leaking processes on my
machine now. will have to check in a real environment.
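
One classic cause of leaked child processes in asyncio code is terminating a subprocess without ever awaiting it, so the dead child is never reaped. A hedged sketch of the defensive pattern (this is the generic fix, not necessarily the exact change in this commit; the function name is hypothetical):

```python
import asyncio


async def stop_inference_server(proc: asyncio.subprocess.Process,
                                timeout: float = 10.0) -> None:
    """Terminate a server subprocess and make sure it is actually reaped."""
    if proc.returncode is not None:
        return  # already exited and reaped
    proc.terminate()
    try:
        # Without this await, the child can linger as a zombie process.
        await asyncio.wait_for(proc.wait(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
```
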
this helps ensure we don't load multiple models in parallel,
which can cause overload.
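
A minimal way to get that behavior is to serialize loads behind a lock (a sketch, assuming asyncio; `load_model` and `_do_load` are hypothetical names):

```python
import asyncio

_load_lock = asyncio.Lock()


async def load_model(name: str) -> None:
    # Only one coroutine may be loading at a time, so two large models are
    # never pulled into memory simultaneously.
    async with _load_lock:
        await _do_load(name)


async def _do_load(name: str) -> None:
    """Hypothetical helper that actually starts the inference server."""
    ...
```
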
various model-handling tasks are now unique and more easily
cleaned up. this removes the complexity of the several different
approaches we had for dealing with tasks and their lifecycles.
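
"Unique" here presumably means at most one running task per key; a sketch of that idea (again with invented names), where starting a task cancels any older task registered under the same key:

```python
import asyncio


class UniqueTaskManager:
    """At most one running task per key; new tasks replace older ones."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def start_unique(self, key: str, coro) -> asyncio.Task:
        old = self._tasks.get(key)
        if old is not None and not old.done():
            old.cancel()  # the replaced task is torn down, not leaked
        task = asyncio.create_task(coro, name=key)
        self._tasks[key] = task
        return task
```
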
This was a naming decision from earlier experiments. The "with options"
suffix no longer adds value.