Closed
Description
I think a dedicated unloading endpoint would be beneficial when one does not want to wait for the TTL to expire or kill the entire llama-swap process. This would be useful for pipelines in which multiple processes share VRAM, since it would not require the user to forcefully kill a specific process.
Right now I'm emulating it with a 'null' model:
    "unload":
      cmd: ls
      ttl: 1
which I call via '/upstream/unload/' (a little barbaric, but I couldn't find any other way of cleanly unloading a model before its expiration time).
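For anyone who wants to script the same workaround, here is a minimal sketch in Python. It assumes llama-swap is listening on http://localhost:8080 (adjust to your setup) and that the 'null' model is named "unload" as in the config above. The helper names `unload_url` and `trigger_unload` are made up for illustration and are not part of llama-swap.

```python
import urllib.error
import urllib.request
from typing import Optional


def unload_url(base_url: str, null_model: str = "unload") -> str:
    """Build the '/upstream/<model>/' URL for the 'null' model entry."""
    return f"{base_url.rstrip('/')}/upstream/{null_model}/"


def trigger_unload(base_url: str, timeout: float = 5.0) -> Optional[int]:
    """Request the 'null' model so the currently loaded model gets swapped out.

    Returns the HTTP status code, or None if the server could not be reached.
    An error status is plausible here, because `ls` is not a real server;
    the point of the request is only to trigger the swap.
    """
    try:
        with urllib.request.urlopen(unload_url(base_url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return None


if __name__ == "__main__":
    # Assumed address; not taken from the llama-swap documentation.
    print(trigger_unload("http://localhost:8080"))
```

The status code returned is not meaningful by itself; whether VRAM was actually released still has to be checked separately (for example with nvidia-smi).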
Otherwise, great project and thank you for publishing it.