Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
Allow the llama_progress_callback to return a value that stops the model from being loaded and frees all resources.
Motivation
LLMs can brush up against the limits of some computers, and sometimes you just need an emergency stop button. llama.cpp can already catch C++ exceptions thrown inside the model loading process and clean up the half-loaded model. Unfortunately, non-C++ languages (such as Rust) can't throw C++ exceptions, so even if they unwind the stack, the unwinding won't be caught by llama.cpp's try/catch and the resources used by the half-loaded model won't actually be cleaned up.
Possible Implementation
Allow the llama_progress_callback to return a value that aborts model loading early. Maybe have it return a bool, where true means continue and false means abort? This could bite existing codebases, though, since the change is subtle: callbacks written against the current void-returning signature would have to be updated to return true, and a stale cast could hide the mismatch and leave the loader reading a garbage return value.
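
For illustration, here is a minimal sketch of what the change might look like, assuming the current `typedef void (*llama_progress_callback)(float progress, void * ctx);` in llama.h and the `progress_callback` / `progress_callback_user_data` fields of `llama_context_params`. The loader-side check shown in the comments is hypothetical, as is the `my_progress_cb` example callback:

```cpp
#include <atomic>

// Proposed change: the callback's return value decides whether loading
// continues. true = keep loading, false = abort and free everything
// loaded so far.
typedef bool (*llama_progress_callback)(float progress, void * ctx);

// Hypothetical loader-side check, run at each progress report inside the
// model loading code (field names follow llama_context_params):
//
//   if (params.progress_callback &&
//       !params.progress_callback(progress, params.progress_callback_user_data)) {
//       // Take the same cleanup path as a caught std::exception: free the
//       // half-loaded tensors and report failure to the caller.
//       return false;
//   }

// Example caller: flip an atomic flag from anywhere (another thread, an FFI
// boundary) to request an abort without throwing an exception through the
// loader.
static std::atomic<bool> g_cancel_load{false};

static bool my_progress_cb(float progress, void * /*ctx*/) {
    (void) progress;
    return !g_cancel_load.load(); // false => loader aborts and cleans up
}
```

This keeps the abort request on the plain C ABI (a return value) instead of relying on stack unwinding, which is exactly what non-C++ bindings need.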