Allow the llama_progress_callback to abort model loading early without having to throw an exception #4551

@LoganDark

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Allow the llama_progress_callback to return a value that stops the model from being loaded and frees all resources.

Motivation

LLMs can brush up against the limits of some computers, and sometimes you just need an emergency stop button. llama.cpp can already catch a std::exception thrown inside the model loading process and clean up the half-loaded model, but non-C++ languages (such as Rust) can't throw std::exception. Even if they do unwind, the unwind won't be caught by llama.cpp's try-catch, and the resources used by the model won't actually be cleaned up properly.

Possible Implementation

Allow the llama_progress_callback to return a value that aborts model loading early, as sketched below. Maybe have it return a bool, where true means continue and false means abort? That could totally bite existing codebases, though, since the signature change is subtle.
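
A minimal sketch of what this might look like, based on the current `llama_progress_callback` typedef in llama.h (which returns void); the bool-returning variant and the `my_progress_cb` / `cancelled` names below are hypothetical illustrations, not a final API:

```c
#include <stdbool.h>

// Today (llama.h):
//   typedef void (*llama_progress_callback)(float progress, void * ctx);
//
// Proposed: return true to keep loading, false to abort and free
// everything loaded so far.
typedef bool (*llama_progress_callback)(float progress, void * ctx);

// Hypothetical user callback: ctx points at a flag that another part of
// the application (e.g. a UI thread) can set to request cancellation.
static bool my_progress_cb(float progress, void * ctx) {
    const volatile bool * cancelled = ctx;
    (void) progress;
    return !*cancelled; // false -> loader bails out and cleans up
}
```

Inside the loader, llama.cpp would check the return value at each progress update and take the same cleanup path it already uses when a std::exception is thrown, so no new teardown logic should be needed.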

Metadata


    Labels

    enhancement (New feature or request)
