
llamamodel: prevent CUDA OOM crash by allocating VRAM early #2393

Draft
wants to merge 5 commits into main

Commits on May 30, 2024

  1. backend: make binding n_batch default consistent with UI

    Signed-off-by: Jared Van Bortel <jared@nomic.ai>
    cebtenzzre committed May 30, 2024
    b48e336
  2. llamamodel: set batch size to known max to reduce mem usage

    Signed-off-by: Jared Van Bortel <jared@nomic.ai>
    cebtenzzre committed May 30, 2024
    cff5a53
  3. chatllm: do not report 100% progress until actually complete

    Signed-off-by: Jared Van Bortel <jared@nomic.ai>
    cebtenzzre committed May 30, 2024
    a16df5d
  4. llama.cpp: update submodule for CUDA exceptions and CPU skip

    Signed-off-by: Jared Van Bortel <jared@nomic.ai>
    cebtenzzre committed May 30, 2024
    19c9506
  5. llamamodel: trigger CUDA OOM early so we can fall back

    Signed-off-by: Jared Van Bortel <jared@nomic.ai>
    cebtenzzre committed May 30, 2024
    b4adcba
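
The core idea is in commits 1, 2, and 5: keep n_batch at the largest batch the app will ever actually submit (matching the UI default), and force the CUDA backend to allocate its compute buffers at load time, so an out-of-memory failure surfaces immediately and the model can be reloaded on the CPU instead of crashing mid-chat. Below is a minimal sketch of that idea against the llama.cpp C API, not the PR's actual code. It assumes a build (such as the submodule update in commit 4) where a CUDA allocation failure comes back as a C++ exception or an error return rather than an abort, and the helper name tryLoadOnDevice is purely illustrative.

```cpp
#include <llama.h>

#include <cstdio>
#include <exception>
#include <new>

// Hypothetical helper (not gpt4all's real API): load the model, then decode a
// throwaway batch of n_batch tokens so the backend allocates its compute
// buffers immediately. If that allocation fails, the caller can retry on CPU.
static bool tryLoadOnDevice(const char *path, int n_gpu_layers, int n_ctx, int n_batch) {
    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = n_gpu_layers;            // 0 => pure CPU

    llama_model *model = llama_load_model_from_file(path, mparams);
    if (!model)
        return false;

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx   = n_ctx;
    cparams.n_batch = n_batch;   // the largest batch we will ever actually submit

    llama_context *ctx = nullptr;
    bool ok = false;
    try {
        ctx = llama_new_context_with_model(model, cparams);
        if (!ctx)
            throw std::bad_alloc();

        // Dummy batch of BOS tokens: forces the VRAM for the compute graph to be
        // reserved now, so CUDA OOM shows up at load time, not at the first prompt.
        llama_batch batch = llama_batch_init(n_batch, 0, 1);
        batch.n_tokens = n_batch;
        for (int i = 0; i < n_batch; i++) {
            batch.token[i]     = llama_token_bos(model);
            batch.pos[i]       = i;
            batch.n_seq_id[i]  = 1;
            batch.seq_id[i][0] = 0;
            batch.logits[i]    = false;
        }
        int rc = llama_decode(ctx, batch);
        llama_batch_free(batch);
        if (rc != 0)
            throw std::bad_alloc();

        llama_kv_cache_clear(ctx);   // discard the dummy tokens before real use
        ok = true;
    } catch (const std::exception &e) {
        std::fprintf(stderr, "device load failed: %s\n", e.what());
    }

    // A real loader would keep ctx/model on success; freed here to stay self-contained.
    if (ctx) llama_free(ctx);
    llama_free_model(model);
    return ok;
}

int main() {
    llama_backend_init();
    // Try the GPU first; if the early allocation trips OOM, fall back to the CPU.
    if (!tryLoadOnDevice("model.gguf", /*n_gpu_layers*/ 99, /*n_ctx*/ 2048, /*n_batch*/ 2048))
        tryLoadOnDevice("model.gguf", /*n_gpu_layers*/ 0, /*n_ctx*/ 2048, /*n_batch*/ 2048);
    llama_backend_free();
    return 0;
}
```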
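Commit 3 is a small correctness fix on top of this: since the early VRAM allocation now happens after the weights finish streaming in, the progress value reported to the chat UI should not reach 100% until that final step has actually succeeded. A hedged sketch of that clamping follows; reportProgress is a made-up name, not chatllm's API.

```cpp
#include <algorithm>
#include <cstdio>

// Illustrative only: hold the displayed value just under 100% while any
// post-weight-load work (such as the dummy decode that reserves VRAM) remains.
static void reportProgress(float fraction, bool fullyLoaded) {
    if (!fullyLoaded)
        fraction = std::min(fraction, 0.99f);
    std::printf("\rloading model: %3d%%", static_cast<int>(fraction * 100.0f));
    std::fflush(stdout);
}
```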