
Tags: lshzh-ww/llama.cpp

master-c9c74b4


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
llama : add classifier-free guidance (ggerganov#2135)

* Initial implementation

* Remove debug print

* Restore signature of llama_init_from_gpt_params

* Free guidance context

* Make freeing of guidance_ctx conditional

* Make Classifier-Free Guidance a sampling function

* Correct typo. CFG already means context-free grammar.

* Record sampling time in llama_sample_classifier_free_guidance

* Shift all values by the max value before applying logsoftmax

* Fix styling based on review
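
The last two bullets describe the standard classifier-free guidance blend in log-probability space, together with the max-shift that keeps the softmax numerically stable. A minimal Python sketch of both steps (an illustration of the technique, not the llama.cpp implementation; the function names are hypothetical):

```python
import math

def log_softmax(logits):
    # Shift by the max value before exponentiating so exp() never
    # overflows; the result is mathematically unchanged.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cfg_logits(cond_logits, uncond_logits, scale):
    # Classifier-free guidance: push the conditional distribution away
    # from the unconditional one by a factor of `scale`.
    lc = log_softmax(cond_logits)
    lu = log_softmax(uncond_logits)
    return [u + scale * (c - u) for c, u in zip(lc, lu)]
```

With scale = 1 this reduces to the plain conditional log-probabilities; larger scales sharpen the difference between the guided and unguided predictions.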

master-bbef282

Possible solution to allow K-quants on models with n_vocab!=32000 (ggerganov#2148)

* This allows LLaMA models that were previously incompatible with K-quants to function mostly as normal. The problem arises when a model's vocab size is not 32000 but e.g. 32001, which is divisible by neither 256 nor 64. Since the problematic dimensions apply only to `tok_embeddings.weight` and `output.weight` (dimensions 4096 x n_vocab), we can simply quantize these two layers to Q8_0, while the majority of the hidden layers are still K-quanted, since they have compatible dimensions.

* Fix indentation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* As an alternative, to avoid failing on Metal due to its lack of Q8_0 support, instead quantize tok_embeddings.weight to Q4_0 and retain output.weight as F16. This results in a net gain of about 55 MB for a 7B model compared to the previous approach, but should minimize the adverse impact on model quality.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
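
The divisibility constraint above comes from the K-quant super-block size: each super-block packs 256 weights, so a row of 32001 values cannot be split into whole blocks. A small sketch of the fallback policy the first bullet describes (the helper name and the Q4_K choice are illustrative, not the actual quantizer code):

```python
QK_K = 256  # weights per K-quant super-block

def pick_quant_type(n_cols):
    # Rows whose length is not a multiple of the super-block size
    # (e.g. n_vocab = 32001) cannot be K-quantized, so fall back to
    # Q8_0 for that tensor, as the commit does for tok_embeddings.weight
    # and output.weight.
    return "Q4_K" if n_cols % QK_K == 0 else "Q8_0"
```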

master-2347463

Support using mmap when applying LoRA (ggerganov#2095)

* Support using mmap when applying LoRA

* Fix Linux

* Update comment to reflect LoRA support with mmap
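
For context, applying a LoRA adapter folds a low-rank update into a base weight matrix, W' = W + scaling * (B @ A); mmap lets the base tensors be paged in lazily while doing so instead of being read eagerly. A toy sketch of the merge itself on plain nested lists (illustrative only, not the ggml code path):

```python
def apply_lora(W, A, B, scaling):
    # W: d_out x d_in base weights; A: r x d_in; B: d_out x r.
    # Returns W + scaling * (B @ A), the LoRA-merged weight.
    r = len(A)
    merged = [row[:] for row in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            merged[i][j] += scaling * sum(B[i][k] * A[k][j] for k in range(r))
    return merged
```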

master-20d7740

ggml : sync (abort callback, mul / add broadcast, fix alibi) (ggerganov#2183)

master-5bf2a27

ggml : remove src0 and src1 from ggml_tensor and rename opt to src (ggerganov#2178)

* Add ggml changes

* Update train-text-from-scratch for change

* mpi : adapt to new ggml_tensor->src

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

master-5656d10

mpi : add support for distributed inference via MPI (ggerganov#2099)

* MPI support, first cut

* fix warnings, update README

* fixes

* wrap includes

* PR comments

* Update CMakeLists.txt

* Add GH workflow, fix test

* Add info to README

* mpi : trying to move more MPI stuff into ggml-mpi (WIP) (ggerganov#2099)

* mpi : add names for layer inputs + prep ggml_mpi_graph_compute()

* mpi : move all MPI logic into ggml-mpi

Not tested yet

* mpi : various fixes - communication now works but results are wrong

* mpi : fix output tensor after MPI compute (still not working)

* mpi : fix inference

* mpi : minor

* Add OpenMPI to GH action

* [mpi] continue-on-error: true

* mpi : fix after master merge

* [mpi] Link MPI C++ libraries to fix OpenMPI

* tests : fix new llama_backend API

* [mpi] use MPI_INT32_T

* mpi : factor out recv / send in functions and reuse

* mpi : extend API to allow usage with outer backends (e.g. Metal)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
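
Distributed inference of this kind typically partitions the transformer layers across ranks, with each rank computing its slice and forwarding activations to the next. A hypothetical even-split partition (illustrative; the actual ggml-mpi layer assignment may differ):

```python
def layer_range(n_layers, n_ranks, rank):
    # Assign each rank a contiguous slice of layers; the first
    # `n_layers % n_ranks` ranks take one extra layer each.
    per, extra = divmod(n_layers, n_ranks)
    start = rank * per + min(rank, extra)
    end = start + per + (1 if rank < extra else 0)
    return start, end
```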

master-db4047a

main : escape prompt prefix/suffix (ggerganov#2151)

master-3bbc1a1

ggml : fix building with Intel MKL asking for "cblas.h" (ggerganov#2104) (ggerganov#2115)

* Fix building with Intel MKL asking for "cblas.h"

* Use angle brackets to indicate the system library

master-1d16309

llama : remove "first token must be BOS" restriction (ggerganov#2153)

master-6463955

Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (ggerganov#2144)