Tags: lshzh-ww/llama.cpp
llama : add classifier-free guidance (ggerganov#2135)
* Initial implementation
* Remove debug print
* Restore signature of llama_init_from_gpt_params
* Free guidance context
* Make freeing of guidance_ctx conditional
* Make Classifier-Free Guidance a sampling function
* Correct typo. CFG already means context-free grammar.
* Record sampling time in llama_sample_classifier_free_guidance
* Shift all values by the max value before applying logsoftmax
* Fix styling based on review
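The mixing step this commit adds can be pictured with a minimal C++ sketch of classifier-free guidance over logits. The function names here are illustrative, not the llama.cpp API; the sketch assumes two logit vectors of equal length:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// In-place log-softmax, shifting by the max first for numerical
// stability (the "shift all values by the max" step from the commit).
static void log_softmax(std::vector<float> & logits) {
    const float max_l = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (float l : logits) {
        sum += expf(l - max_l);
    }
    const float log_sum = logf(sum);
    for (float & l : logits) {
        l = l - max_l - log_sum;
    }
}

// Mix conditional logits with guidance (negative-prompt) logits.
// scale == 1 leaves the conditional distribution unchanged; scale > 1
// pushes it further away from the guidance distribution.
void apply_cfg(std::vector<float> & logits, std::vector<float> guidance, float scale) {
    log_softmax(logits);
    log_softmax(guidance);
    for (size_t i = 0; i < logits.size(); ++i) {
        logits[i] = guidance[i] + scale * (logits[i] - guidance[i]);
    }
}
```

Shifting by the max before exponentiating is the standard trick to avoid overflow in expf, which is why it appears as its own bullet in the commit history.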
Possible solution to allow K-quants on models with n_vocab != 32000 (ggerganov#2148)
* This allows LLaMA models that were previously incompatible with K-quants to function mostly as normal. This happens when a model has a vocab size != 32000, e.g. 32001, which is not divisible by 256 or 64. Since the problematic dimensions only apply to `tok_embeddings.weight` and `output.weight` (dimensions 4096 x n_vocab), we can simply quantize these layers to Q8_0, while the majority of the hidden layers are still K-quanted since they have compatible dimensions.
* Fix indentation
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* As an alternative, to avoid failing on Metal due to its lack of Q8_0 support, instead quantize tok_embeddings.weight to Q4_0 and retain output.weight as F16. This results in a net gain of about 55 MB for a 7B model compared to the previous approach, but should minimize the adverse impact on model quality.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
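For illustration, here is a hypothetical sketch of the per-tensor fallback rule described above. The enum and function are invented for this example; the actual selection logic in llama.cpp's quantizer is more involved:

```cpp
#include <cstdint>
#include <string>

// Illustrative stand-in for a few ggml quantization types.
enum class QuantType { Q4_0, Q8_0, F16, Q4_K };

// K-quants pack values in super-blocks of 256, so the affected
// dimension must be divisible by 256 (the commit also mentions 64 for
// some kernels). With n_vocab == 32001 the two vocab-sized tensors
// fail that test and fall back to compatible non-K types.
QuantType pick_quant(const std::string & name, int64_t n_vocab, QuantType wanted) {
    if (n_vocab % 256 == 0) {
        return wanted;                      // K-quants apply everywhere
    }
    if (name == "tok_embeddings.weight") {
        return QuantType::Q4_0;             // supported on Metal, small size cost
    }
    if (name == "output.weight") {
        return QuantType::F16;              // preserve quality on the output head
    }
    return wanted;                          // hidden layers have compatible dims
}
```

The design trade-off is visible in the two branches: the embedding table tolerates a cheap 4-bit format, while the output projection is kept at full half precision because quantization error there directly distorts the final logits.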
Support using mmap when applying LoRA (ggerganov#2095)
* Support using mmap when applying LoRA
* Fix Linux
* Update comment to reflect LoRA support with mmap
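As a rough picture of what mmap-based loading means here, a POSIX-only sketch follows (the real change goes through llama.cpp's cross-platform mmap wrapper, and `map_lora_file` is a hypothetical helper):

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a LoRA adapter file read-only instead of streaming it with
// fread; returns nullptr on failure.
void * map_lora_file(const char * path, size_t * size_out) {
    const int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return nullptr;
    }
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return nullptr;
    }
    void * addr = mmap(nullptr, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) {
        return nullptr;
    }
    *size_out = (size_t) st.st_size;
    return addr;
}
```

Mapping the file lets the adapter weights be paged in on demand by the OS rather than copied into heap buffers up front.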
ggml : remove src0 and src1 from ggml_tensor and rename opt to src (ggerganov#2178)
* Add ggml changes
* Update train-text-from-scratch for the change
* mpi : adapt to new ggml_tensor->src
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
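The layout change can be summarized with an illustrative struct; the field and macro names mirror ggml, but this sketch omits everything else and the exact array size is an assumption:

```cpp
// Before (simplified): two dedicated pointers plus a separate array.
//
//     struct ggml_tensor {
//         struct ggml_tensor * src0;
//         struct ggml_tensor * src1;
//         struct ggml_tensor * opt[GGML_MAX_OPT];
//         // ...
//     };
//
// After: one uniform array, so graph-traversal code can simply loop
// over all sources instead of special-casing src0 and src1.
#define GGML_MAX_SRC 6

struct ggml_tensor {
    struct ggml_tensor * src[GGML_MAX_SRC]; // src[0] was src0, src[1] was src1
    // ... all other fields unchanged ...
};

// Call sites change from t->src0 / t->src1 to t->src[0] / t->src[1].
```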
mpi : add support for distributed inference via MPI (ggerganov#2099)
* MPI support, first cut
* fix warnings, update README
* fixes
* wrap includes
* PR comments
* Update CMakeLists.txt
* Add GH workflow, fix test
* Add info to README
* mpi : trying to move more MPI stuff into ggml-mpi (WIP) (ggerganov#2099)
* mpi : add names for layer inputs + prep ggml_mpi_graph_compute()
* mpi : move all MPI logic into ggml-mpi (not tested yet)
* mpi : various fixes - communication now works but results are wrong
* mpi : fix output tensor after MPI compute (still not working)
* mpi : fix inference
* mpi : minor
* Add OpenMPI to GH action
* [mpi] continue-on-error: true
* mpi : fix after master merge
* [mpi] Link MPI C++ libraries to fix OpenMPI
* tests : fix new llama_backend API
* [mpi] use MPI_INT32_T
* mpi : factor out recv / send in functions and reuse
* mpi : extend API to allow usage with outer backends (e.g. Metal)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
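A toy sketch of the pipeline-parallel idea behind this work, assuming each MPI rank owns a contiguous slice of layers; `pipeline_step` is a hypothetical function, not the ggml-mpi API:

```cpp
#include <mpi.h>
#include <vector>

// Each rank evaluates its own slice of transformer layers, receiving
// activations from the previous rank and forwarding them to the next.
void pipeline_step(std::vector<float> & act, int rank, int size) {
    if (rank > 0) {
        MPI_Recv(act.data(), (int) act.size(), MPI_FLOAT, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    // ... evaluate this rank's layers in place on act ...
    if (rank < size - 1) {
        MPI_Send(act.data(), (int) act.size(), MPI_FLOAT, rank + 1, 0,
                 MPI_COMM_WORLD);
    }
}
```

The "extend API to allow usage with outer backends" bullet matters because each rank's local slice can then still be dispatched to an accelerator such as Metal, with MPI handling only the hop between ranks.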
ggml : fix issue where building with Intel MKL still asked for "cblas.h" (ggerganov#2104) (ggerganov#2115)
* Fix building with Intel MKL that asks for "cblas.h"
* Use angle brackets to indicate the system library
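The second bullet refers to the difference between quoted and angle-bracket includes. An illustrative selection block follows; ggml uses a guard along these lines for its MKL build, though the exact define name may differ by version:

```cpp
// Angle brackets search the compiler's system include paths, while a
// quoted include searches the source directory first, which can pick
// up the wrong header (or none) when MKL provides the BLAS interface.
#if defined(GGML_BLAS_USE_MKL)
#include <mkl.h>      // MKL ships its CBLAS declarations here
#else
#include <cblas.h>    // generic CBLAS (OpenBLAS, etc.)
#endif
```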
llama : remove "first token must be BOS" restriction (ggerganov#2153)
Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (ggerganov#2144)