[pull] master from ggerganov:master #128

pull · 2024-07-10T05:20:50Z

See Commits and Changes for more details.

* Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from #7809 and migrate to the new filenames.
* Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.
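A minimal sketch of what such a deprecation stub might look like, not the code from the commit itself. The helper names `replacement_name`, `base_name`, and `deprecation_message` are hypothetical, and the mapping (a `llama-` prefix on the old name, with `main` becoming `llama-cli`) is an assumption about the rename in #7809.

```cpp
#include <string>

// Hypothetical helper: map a legacy binary name to its assumed replacement.
// Assumption: tools gained a "llama-" prefix, and "main" became "llama-cli".
inline std::string replacement_name(const std::string & legacy) {
    if (legacy == "main") {
        return "llama-cli";
    }
    return "llama-" + legacy;
}

// Strip any directory part, so "./build/bin/server" becomes "server".
inline std::string base_name(const std::string & path) {
    const std::string::size_type pos = path.find_last_of("/\\");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Build the warning text from the path the stub was invoked with.
// A real stub would print this from main() using argv[0] and exit non-zero.
inline std::string deprecation_message(const std::string & argv0) {
    const std::string legacy = base_name(argv0);
    return "WARNING: the binary '" + legacy + "' is deprecated; please use '" +
           replacement_name(legacy) + "' instead.";
}
```

Under these assumptions, building one such stub under each old filename means a user who still runs `./main` gets a message pointing at `llama-cli` instead of a "command not found" error. The sketch does not handle a Windows `.exe` suffix.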

* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile time flags for building the Q4_0_4_4 quant type

JohannesGaessler and others added 4 commits July 9, 2024 17:11

make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392)

a03e8dd

Update README.md to fix broken link to docs (#8399)

fd560fe

Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'

Server: Enable setting default sampling parameters via command-line (#8402)

a59f8fd

* Load server sampling parameters from the server context by default.
* Wordsmithing comment

github-actions bot added examples server build labels Jul 10, 2024

pull bot added ⤵️ pull and removed examples server build labels Jul 10, 2024

py : fix extra space in convert_hf_to_gguf.py (#8407)

8f0fad4

github-actions bot added examples python server build labels Jul 10, 2024

RunningLeon and others added 6 commits July 10, 2024 14:26

py : fix converter for internlm2 (#8321)

e4dd31f

* update internlm2
* remove unused file
* fix lint

llama : add assert about missing llama_encode() call (#8400)

a8be1e6

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

msvc : silence codecvt c++17 deprecation warnings (#8395)

7a80710
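For context, a hedged sketch of one common way to silence this warning; it is not necessarily what the commit does. `<codecvt>` and `std::wstring_convert` are deprecated since C++17, and MSVC's standard library honors the `_SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING` macro when it is defined before the header is included. The helper `utf8_to_wide` is a hypothetical name used only for illustration.

```cpp
// MSVC reports <codecvt> as deprecated under C++17; this macro has to be
// defined before the standard headers are included for it to take effect.
#if defined(_MSC_VER)
#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING
#endif

#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into a wide string using the
// deprecated (but still available) std::wstring_convert facility.
inline std::wstring utf8_to_wide(const std::string & utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

The macro only hides the diagnostic; the facility remains deprecated, so this is a stopgap rather than a migration. Note that `wchar_t` is 16 bits on Windows, so code points outside the Basic Multilingual Plane need separate handling there.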

llama : C++20 compatibility for u8 strings (#8408)

cc61948
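The underlying issue, sketched under assumptions: since C++20 a `u8"..."` literal has type `const char8_t[]` rather than `const char[]`, so code that stores such literals in `const char *` or `std::string` no longer compiles. One possible workaround is a macro that casts back to `const char *` when `char8_t` is enabled. The macro name `U8_AS_CHAR` and the helper `euro_sign` are hypothetical and are not taken from the commit.

```cpp
#include <string>

// When char8_t is available (C++20, or -fchar8_t), u8 literals are char8_t
// arrays, so cast them back to plain char. Otherwise use the literal as-is.
#if defined(__cpp_char8_t)
#define U8_AS_CHAR(x) reinterpret_cast<const char *>(u8##x)
#else
#define U8_AS_CHAR(x) u8##x
#endif

// Hypothetical example: the Euro sign (U+20AC) as UTF-8 bytes in a std::string.
inline std::string euro_sign() {
    return U8_AS_CHAR("\u20AC");
}
```

Reading `char8_t` storage through a `const char *` is permitted by the aliasing rules, so the cast keeps the byte content unchanged while compiling under both C++17 and C++20.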

gguf-py rel pipeline (#8410)

83321c6

* Upd gguf-py/readme
* Bump patch version for release

github-actions bot added documentation ggml labels Jul 10, 2024

ggerganov and others added 2 commits July 10, 2024 15:23

ggml : move sgemm sources to llamafile subfolder (#8394)

6b2a849

ggml-ci

[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)

f4444d9

github-actions bot added the SYCL label Jul 10, 2024

teleprint-me closed this Jul 10, 2024
