This repository has been archived by the owner on Aug 25, 2024. It is now read-only.

[pull] master from ggerganov:master #141

Closed
wants to merge 11 commits into from


pull[bot]

@pull pull bot commented Aug 11, 2024

See Commits and Changes for more details.


Created by pull[bot]


mtavenrath and others added 4 commits August 11, 2024 10:09
…ronization overhead. (#8943)

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.

- The allocation overhead of the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
- ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Judging from the code, which only launches compute kernels or copies tensors, adding barriers just for shader reads/writes and transfers seems to be sufficient.

* Fix small typo

---------

Co-authored-by: 0cc4m <picard12@live.de>
…8956)


Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Neo Zhang <>
@github-actions github-actions bot added the documentation, ggml, Vulkan, and SYCL labels Aug 11, 2024
* gguf-py : Numpy dequantization for most types

* gguf-py : Numpy dequantization for grid-based i-quants
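To illustrate what "dequantization" means for a block-quantized type, here is a minimal sketch for Q8_0 only. It is not the gguf-py code from this commit: it uses the standard-library `struct` module instead of NumPy, the name `dequantize_q8_0` is hypothetical, and it assumes the Q8_0 block layout of one little-endian float16 scale followed by 32 signed 8-bit values (34 bytes per block).

```python
import struct

# Assumed Q8_0 block layout: one float16 scale, then 32 int8 quantized values.
Q8_0_BLOCK = struct.Struct("<e32b")


def dequantize_q8_0(data):
    """Turn raw Q8_0 bytes into a flat list of floats (hypothetical helper)."""
    if len(data) % Q8_0_BLOCK.size != 0:
        raise ValueError("input length is not a whole number of Q8_0 blocks")
    values = []
    for fields in Q8_0_BLOCK.iter_unpack(data):
        scale, quants = fields[0], fields[1:]
        # Each weight is recovered as scale * quantized value.
        values.extend(scale * q for q in quants)
    return values


# Build one synthetic block with scale 0.5 and quantized values -16..15.
block = Q8_0_BLOCK.pack(0.5, *range(-16, 16))
weights = dequantize_q8_0(block)
```

A NumPy version would express the same per-block multiply as a vectorized operation over all blocks at once, which is presumably why the commit targets NumPy.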
@pull pull bot added the ⤵️ pull label and removed the documentation, python, ggml, Vulkan, and SYCL labels Aug 11, 2024
* py : fix requirements check '==' -> '~='

* cont : fix the fix

* ci : run on all requirements.txt
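For readers unfamiliar with the two operators named in the commit title, the sketch below contrasts an exact pin (`==`) with a compatible-release pin (`~=`) as defined by PEP 440. It is not the repository's requirements check: the function names are hypothetical, and it handles only plain dotted numeric versions (no pre-releases, wildcards, or epochs).

```python
def parse(version):
    """Split a plain dotted version such as '1.24.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def matches_exact(candidate, pinned):
    """'==' pin: only the same release matches (trailing zeros are ignored)."""
    a, b = parse(candidate), parse(pinned)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a == b


def matches_compatible(candidate, spec):
    """'~=' pin: at least `spec`, with every segment but the last held fixed."""
    want = parse(spec)
    if len(want) < 2:
        raise ValueError("'~=' needs at least two release segments")
    have = parse(candidate)
    have = have + (0,) * (len(want) - len(have))  # pad '1.24' to '1.24.0'
    prefix = want[:-1]
    return have >= want and have[:len(prefix)] == prefix
```

Under these rules a pin of `~=1.24.4` accepts 1.24.9 but rejects 1.25.0 and 1.24.3, while `==1.24.4` accepts only 1.24.4.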
Septa2112 and others added 4 commits August 12, 2024 11:46
Fixes: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70724

In order to access the above bug you need to login using one of the
emails in
https://github.com/google/oss-fuzz/blob/master/projects/llamacpp/project.yaml#L3-L5

Signed-off-by: David Korczynski <david@adalogics.com>
* readme: introduce gpustack

GPUStack is an open-source GPU cluster manager for running large
language models, which uses llama.cpp as the backend.

Signed-off-by: thxCode <thxcode0824@gmail.com>

* readme: introduce gguf-parser

GGUF Parser is a tool to review/check the GGUF file and estimate the
memory usage without downloading the whole model.

Signed-off-by: thxCode <thxcode0824@gmail.com>

---------

Signed-off-by: thxCode <thxcode0824@gmail.com>
9 participants