You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
OPENBLAS_BUILD = @echo 'Your OS $(OS) does not appear to be Windows. For faster speeds, install and link a BLAS library. Set LLAMA_OPENBLAS=1 to compile with OpenBLAS support or LLAMA_CLBLAST=1 to compile with ClBlast support. This is just a reminder, not an error.'
Copy file name to clipboardExpand all lines: README.md
+28-2Lines changed: 28 additions & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,32 @@
1
-
# koboldcpp
1
+
# koboldcpp-ROCM
2
+
This is mostly for Linux, but with the release of ROCm frameworks on Windows, it might eventually be possible to run this on Windows.
2
3
3
-
KoboldCpp is an easy-to-use AI text-generation software for GGML models. It's a single self contained distributable from Concedo, that builds off llama.cpp, and adds a versatile Kobold API endpoint, additional format support, backward compatibility, as well as a fancy UI with persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything Kobold and Kobold Lite have to offer.
4
+
To install, either use the file "[easy_KCPP-ROCm_install.sh](https://github.com/YellowRoseCx/koboldcpp-rocm/blob/main/easy_KCPP-ROCm_install.sh)" or navigate to the folder you want to download to in Terminal then run
5
+
```
6
+
git clone https://github.com/YellowRoseCx/koboldcpp-rocm.git -b main --depth 1 && \
7
+
cd koboldcpp-rocm && \
8
+
make LLAMA_HIPBLAS=1 -j4 && \
9
+
./koboldcpp.py
10
+
```
11
+
When the KoboldCPP GUI appears, make sure to select "Use CuBLAS/hipBLAS" and set GPU layers
12
+
13
+
Original [llama.cpp rocm port](https://github.com/ggerganov/llama.cpp/pull/1087) by SlyEcho, modified and ported to koboldcpp by YellowRoseCx
14
+
15
+
Comparison with OpenCL using 6800xt
16
+
| Model | Offloading Method | Time Taken - Processing 593 tokens| Time Taken - Generating 200 tokens| Total Time | Perf. Diff.
| Robin 33b q4_K_S |ROCM 6-t, 46/63 Layers on GPU | 14.6s (25ms/T) | 44.1s (221ms/T) | 58.7s (3.4T/s)| **1.19x**
25
+
26
+
--------
27
+
A self contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint.
28
+
29
+
What does it mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything Kobold and Kobold Lite have to offer. In a tiny package around 20 MB in size, excluding model weights.
0 commit comments