Here's how to compile and run under MINGW64 from Msys2 #23
Comments
Nice! You got it running on Windows 7. Edit: I just noticed "pre-Windows 8", so I'm assuming 7. Looks like you didn't even need the …

The over-2x speed difference between the 13B and 7B models is not surprising, but the fact that it takes several minutes is. If your processor has more than 4 threads, you can set -t to a higher number. But with the way these models work, memory will always be the biggest bottleneck. This is because any large language model is (in a way) one big equation that is evaluated all at once for each token, so the entire model has to be accessible in memory for that evaluation. For the Vicuna 13B model it looks like this:

mem required  = 9807.47 MB (+ 1608.00 MB per state)

which might indicate that mmap is not working. If you want to tinker, you can change lines 55 and 59 in … to … But I'm not sure if it makes any difference. Probably not. The current settings seem to work well for Windows 10 and up.
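By the way, as a rough back-of-the-envelope check on the memory point above (my own estimate, not something taken from the logs): a 13-billion-parameter model at roughly 5 bits per weight (4-bit weights plus per-block scales) needs about 13e9 × 5 / 8 ≈ 8.1 GB for the weights alone. Add the ggml context, the KV cache (1600 MB in the log below), and scratch buffers, and you land near the ~9.8 GB the loader reports. If that total doesn't fit in free RAM, the OS starts paging and every token takes far longer than it should.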
Actually I am running Windows 10 (OS Name: Microsoft Windows 10 Pro). It's the most "powerful" piece of computing I own... You can close this out if you wish, but I posted this because without specifying gcc and g++ in the MSYS2 setup the compile fails.
Oh, I misread: Platform/MINGW64_NT-10.0 does indicate Windows 10. I wonder why the pragma message said pre-Windows 8. Probably some MinGW thing. Oh, and set -t 2 or -t 3 so that you keep one thread free for the OS. That should absolutely speed things up a bit!
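With the same binary and model as in the transcript below, that would be something like:

# keep one thread free for the OS on a 4-thread CPU
./bin/chat -m "models/ggml-vicuna-13b-1.1-q4_2.bin" -t 3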
This is great info for others. Better to leave it up! I didn't know it would not compile without setting -DCMAKE_CXX_COMPILER and -DCMAKE_C_COMPILER. Thanks a lot for this! :)
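A general CMake note, not anything specific to this project: CMake also reads the CC and CXX environment variables when it configures from scratch, so exporting them once in the MSYS2 shell should have the same effect as passing the flags every time:

# general CMake behavior: CC/CXX are picked up on a fresh configure
export CC=/mingw64/bin/gcc.exe
export CXX=/mingw64/bin/g++.exe
cmake --fresh ..
cmake --build . --parallel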
I came across a … Found the idea from the below: … I'm running:

mkdir build
cd build
cmake -G "MinGW Makefiles" .. -DAVX512=ON
cmake --build . --parallel --config Release

I'm not too familiar with CMake. Any suggestions?
Hi, thanks for the link. Interesting. The project uses static linking, which means the *.dll files are already inside the .exe. This was because I didn't want users to have to worry about copying those DLLs and keeping them at the correct paths. But if you want to build the DLL files, then you can set the flag: … and you might also need to edit CMakeLists.txt line 105 to remove the -static references: …

If you made the dir … I have found that using the gpt4all backend instead of pure llama.cpp is indeed a bit slower.
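If the flag that got lost above is CMake's standard BUILD_SHARED_LIBS switch (an assumption on my part; the original code span didn't survive), a shared-library build would look roughly like this, after the -static options have been removed from CMakeLists.txt as described:

# assumes BUILD_SHARED_LIBS is the flag meant above
cmake .. -DBUILD_SHARED_LIBS=ON -DCMAKE_C_COMPILER=gcc.exe -DCMAKE_CXX_COMPILER=g++.exe
cmake --build . --parallel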
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat
$ mkdir build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat
$ cd build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ mkdir models
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cp ../../MODELS/ggml-vicuna-13b-1.1-q4_2.bin models
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cmake --fresh .. -DCMAKE_CXX_COMPILER=g++.exe -DCMAKE_C_COMPILER=gcc.exe
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /mingw64/bin/gcc.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /mingw64/bin/g++.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: unknown
-- Unknown architecture
-- Configuring done (25.8s)
-- Generating done (0.5s)
-- Build files have been written to: /home/Fixit/LlamaGPTJ-chat/build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cmake --build . --parallel
[ 8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.obj
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_0':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:781:15: warning: unused variable 'nb' [-Wunused-variable]
  781 |     const int nb = k / QK4_0;
      |               ^
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_1':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1129:27: warning: unused variable 'y' [-Wunused-variable]
 1129 |     block_q4_1 * restrict y = vy;
      |                           ^
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1127:15: warning: unused variable 'nb' [-Wunused-variable]
 1127 |     const int nb = k / QK4_1;
      |               ^
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q8_1':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1507:15: warning: unused variable 'nb' [-Wunused-variable]
 1507 |     const int nb = k / QK8_1;
      |               ^
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f32':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9357:15: warning: unused variable 'ne2_ne3' [-Wunused-variable]
 9357 |     const int ne2_ne3 = n/ne1; // ne2*ne3
      |               ^~~~~~~
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
 9419 |     const int ne2 = src0->ne[2]; // n_head -> this is k
      |               ^~~
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9468:5: warning: enumeration value 'GGML_TYPE_Q4_3' not handled in switch [-Wswitch]
 9468 |     switch (src0->type) {
      |     ^~~~~~
[ 8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.obj
In file included from C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama.cpp:8:
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h: In constructor 'llama_mmap::llama_mmap(llama_file*, bool)':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:233:94: note: '#pragma message: warning: You are building for pre-Windows 8; prefetch not supported'
  233 |         #pragma message("warning: You are building for pre-Windows 8; prefetch not supported")
      |                                                                                              ^
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
  201 |     llama_mmap(struct llama_file * file, bool prefetch = true) {
      |                                          ~~~~~^~~~~~~~~~~~~~~
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.obj
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.obj
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.obj
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.obj
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.obj
[ 83%] Linking CXX static library libllmodel.a
/mingw64/bin/ar.exe qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.obj CMakeFiles/llmodel.dir/llamamodel.cpp.obj CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj CMakeFiles/llmodel.dir/llmodel_c.cpp.obj CMakeFiles/llmodel.dir/mpt.cpp.obj CMakeFiles/llmodel.dir/utils.cpp.obj
/mingw64/bin/ranlib.exe libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.obj
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama.cpp: loading model from .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
LlamaGPTJ-chat: done loading!
It took 5 minutes to respond to me just saying hello.
UPDATE #1
$ ./bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin" -t 4
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
gptj_model_load: .............................................. done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
LlamaGPTJ-chat: done loading!