docs/content/docs/advanced (1 file changed, +5 -6)

@@ -118,19 +118,18 @@ And we convert it to the gguf format that LocalAI can consume:
 
 # Convert to gguf
 git clone https://github.com/ggerganov/llama.cpp.git
-pushd llama.cpp && make GGML_CUDA=1 && popd
+pushd llama.cpp && cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release && popd
 
 # We need to convert the pytorch model into ggml for quantization
 # It creates 'ggml-model-f16.bin' in the 'merged' directory.
-pushd llama.cpp && python convert.py --outtype f16 \
-    ../qlora-out/merged/pytorch_model-00001-of-00002.bin && popd
+pushd llama.cpp && python3 convert_hf_to_gguf.py ../qlora-out/merged && popd
 
 # Start off by making a basic q4_0 4-bit quantization.
 # It's important to have 'ggml' in the name of the quant for some
 # software to recognize its file format.
-pushd llama.cpp && ./quantize ../qlora-out/merged/ggml-model-f16.gguf \
-    ../custom-model-q4_0.bin q4_0
+pushd llama.cpp/build/bin && ./llama-quantize ../../../qlora-out/merged/Merged-33B-F16.gguf \
+    ../../../custom-model-q4_0.gguf q4_0
 
 ```
 
-Now you should have ended up with a `custom-model-q4_0.bin` file that you can copy in the LocalAI models directory and use it with LocalAI.
+Now you should have ended up with a `custom-model-q4_0.gguf` file that you can copy in the LocalAI models directory and use it with LocalAI.
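Once the quantized file is in place, a quick way to verify it end to end is to drop it into the models directory and query LocalAI's OpenAI-compatible API. A minimal sketch, assuming LocalAI is already running on its default port 8080, that `./models/` is the directory it loads models from, and that the model can be referenced by its file name; the paths and prompt are illustrative only:

```bash
# Copy the quantized model into the directory LocalAI serves models from
# (assumed to be ./models/ here -- adjust to your setup).
cp custom-model-q4_0.gguf ./models/

# Ask LocalAI for a completion with the new model via its
# OpenAI-compatible endpoint, using the file name as the model id.
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-model-q4_0.gguf",
    "prompt": "The capital of France is",
    "temperature": 0.1
  }'
```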