Instructions for adding support for new models: [HOWTO-add-model.md](./docs/development/HOWTO-add-model.md)

| Backend | Target devices |
| --- | --- |
|[Metal](docs/build.md#metal-build)| Apple Silicon |
|[BLAS](docs/build.md#blas-build)| All |
|[BLIS](docs/backend/BLIS.md)| All |
|[SYCL](docs/backend/SYCL.md)| Intel and Nvidia GPU |
|[MUSA](docs/build.md#musa)| Moore Threads MTT GPU |
|[CUDA](docs/build.md#cuda)| Nvidia GPU |
|[hipBLAS](docs/build.md#hipblas)| AMD GPU |
|[Vulkan](docs/build.md#vulkan)| GPU |
|[CANN](docs/build.md#cann)| Ascend NPU |

## Building the project

The main product of this project is the `llama` library. Its C-style interface can be found in [include/llama.h](include/llama.h).
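If you are calling the library directly rather than through the bundled tools, the snippet below is a minimal sketch of that C interface: load a model, query one property, clean up. The function names follow `include/llama.h` as of recent releases, but the API evolves between versions, so verify the signatures against the header in your checkout.

```c
// minimal sketch of the llama.h C API: load a model and print its vocab size
// (error handling trimmed; check include/llama.h for the current signatures)
#include <stdio.h>
#include "llama.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    struct llama_model_params params = llama_model_default_params();
    struct llama_model * model = llama_load_model_from_file(argv[1], params);
    if (model == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    printf("vocab size: %d\n", llama_n_vocab(model));

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```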
The project also includes many example programs and tools using the `llama` library. The examples range from simple, minimal code snippets to sophisticated sub-projects such as an OpenAI-compatible HTTP server. Possible methods for obtaining the binaries:

- Clone this repository and build locally, see [how to build](docs/build.md); a typical CMake invocation is sketched after this list
- On macOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
- Use a Docker image, see [documentation for Docker](docs/docker.md)
- Download pre-built binaries from [releases](https://github.com/ggerganov/llama.cpp/releases)
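For reference, a plain local CMake build typically looks like the sketch below; backend-specific flags (CUDA, Metal, SYCL, etc.) are covered in [docs/build.md](docs/build.md):

```bash
# configure and build in ./build (add backend flags as needed, e.g. -DGGML_CUDA=ON)
cmake -B build
cmake --build build --config Release
```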

## Obtaining and quantizing models

The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](https://huggingface.co/models?library=gguf&sort=trending) compatible with `llama.cpp`:
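One way to fetch a GGUF file locally is the `huggingface-cli` tool from the `huggingface_hub` package; the repository and file names below are placeholders, not a real model:

```bash
# download a single GGUF file into ./models (repo and file names are illustrative)
huggingface-cli download some-user/some-model-GGUF some-model-Q4_K_M.gguf --local-dir ./models
```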

The Hugging Face platform provides a variety of online tools for converting, quantizing and hosting models with `llama.cpp`:

- Use the [GGUF-editor space](https://huggingface.co/spaces/CISCai/gguf-editor) to edit GGUF meta data in the browser (more info: https://github.com/ggerganov/llama.cpp/discussions/9268)
- Use the [Inference Endpoints](https://ui.endpoints.huggingface.co/) to directly host `llama.cpp` in the cloud (more info: https://github.com/ggerganov/llama.cpp/discussions/9669)

To learn more about model quantization, [read this documentation](examples/quantize/README.md)
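As a rough sketch of the end-to-end flow, assuming a Hugging Face model checkout in `./my-model` (the file names are illustrative, and the conversion script name has changed across releases):

```bash
# convert a Hugging Face model to a 16-bit GGUF file
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf

# quantize it down to 4 bits using the Q4_K_M scheme
llama-quantize my-model-f16.gguf my-model-q4_k_m.gguf Q4_K_M
```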

## [`llama-cli`](examples/main)

#### A CLI tool for accessing and experimenting with most of `llama.cpp`'s functionality.

- <details open>
    <summary>Run simple text completion</summary>

    ```bash
    llama-cli -m model.gguf -p "I believe the meaning of life is" -n 128

    # I believe the meaning of life is to find your own truth and to live in accordance with it. For me, this means being true to myself and following my passions, even if they don't align with societal expectations. I think that's what I love about yoga – it's not just a physical practice, but a spiritual one too. It's about connecting with yourself, listening to your inner voice, and honoring your own unique journey.
    ```

    </details>

- <details>
    <summary>Run in conversation mode</summary>

    ```bash
    llama-cli -m model.gguf -p "You are a helpful assistant" -cnv

    # > hi, who are you?
    # Hi there! I'm your helpful assistant! I'm an AI-powered chatbot designed to assist and provide information to users like you. I'm here to help answer your questions, provide guidance, and offer support on a wide range of topics. I'm a friendly and knowledgeable AI, and I'm always happy to help with anything you need. What's on your mind, and how can I assist you today?
    #
    # > what is 1+1?
    # Easy peasy! The answer to 1+1 is... 2!
    ```

    </details>

- <details>
    <summary>Run with custom chat template</summary>

    ```bash
    # use the "chatml" template
    llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --chat-template chatml

    # use a custom template
    llama-cli -m model.gguf -p "You are a helpful assistant" -cnv --in-prefix 'User: ' --reverse-prompt 'User:'
    ```

    By default, the chat template is taken from the input model. See the list of [supported templates](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)

    </details>

`llama.cpp` can constrain the output of the model via custom grammars. For example, you can force the model to output only JSON:

- <details>
    <summary>Constrain the output with a custom grammar</summary>

    ```bash
    llama-cli -m model.gguf -n 256 --grammar-file grammars/json.gbnf -p 'Request: schedule a call at 8pm; Command:'

    # {"appointmentTime": "8pm", "appointmentDetails": "schedule a a call"}
    ```

    The [grammars/](grammars/) folder contains a handful of sample grammars. To write your own, check out the [GBNF Guide](grammars/README.md).

    For authoring more complex JSON grammars, check out https://grammar.intrinsiclabs.ai/

    </details>
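For a taste of the GBNF format itself, here is a minimal hypothetical grammar that restricts the model to a bare yes/no answer:

```
# minimal GBNF sketch: the root rule permits exactly one of two literal strings
root ::= ("yes" | "no")
```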

## [`llama-server`](examples/server)

The [llama-server](examples/server/README.md) is a lightweight [OpenAI API](https://github.com/openai/openai-openapi) compatible HTTP server that can be used to serve local models and easily connect them to existing clients.

Example usage:

```bash
llama-server -m model.gguf --port 8080

# Basic web UI can be accessed via browser: http://localhost:8080
```

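Because the server exposes an OpenAI-compatible API, existing clients can simply point at it. As a sketch, assuming the server started above is running, a chat completion can be requested with `curl`:

```bash
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",   "content": "Hello!"}
        ]
    }'
```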
If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:

- LLaMA:
    - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
    - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
- GPT-3
    - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- GPT-3.5 / InstructGPT / ChatGPT:
    - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
    - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)