Commit: Reorganize documentation pages (ggerganov#8325)

* re-organize docs
* add link among docs
* add link to build docs
* fix style
* de-duplicate sections

Showing 14 changed files with 626 additions and 603 deletions.

# Android

## Build on Android using Termux

[Termux](https://github.com/termux/termux-app#installation) lets you build and run `llama.cpp` on an Android device (no root required). Install the required packages first:

```
apt update && apt upgrade -y
apt install git make cmake
```

It's recommended to move your model into the `~/` directory for best performance:

```
cd storage/downloads
mv model.gguf ~/
```

[Get the code](https://github.com/ggerganov/llama.cpp#get-the-code) & [follow the Linux build instructions](https://github.com/ggerganov/llama.cpp#build) to build `llama.cpp`.
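
On a typical setup the build boils down to roughly the following; this is only a sketch, and the linked build documentation is authoritative for the current commands:

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```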

## Building the Project using Android NDK

Obtain the [Android NDK](https://developer.android.com/ndk) and then build with CMake.

Execute the following commands on your computer to avoid downloading the NDK to your mobile device. Alternatively, you can also do this in Termux:

```
$ mkdir build-android
$ cd build-android
$ export NDK=<your_ndk_directory>
$ cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
$ make
```

Install [Termux](https://github.com/termux/termux-app#installation) on your device and run `termux-setup-storage` to get access to your SD card (on Android 11+, run the command twice).

Finally, copy the built `llama` binaries and the model file to your device storage. Because file permissions on the Android sdcard cannot be changed, copy the executables to `/data/data/com.termux/files/home/bin` and then run the following commands in Termux to make them executable:

(This assumes you have pushed the built executables to `/sdcard/llama.cpp/bin` using `adb push`.)

```
$ cp -r /sdcard/llama.cpp/bin /data/data/com.termux/files/home/
$ cd /data/data/com.termux/files/home/bin
$ chmod +x ./*
```

Download the model [llama-2-7b-chat.Q4_K_M.gguf](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q4_K_M.gguf), push it to `/sdcard/llama.cpp/`, and then move it to `/data/data/com.termux/files/home/model/`.
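
The push from your computer might look like this; a sketch only, assuming `adb` is set up and the model file is in your current directory:

```
adb push llama-2-7b-chat.Q4_K_M.gguf /sdcard/llama.cpp/
```

Then, in Termux, move it into the model directory: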

```
$ mv /sdcard/llama.cpp/llama-2-7b-chat.Q4_K_M.gguf /data/data/com.termux/files/home/model/
```

Now, you can start chatting:

```
$ cd /data/data/com.termux/files/home/bin
$ ./llama-cli -m ../model/llama-2-7b-chat.Q4_K_M.gguf -n 128 -cml
```

Here's a demo of an interactive session running on a Pixel 5 phone:

https://user-images.githubusercontent.com/271616/225014776-1d567049-ad71-4ef2-b050-55b0b3b9274c.mp4

# Docker

## Prerequisites

* Docker must be installed and running on your system.
* Create a folder to store big models & intermediate files (e.g. `/llama/models`).

## Images

We have three Docker images available for this project:

1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and quantize them to 4 bits. (platforms: `linux/amd64`, `linux/arm64`)
2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
3. `ghcr.io/ggerganov/llama.cpp:server`: This image only includes the server executable file. (platforms: `linux/amd64`, `linux/arm64`)

Additionally, the following images are available, similar to the above:

- `ghcr.io/ggerganov/llama.cpp:full-cuda`: Same as `full` but compiled with CUDA support. (platforms: `linux/amd64`)
- `ghcr.io/ggerganov/llama.cpp:light-cuda`: Same as `light` but compiled with CUDA support. (platforms: `linux/amd64`)
- `ghcr.io/ggerganov/llama.cpp:server-cuda`: Same as `server` but compiled with CUDA support. (platforms: `linux/amd64`)
- `ghcr.io/ggerganov/llama.cpp:full-rocm`: Same as `full` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`)
- `ghcr.io/ggerganov/llama.cpp:light-rocm`: Same as `light` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`)
- `ghcr.io/ggerganov/llama.cpp:server-rocm`: Same as `server` but compiled with ROCm support. (platforms: `linux/amd64`, `linux/arm64`)

The GPU-enabled images are not currently tested by CI beyond being built. They are built without any variation from the Dockerfiles defined in [.devops/](.devops/) and the GitHub Action defined in [.github/workflows/docker.yml](.github/workflows/docker.yml). If you need different settings (for example, a different CUDA or ROCm library), you'll need to build the images locally for now.
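
The images can be pulled ahead of time if you like (`docker run` will also pull them automatically on first use), for example:

```bash
docker pull ghcr.io/ggerganov/llama.cpp:light
```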

## Usage

The easiest way to download the models, convert them to ggml, and optimize them is with the `--all-in-one` command, which is included in the `full` Docker image.

Replace `/path/to/models` below with the actual path where you downloaded the models.

```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
```

On completion, you are ready to play!

```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
```

or with a light image:

```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
```

or with a server image:

```bash
docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512
```
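
Once the server container is running, you can query it over HTTP. A minimal sketch using the server's `/completion` endpoint and the port mapping above:

```bash
curl --request POST \
    --url http://localhost:8000/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 128}'
```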

## Docker With CUDA

Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU-enabled cloud, `cuBLAS` should be accessible inside the container.
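
A quick way to verify that the toolkit is wired up is to run `nvidia-smi` inside a container; this is only a sketch and relies on the toolkit injecting the host's driver utilities:

```bash
docker run --rm --gpus all ubuntu nvidia-smi
```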

## Building Docker locally

```bash
docker build -t local/llama.cpp:full-cuda -f .devops/full-cuda.Dockerfile .
docker build -t local/llama.cpp:light-cuda -f .devops/llama-cli-cuda.Dockerfile .
docker build -t local/llama.cpp:server-cuda -f .devops/llama-server-cuda.Dockerfile .
```

You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture (see the sketch after the list below).

The defaults are:

- `CUDA_VERSION` set to `11.7.1`
- `CUDA_DOCKER_ARCH` set to `all`
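
Overriding these is done with `--build-arg`; the values here are hypothetical, so pick a `CUDA_VERSION` that matches your host driver and a `CUDA_DOCKER_ARCH` that matches your GPU (or keep `all`):

```bash
docker build -t local/llama.cpp:light-cuda \
    --build-arg CUDA_VERSION=12.2.0 \
    --build-arg CUDA_DOCKER_ARCH=compute_86 \
    -f .devops/llama-cli-cuda.Dockerfile .
```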

The resulting images are essentially the same as the non-CUDA images:

1. `local/llama.cpp:full-cuda`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and quantize them to 4 bits.
2. `local/llama.cpp:light-cuda`: This image only includes the main executable file.
3. `local/llama.cpp:server-cuda`: This image only includes the server executable file.

## Usage

After building locally, usage is similar to the non-CUDA examples, but you'll need to add the `--gpus` flag. You will also want to use the `--n-gpu-layers` flag.

```bash
docker run --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
```

# Install pre-built version of llama.cpp

## Homebrew

On Mac and Linux, the Homebrew package manager can be used via

```sh
brew install llama.cpp
```

The formula is automatically updated with new `llama.cpp` releases. More info: https://github.com/ggerganov/llama.cpp/discussions/7668
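
Once installed, the packaged binaries (such as `llama-cli` and `llama-server`) should be on your `PATH`. A quick smoke test, with a hypothetical model path:

```sh
llama-cli -m ~/models/some-model.gguf -p "Hello" -n 32
```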

## Nix

On Mac and Linux, the Nix package manager can be used. For flake-enabled installs:

```sh
nix profile install nixpkgs#llama-cpp
```

Or, for non-flake-enabled installs:

```sh
nix-env --file '<nixpkgs>' --install --attr llama-cpp
```

This expression is automatically updated within the [nixpkgs repo](https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/by-name/ll/llama-cpp/package.nix#L164).

## Flox

On Mac and Linux, Flox can be used to install llama.cpp within a Flox environment via

```sh
flox install llama-cpp
```

Flox follows the nixpkgs build of llama.cpp.