
Commit 74349d3

Add information for Podman as well as Docker

We believe Podman is a viable alternative to Docker. Lots of people have moved to Podman, and the project should make sure they can adopt it.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

1 parent: 106045e

3 files changed: +67 -17 lines

README.md

Lines changed: 2 additions & 2 deletions

@@ -242,7 +242,7 @@ The project also includes many example programs and tools using the `llama` libr
 
 - Clone this repository and build locally, see [how to build](docs/build.md)
 - On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
+- Use a container image, see [documentation for containers](docs/container.md)
 - Download pre-built binaries from [releases](https://github.com/ggerganov/llama.cpp/releases)
 
 ## Obtaining and quantizing models

@@ -500,7 +500,7 @@ To learn more about model quantization, [read this documentation](examples/quant
 #### Development documentation
 
 - [How to build](docs/build.md)
-- [Running on Docker](docs/docker.md)
+- [Running in a container](docs/container.md)
 - [Build on Android](docs/android.md)
 - [Performance troubleshooting](docs/development/token_generation_performance_tips.md)
 - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)
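For the container route referenced above, docs/container.md (renamed in this commit) shows invocations along these lines; a sketch only, with the model path as a placeholder:

```bash
# Run the prebuilt light image against a locally downloaded model.
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512

# Or with Podman:
podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
```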

docs/build.md

Lines changed: 13 additions & 7 deletions

@@ -94,13 +94,13 @@ Building through oneAPI compilers will make avx_vnni instruction set available f
 - Using manual oneAPI installation:
   By default, `GGML_BLAS_VENDOR` is set to `Generic`, so if you already sourced intel environment script and assign `-DGGML_BLAS=ON` in cmake, the mkl version of Blas will automatically been selected. Otherwise please install oneAPI and follow the below steps:
   ```bash
-  source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit docker image, only required for manual installation
+  source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit container image, only required for manual installation
   cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON
   cmake --build build --config Release
   ```
 
-- Using oneAPI docker image:
-  If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel docker container: [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
+- Using oneAPI container image:
+  If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel container: [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
 
 Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html) for more information.
 
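For the oneAPI container route in the hunk above, a minimal sketch of running the build inside the [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit) image; the mount point `/workspace` and the `:latest` tag are illustrative assumptions, and the cmake flags are the ones shown in the hunk:

```bash
# Run the build inside the Intel oneAPI base image, mounting the llama.cpp checkout.
# Sourcing setvars.sh is harmless inside the image and keeps the sketch independent
# of the image's entrypoint behavior.
docker run -it --rm -v "$(pwd):/workspace" -w /workspace intel/oneapi-basekit:latest \
  bash -c 'source /opt/intel/oneapi/setvars.sh && cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON && cmake --build build --config Release'

# Or with Podman (SELinux relabeling disabled, as in the other Podman examples in this commit):
podman run --security-opt label=disable -it --rm -v "$(pwd):/workspace" -w /workspace intel/oneapi-basekit:latest \
  bash -c 'source /opt/intel/oneapi/setvars.sh && cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON && cmake --build build --config Release'
```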
@@ -280,19 +280,25 @@ cmake -B build -DGGML_VULKAN=ON
 cmake --build build --config Release
 ```
 
-**With docker**:
+**With containers**:
 
 You don't need to install Vulkan SDK. It will be installed inside the container.
 
-```sh
 # Build the image
-docker build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .
+
+<details><summary>Docker example</summary>docker build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .</details>
+<details><summary>Podman example</summary>podman build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .</details>
+
 
 # Then, use it:
 docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
+
+or
+
+podman run --security-opt label=disable -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
 ```
 
-**Without docker**:
+**Without a container**:
 
 Firstly, you need to make sure you have installed [Vulkan SDK](https://vulkan.lunarg.com/doc/view/latest/linux/getting_started_ubuntu.html)
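The `--device /dev/dri/...` mappings in the run commands above are host-specific; a quick way to see which card and render nodes exist on your machine:

```bash
# List the DRI device nodes on the host; pick the card*/renderD* entries that
# correspond to your GPU and pass them via --device to docker or podman.
ls -l /dev/dri
```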

docs/docker.md renamed to docs/container.md

Lines changed: 52 additions & 8 deletions

@@ -1,11 +1,11 @@
-# Docker
+# Container
 
 ## Prerequisites
-* Docker must be installed and running on your system.
+* A container engine, ie Docker/Podman, must be installed and running on your system.
 * Create a folder to store big models & intermediate files (ex. /llama/models)
 
 ## Images
-We have three Docker images available for this project:
+We have three container images available for this project:
 
 1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`)
 2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
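A small sketch of pre-pulling the published images listed above (the `server` tag also appears later in this document); either engine works:

```bash
docker pull ghcr.io/ggerganov/llama.cpp:full
docker pull ghcr.io/ggerganov/llama.cpp:light
docker pull ghcr.io/ggerganov/llama.cpp:server

# Or with Podman:
podman pull ghcr.io/ggerganov/llama.cpp:full
podman pull ghcr.io/ggerganov/llama.cpp:light
podman pull ghcr.io/ggerganov/llama.cpp:server
```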
@@ -27,44 +27,73 @@ The GPU enabled images are not currently tested by CI beyond being built. They a
 
 ## Usage
 
-The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full docker image.
+The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full container image.
 
 Replace `/path/to/models` below with the actual path where you downloaded the models.
 
 ```bash
 docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
 ```
+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
+```
 
 On completion, you are ready to play!
 
 ```bash
 docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
 ```
 
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+```
+
 or with a light image:
 
 ```bash
 docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
 ```
 
+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+```
+
 or with a server image:
 
 ```bash
 docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512
 ```
 
-## Docker With CUDA
+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512
+```
+
+## Container engines With CUDA
 
 Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.
 
-## Building Docker locally
+## Building Container locally
 
 ```bash
 docker build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
 ```
 
+or
+
+```bash
+podman build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
+```
+
 You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.
 
 The defaults are:
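The actual `ARGS` names and their defaults live in `.devops/cuda.Dockerfile` and are not shown in this hunk; purely as an illustration (the ARG name below is a hypothetical placeholder), overriding one at build time looks like:

```bash
# CUDA_ARCH_EXAMPLE is a hypothetical ARG name used for illustration only;
# check .devops/cuda.Dockerfile for the real ARG names and their default values.
docker build --build-arg CUDA_ARCH_EXAMPLE=86 -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .

# Or with Podman:
podman build --build-arg CUDA_ARCH_EXAMPLE=86 -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
```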
@@ -88,17 +117,32 @@ docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
 ```
 
-## Docker With MUSA
+or
+
+```bash
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
+```
+
+## Container engines With MUSA
 
 Assuming one has the [mt-container-toolkit](https://developer.mthreads.com/musa/native) properly installed on Linux, `muBLAS` should be accessible inside the container.
 
-## Building Docker locally
+## Building Container images locally
 
 ```bash
 docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
 ```
+or
+
+```bash
+podman build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
+```
 
 You may want to pass in some different `ARGS`, depending on the MUSA environment supported by your container host, as well as the GPU architecture.
 