We believe Podman is a viable alternative to Docker. Many people have moved to Podman, and the project should make sure those users can adopt it too.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
docs/build.md (13 additions, 7 deletions)
@@ -94,13 +94,13 @@ Building through oneAPI compilers will make avx_vnni instruction set available f
- Using manual oneAPI installation:
By default, `GGML_BLAS_VENDOR` is set to `Generic`, so if you have already sourced the Intel environment script and set `-DGGML_BLAS=ON` in cmake, the MKL version of BLAS will be selected automatically. Otherwise, install oneAPI and follow the steps below:
```bash
-source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit docker image, only required for manual installation
+source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit container image, only required for manual installation
```
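For reference, here is a minimal sketch of the manual oneAPI build this hunk sits inside; the cmake flags below are assumptions inferred from the surrounding build.md text (Intel compilers, MKL via `GGML_BLAS_VENDOR`), not a verbatim quote of the file:

```bash
# Sketch: manual oneAPI build with Intel compilers and MKL as the BLAS backend
source /opt/intel/oneapi/setvars.sh   # skip inside the oneapi-basekit container image
cmake -B build \
      -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release
```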
-If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel docker container: [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
+- Using oneAPI container image:
+  If you do not want to source the environment vars and install oneAPI manually, you can also build the code using the Intel container image [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
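As an illustration of that container-based route, a hedged sketch of running the build inside the oneAPI-basekit image with Podman; the image tag, the mount layout, and the assumption that `cmake` is available in the image are mine, not the document's:

```bash
# Sketch: build inside the intel/oneapi-basekit image instead of installing oneAPI on the host
podman run --rm -it --security-opt label=disable \
    -v "$PWD":/src -w /src docker.io/intel/oneapi-basekit:latest \
    bash -c "cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp && cmake --build build --config Release"
```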
Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html) for more information.
docs/container.md (52 additions, 8 deletions)
@@ -1,11 +1,11 @@
-# Docker
+# Container
## Prerequisites
-* Docker must be installed and running on your system.
+* A container engine, i.e. Docker or Podman, must be installed and running on your system.
* Create a folder to store big models & intermediate files (ex. /llama/models)
## Images
-We have three Docker images available for this project:
+We have three container images available for this project:
1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and quantize them to 4-bit. (platforms: `linux/amd64`, `linux/arm64`)
2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
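If you just want one of these prebuilt images before running the examples below, pulling it explicitly looks like this (either engine works; the tag is one of the names listed above):

```bash
# Pull the light image with Podman; substitute `docker` for `podman` if preferred
podman pull ghcr.io/ggerganov/llama.cpp:light
```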
@@ -27,44 +27,73 @@ The GPU enabled images are not currently tested by CI beyond being built. They a
## Usage
-The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full docker image.
+The easiest way to download the models, convert them to ggml and optimize them is with the `--all-in-one` command, which is included in the full container image.
Replace `/path/to/models` below with the actual path where you downloaded the models.
```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
```
+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
+```
On completion, you are ready to play!
```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
```
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+```
or with a light image:
```bash
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
```
+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+```
Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.
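With Podman, GPU access is usually wired up through CDI rather than Docker's `--gpus` flag; the following is a hedged sketch using the toolkit's `nvidia-ctk` command and the `local/llama.cpp:light-cuda` image built later in this document (the CDI spec path and flags are assumptions for a typical setup):

```bash
# Sketch: expose NVIDIA GPUs to Podman via CDI, then run the CUDA-enabled image
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml   # one-time CDI spec generation
podman run --security-opt label=disable --device nvidia.com/gpu=all \
    -v /path/to/models:/models local/llama.cpp:light-cuda \
    -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
```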
-## Building Docker locally
+## Building Container locally
```bash
docker build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
podman build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
```
You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.
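For instance, overriding them at build time might look like the sketch below; the `ARG` names and values are assumptions, so check `.devops/cuda.Dockerfile` for the authoritative list and defaults:

```bash
# Sketch: rebuild the CUDA image for a specific CUDA version and GPU architecture (values assumed)
podman build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile \
    --build-arg CUDA_VERSION=12.4.0 \
    --build-arg CUDA_DOCKER_ARCH=compute_86 .
```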
The defaults are:
@@ -88,17 +117,32 @@ docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /
docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
```
-## Docker With MUSA
+or
+
+```bash
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
+```
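Once the server variant is running, you can sanity-check it from the host; a hedged sketch, assuming the container's port 8000 has been published to the host (for example by adding `-p 8000:8000` to the run command above):

```bash
# Sketch: query the llama.cpp server's completion endpoint from the host
curl --request POST http://localhost:8000/completion \
     --header "Content-Type: application/json" \
     --data '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 64}'
```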
+## Container engines with MUSA
Assuming one has the [mt-container-toolkit](https://developer.mthreads.com/musa/native) properly installed on Linux, `muBLAS` should be accessible inside the container.
-## Building Docker locally
+## Building Container images locally
```bash
docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
```