🚀 Dockerize llamacpp #132
build-em/
build-debug/
build-release/
build-static/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/
Suggested change: replace the seven build-* entries above with a single pattern:
build-*/
Dockerfile (outdated diff)
RUN apt-get update && \
    apt-get install -y build-essential python3 python3-pip

RUN pip install --upgrade pip setuptools wheel \
    && pip install torch torchvision torchaudio sentencepiece numpy
I believe the Python install is only needed to convert the model files. Maybe that could be moved to a different Dockerfile so this one can stay smaller? (I can also see it making sense to keep things simple and have only one Dockerfile for everything, though.)
Thanks for your report.
I have split it into two stages (build and runtime).
As far as I can tell the Makefile doesn’t even use Python: https://github.com/ggerganov/llama.cpp/blob/master/Makefile
Sorry, I didn't understand you the first time.
I have split it into two Dockerfiles, one for the "tools" and another for main.
To test:
- Build the new images:
  tools:
  docker build -f .devops/tools.Dockerfile -t llamacpp-converter .
  main:
  docker build -f .devops/main.Dockerfile -t llamacpp-main .
- Usage:
  Convert the 7B .pth model to ggml:
  docker run -v models:/models llamacpp-converter "/models/7B/" 1
  Run the main process:
  docker run -v models:/models llamacpp-main -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
Something weird here. What am I doing wrong?
I believe you’d need to run |
I think you don't have the volume mounted correctly. Keep in mind that a container runs in isolation, so it has no access to the files on your host. To give it access, you need to expose the files through a volume. I detail it below:
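A minimal sketch of the difference, assuming the models sit in ./models on the host (the helper name models_mount is hypothetical and not from this thread). Docker treats a bare source such as models as a named volume, while an absolute host path is bind-mounted, which is why the path is expanded with $(pwd) here:

```shell
#!/bin/sh
# Hypothetical helper: builds the -v argument from an absolute host path.
models_mount() {
    # $1: host directory holding the models (assumed default: ./models)
    host_dir="${1:-$(pwd)/models}"
    printf '%s:/models\n' "$host_dir"
}

# "-v models:/models" would create or reuse a named volume called "models";
# an absolute path bind-mounts the host directory instead.
# The command is only printed so it can be inspected before running it.
echo docker run -v "$(models_mount)" llamacpp-converter "/models/7B/" 1
```

Printing the command first makes it easy to confirm that the host path is the directory that actually contains the 7B folder.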
That works. Thanks.
Where's the quantization step occurring? Logically this should occur in the However, there is discussion about adding 8-bit quantization, so really it might be better to first call the wrapper script with a param say
EDIT: Issue #106 indicates that passing additional params to |
Done.
Smoke tested and looks to be working:
Convert 7B model from .pth to int4:
$ docker build -f .devops/tools.Dockerfile -t llamacpp-tools .
$ docker run -v $(pwd)/models:/models llamacpp-tools --convert "/models/7B/" 1
$ docker run -v $(pwd)/models:/models llamacpp-tools --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2
Run 7B model:
$ docker build -f .devops/main.Dockerfile -t llamacpp-main .
$ docker run -v $(pwd)/models:/models llamacpp-main -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
Great job. Just a suggestion: what about adding @gjmulder's build instructions to the README?
We can add the instructions for building the image locally. However, the simplest option would be to publish the Docker image on Docker Hub. Then there would be no need to clone the repository or anything similar; having Docker Engine or Docker Desktop installed would be enough.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Thanks for this!
I'm not familiar with the Docker pipeline. Is there anything I need to do for these images to be available, or do we just merge this as it is?
I can do it for you. I'll commit here before the PR is merged.
Hi @ggerganov, I've created a new GitHub Action, which is triggered only when master is pushed. If you look at the file, you will see that I have published the image under my own account to test locally.
I would recommend that you create a Docker Hub account (if you don't already have one) and a new repository with the name you want (e.g. llamacpp). If you register as the user ggerganov, we can publish the image as ggerganov/llamacpp.
Once you're registered, you can generate a token to give the GitHub Action access. You should put both the user and the token in the GitHub secrets:
- DOCKERHUB_USERNAME
- DOCKERHUB_TOKEN
Before closing the PR, we must change the image name in the pipeline YAML.
For those who want to try it, you can use the images that I have published under my account, e.g.:
full version (3.61GB):
Another option could be to use the GitHub registry, which wouldn't need any additional setup beyond pointing the builder to the right image name.
Yep, in any case I will adapt the YAML to the registry configuration.
Does this mean I don't have to create a Docker Hub account?
Short note here: since I am running Docker Desktop on Windows, I needed to change $(pwd) to %cd%.
Thanks for the great work!!
In light of the recent Docker policy changes, I would recommend pushing to ghcr.io instead. See how to log in to ghcr.io here: https://github.com/docker/login-action#github-container-registry In addition, a README section on running with Docker would be useful.
Yes, makes sense. I can do it tonight.
.github/workflows/docker.yml (outdated diff)
@@ -38,13 +38,14 @@ jobs:
- name: Log in to Docker Hub
  uses: docker/login-action@v2
  with:
    username: ${{ secrets.DOCKERHUB_USERNAME }}
    password: ${{ secrets.DOCKERHUB_TOKEN }}
    registry: docker.pkg.github.com
This should be ghcr.io; docker.pkg.github.com is legacy.
Ok, now the pipeline seems to work correctly. The flow that I have defined is the following:
- Light version (only includes main)
- Full version (includes Python, main and the quantize scripts)
Versioned images
In addition, each image is also pushed tagged with the commit hash.
If you have any suggestions, they are welcome! Thank you.
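The tagging flow above could be sketched roughly as follows. This is an illustration only: the helper name image_tags and the variant-commit tag format are assumptions, since the thread says only that images are also versioned by the commit hash; the repository name is the one used in the README commands.

```shell
#!/bin/sh
# Hypothetical helper: lists the tags one pipeline run might publish.
image_tags() {
    repo="ghcr.io/ggerganov/llama.cpp"
    commit="$1"   # short commit hash of the pushed revision
    for variant in light full; do
        # Moving tag that always points at the latest build of the variant.
        printf '%s:%s\n' "$repo" "$variant"
        # Pinned tag; the "<variant>-<commit>" format is an assumption.
        printf '%s:%s-%s\n' "$repo" "$variant" "$commit"
    done
}

# abc1234 is a placeholder hash used only for illustration.
image_tags abc1234
```

Pinned tags let a user stay on a known build, while the light and full tags follow master.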
@ggerganov can I squash the commits?
Good morning! I've included a couple of new commands in the bash tools:
On the other hand, I have updated the README.md file to explain how to start using the Docker image.
Docker
Prerequisites
Images
We have two Docker images available for this project:
Usage
The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command, which is included in the full Docker image.
docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
When it completes, you are ready to play!
docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
or with the light image:
docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512
Yes, almost always squash. Sorry for the slow responses, it has been a very busy week.
add windows build commands
First of all, thank you for the effort of the entire community. The work they do is impressive.
I'm going to try to do my bit by dockerizing this client and making it more accessible.
If you have time, I would recommend creating a pipeline to publish the image to Docker Hub, so it would be easier to use, e.g.:
docker pull ggerganov/llamacpp
or similar.
To make it work, just execute these commands:
docker build -t llamacpp .
If you want to run in interactive mode, don't forget to tell Docker that too.
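A hedged sketch of what an interactive run might look like, using the llamacpp image name from the build command above. The helper name interactive_cmd is hypothetical, and it is an assumption that the image forwards its arguments to main. Docker's -i and -t flags keep STDIN open and allocate a terminal.

```shell
#!/bin/sh
# Hypothetical helper: prints an interactive docker invocation.
interactive_cmd() {
    model="$1"   # model path as seen inside the container
    # "docker run -it" attaches STDIN and a pseudo-terminal to the container.
    # The trailing -i asks llama.cpp's main for interactive mode; whether the
    # image passes it through unchanged is an assumption.
    printf 'docker run -it -v %s/models:/models llamacpp -m %s -i\n' \
        "$(pwd)" "$model"
}

# Printed rather than executed so the command can be checked first.
interactive_cmd /models/7B/ggml-model-q4_0.bin
```

Without -it the container has no terminal attached, so the program inside cannot read what you type.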