
🚀 Dockerize llamacpp #132

Merged: 14 commits into ggerganov:master from bernatvadell:feat/dockerize, Mar 17, 2023

Conversation

@bernatvadell (Contributor) commented Mar 14, 2023

First of all, thank you for the effort of the entire community. The work you all do is impressive.

I'm going to try to do my bit by dockerizing this client and making it more accessible.

If you have time, I would recommend creating a pipeline to publish the image to Docker Hub, so it would be easier to use, e.g. docker pull ggerganov/llamacpp or similar.

To make it work, just execute these commands:

  • Build the image (it does not exist on Docker Hub yet):
    docker build -t llamacpp .
  • Run the program:
    docker run -v $(pwd)/models:/models llamacpp -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

If you want to run in interactive mode, don't forget to tell Docker that too (add -i to docker run):

docker run -i -v $(pwd)/models:/models llamacpp -m /models/7B/ggml-model-q4_0.bin -t 8 -n 256 --repeat_penalty 1.0 --color -i -r "User:" \
                                           -p \
"Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:"

Comment on lines +9 to +15
build-em/
build-debug/
build-release/
build-static/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/

Suggested change
build-em/
build-debug/
build-release/
build-static/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/
build-*/

Dockerfile (outdated)
Comment on lines 5 to 9
RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip

RUN pip install --upgrade pip setuptools wheel \
&& pip install torch torchvision torchaudio sentencepiece numpy
Collaborator:

I believe the Python install is only needed to convert the model files, maybe that could be moved to a different Dockerfile so this one can stay smaller? (I can also see it making sense to keep things simple and only have one Dockerfile for everything though)

Contributor Author (@bernatvadell):

Thanks for your feedback.

Split into two stages (build & runtime).
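For illustration, a minimal sketch of such a two-stage layout (base image, package list, and paths are assumptions, not the exact Dockerfile from this PR):

# Build stage: carries the full toolchain.
FROM ubuntu:22.04 AS build
RUN apt-get update && apt-get install -y build-essential
WORKDIR /app
COPY . .
RUN make

# Runtime stage: ships only the compiled binary, keeping the image small.
FROM ubuntu:22.04 AS runtime
COPY --from=build /app/main /main
ENTRYPOINT ["/main"]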

Collaborator:

As far as I can tell the Makefile doesn’t even use Python: https://github.com/ggerganov/llama.cpp/blob/master/Makefile

Contributor Author (@bernatvadell):

Sorry, I didn't understand you the first time.

I have separated it into two Dockerfiles: one for the tools and another for the main binary.

To test:

  1. Build the new images:
    tools:
    docker build -f .devops/tools.Dockerfile -t llamacpp-converter .
    main:
    docker build -f .devops/main.Dockerfile -t llamacpp-main .

  2. Usage:
    convert the 7B .pth model into ggml:
    docker run -v models:/models llamacpp-converter "/models/7B/" 1
    run the main process:
    docker run -v models:/models llamacpp-main -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

@gjmulder (Collaborator):

Something weird here. What am I doing wrong?

$ cat /data/llama/7B/params.json;echo
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}
$ docker run -v models:/models llamacpp-converter "/data/llama/7B" 1
Traceback (most recent call last):
  File "/app/convert-pth-to-ggml.py", line 67, in <module>
    with open(fname_hparams, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/llama/7B/params.json'

@j-f1 (Collaborator) commented Mar 14, 2023

I believe you’d need to run docker run -v /data/llama:/models llamacpp-converter "/models/7B" 1

@bernatvadell (Contributor Author) commented Mar 14, 2023

I think you don't have the volume mounted correctly.

Keep in mind that when you run the container, it runs in isolation: it does not have access to the files on your host. To give it access, you need to expose the files through a volume.

I detail it below:

# -v mounts the host's ./models folder (relative to pwd) at a path that
# only exists inside the container; the image name (llamacpp-main) follows,
# and everything after it is passed to llama.cpp as its normal arguments.
docker run \
  -v $(pwd)/models:/models-only-exists-in-your-container \
  llamacpp-main \
  -m /models-only-exists-in-your-container/7B/ggml-model-q4_0.bin \
  -p "Building a website can be done in 10 simple steps:" \
  -t 8 \
  -n 512

@gjmulder (Collaborator):

> I believe you’d need to run docker run -v /data/llama:/models llamacpp-converter "/models/7B" 1

That works. Thx.

@gjmulder (Collaborator) commented Mar 14, 2023

Where's the quantization step occurring?

Logically this should occur in the tools Dockerfile, which implies running make there too and having a wrapper script that calls convert-pth-to-ggml.py first and then quantize.

However, there is discussion about adding 8-bit quantization, so it might be better to first call the wrapper script with a param, say --convert, to do the conversion, then call it again with --quantize <q4_0|q8_0> for maximum flexibility, e.g.:

docker run -v models:/models llamacpp-converter --convert "/models/7B/" 1
docker run -v models:/models llamacpp-converter --quantize q4_0 "/models/7B/"

EDIT: Issue #106 indicates that passing additional params to ./quantize.sh will become necessary as well.
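A minimal sketch of what such a wrapper could look like (hypothetical script; the flag names come from this comment, everything else is illustrative):

#!/bin/sh
# Hypothetical dispatcher: the first argument selects the tool,
# the rest is passed straight through to it.
set -e
case "$1" in
  --convert)
    shift
    python3 ./convert-pth-to-ggml.py "$@"
    ;;
  --quantize)
    shift
    ./quantize "$@"
    ;;
  *)
    echo "usage: $0 --convert|--quantize [args...]" >&2
    exit 1
    ;;
esac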

@bernatvadell (Contributor Author):

> Where's the quantization step occurring?
>
> Logically this should occur in the tools Dockerfile, which implies running make there too and having a wrapper script that calls convert-pth-to-ggml.py first and then quantize.
>
> However, there is discussion about adding 8-bit quantization, so it might be better to first call the wrapper script with a param, say --convert, to do the conversion, then call it again with --quantize <4bit|8bit> for maximum flexibility, e.g.:
>
> docker run -v models:/models llamacpp-converter --convert "/models/7B/" 1
> docker run -v models:/models llamacpp-converter --quantize 4bit "/models/7B/"
>
> EDIT: Issue #106 indicates that passing additional params to ./quantize.sh will become necessary as well.

Done.
docker run -v $(pwd)/models:/models llamacpp-tools --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin"
docker run -v $(pwd)/models:/models llamacpp-tools --convert "/models/7B/" 1

@gjmulder (Collaborator) left a review:

Smoke tested and looks to be working:

Convert 7B model from .pth to int4:

$ docker build -f .devops/tools.Dockerfile -t llamacpp-tools .
$ docker run -v $(pwd)/models:/models llamacpp-tools --convert "/models/7B/" 1
$ docker run -v $(pwd)/models:/models llamacpp-tools --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2

Run 7B model:

$ docker build -f .devops/main.Dockerfile -t llamacpp-main .
$ docker run -v $(pwd)/models:/models llamacpp-main -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

@borgstad:

Great job. Just a suggestion: what about adding @gjmulder's build instructions to the README?

@bernatvadell (Contributor Author) commented Mar 14, 2023

> Great job. Just a suggestion: what about adding @gjmulder's build instructions to the README?

We can add instructions for compiling the image locally. However, the simplest thing would be to publish the Docker image on Docker Hub; then it would not be necessary to clone the repository or anything similar, only to have Docker Engine or Docker Desktop installed.

docker run -v $(pwd)/models:/models ggerganov/llamacpp-tools --convert "/models/7B/" 1
or
docker run -v $(pwd)/models:/models ggerganov/llamacpp -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

.devops/tools.sh (outdated, resolved)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@ggerganov (Owner) left a review:

Thanks for this!
I'm not familiar with the Docker pipeline. Is there anything I need to do for these images to become available, or do we just merge this as is?

@bernatvadell (Contributor Author):

I can do it for you; I'll commit here before the PR is merged.

@bernatvadell (Contributor Author):

Hi @ggerganov

I've created a new GitHub Action, which only fires when a push is made to master.

If you look at the file, you'll see I have published the image to my own account in order to test it locally.

I would recommend creating a Docker Hub account (if you don't already have one) and creating a new repository with whatever name you want (e.g. llamacpp).

If you register as the user ggerganov, then we can publish the image as ggerganov/llamacpp.

Once you're registered, you can generate a token for the GitHub Action to use. You should put both the username and the token in the GitHub secrets:

  • DOCKERHUB_USERNAME
  • DOCKERHUB_TOKEN
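For reference, a sketch of the workflow step that consumes these secrets (it mirrors the login step shown in the diff further down):

- name: Log in to Docker Hub
  uses: docker/login-action@v2
  with:
    username: ${{ secrets.DOCKERHUB_USERNAME }}
    password: ${{ secrets.DOCKERHUB_TOKEN }}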

Before closing the PR, we must change the image name in the pipeline YAML.

For those who want to try it, you can use the images I have published under my account:

e.g.
light version (only main, 28.32 MB):

docker run -v $(pwd)/models:/models bernatvadell/llamacpp:latest -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

full version (3.61GB):

docker run -v $(pwd)/models:/models bernatvadell/llamacpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

docker run -v $(pwd)/models:/models bernatvadell/llamacpp:full --convert "/models/7B/" 1

docker run -v $(pwd)/models:/models bernatvadell/llamacpp:full --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2

@j-f1 (Collaborator) commented Mar 15, 2023

Another option could be to use the GitHub registry, which wouldn't need any additional setup beyond pointing the builder at the right image name.

@bernatvadell (Contributor Author) commented Mar 15, 2023

Yep. In any case, I'll adapt the YAML to the registry configuration.

@ggerganov (Owner):

> Another option could be to use the GitHub registry, which wouldn't need any additional setup beyond pointing the builder at the right image name.

Does this mean I don't have to create a Docker Hub account?

@ggerganov mentioned this pull request Mar 15, 2023
@Matthias-Johannes-Mack commented Mar 15, 2023

Short note here: since I am running Docker Desktop on Windows, I needed to change the $(pwd) to %cd%:

docker run -v %cd%/models:/models bernatvadell/llamacpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

docker run -v %cd%/models:/models bernatvadell/llamacpp:full --convert "/models/7B/" 1

docker run -v %cd%/models:/models bernatvadell/llamacpp:full --quantize "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2

Thanks for the great work!!

@Niek commented Mar 16, 2023

In light of the recent Docker policy changes, I would recommend pushing to ghcr.io instead. See how to log in to ghcr.io here: https://github.com/docker/login-action#github-container-registry

In addition, adding a README section on running with Docker would be useful.
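Following the linked documentation, the ghcr.io login step would look roughly like this (a sketch based on the docker/login-action README, not this PR's exact workflow):

- name: Log in to GitHub Container Registry
  uses: docker/login-action@v2
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}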

@bernatvadell (Contributor Author):

Yes, that makes sense; I can do it tonight.

@@ -38,13 +38,14 @@ jobs:
- name: Log in to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
registry: docker.pkg.github.com

This should be ghcr.io - the docker.pkg.github.com is legacy


@bernatvadell (Contributor Author) commented Mar 16, 2023

OK, now the pipeline seems to work correctly.

The flow I have defined is the following:

  • Whenever a PR is opened, the image build is launched, but no push is performed. This way we can validate that everything still compiles correctly.
  • When a push to master happens, it builds and pushes the images to the GitHub registry:

Light version (only includes the main executable)

ghcr.io/ggerganov/llama.cpp:light

Full version (includes Python, main, and the quantize scripts)

ghcr.io/ggerganov/llama.cpp:full

Versioned images

Each image is also pushed with a tag versioned by the commit hash:

ghcr.io/ggerganov/llama.cpp:light-<commit_hash>
ghcr.io/ggerganov/llama.cpp:full-<commit_hash>
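A condensed sketch of that flow for the light image (action versions, the checkout step, and the build step details are illustrative; the workflow file in this PR is authoritative, and pushing with GITHUB_TOKEN assumes the job has packages: write permission):

on:
  push:
    branches:
      - master
  pull_request:

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v4
        with:
          file: .devops/main.Dockerfile
          # build on every PR, but push only on a push to master
          push: ${{ github.event_name == 'push' }}
          tags: |
            ghcr.io/ggerganov/llama.cpp:light
            ghcr.io/ggerganov/llama.cpp:light-${{ github.sha }}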

If you have any suggestions, they are welcome!

Thank you

@bernatvadell (Contributor Author):

@ggerganov can I do a squash?

@bernatvadell (Contributor Author):

Good morning!

I've included a couple of new commands in the bash tools:

  1. A new command to download the indicated model:
    --download (-d): Download the original LLaMA model from the CDN: https://agi.gpt4.org/llama/
  2. An "all-in-one" command:
    --all-in-one (-a): Execute --download, --convert & --quantize

I have also updated the README.md, explaining how to start using the Docker image.

Docker

Prerequisites

  • Docker must be installed and running on your system.
  • Create a folder to store big models & intermediate files (e.g. I'm using /llama/models)

Images

We have two Docker images available for this project:

  1. ghcr.io/ggerganov/llama.cpp:full: This image includes both the main executable and the tools to convert LLaMA models into ggml and quantize them to 4-bit.
  2. ghcr.io/ggerganov/llama.cpp:light: This image only includes the main executable file.

Usage

The easiest way to download the models, convert them to ggml, and optimize them is with the --all-in-one command, which is included in the full Docker image.

docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B

Once it completes, you are ready to play!

docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

or with the light image:

docker run -v /llama/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512

@bernatvadell bernatvadell merged commit 2af23d3 into ggerganov:master Mar 17, 2023
@bernatvadell bernatvadell deleted the feat/dockerize branch March 17, 2023 09:47
@ggerganov (Owner):

> @ggerganov can I do a squash?

Yes, almost always squash.

Sorry for the slow responses; very busy week.

anzz1 referenced this pull request in anzz1/alpaca.cpp Mar 21, 2023
rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request Aug 2, 2023