
🚀 Dockerize llamacpp #132

Merged · 14 commits · Mar 17, 2023
24 changes: 24 additions & 0 deletions .dockerignore
@@ -0,0 +1,24 @@
*.o
*.a
.cache/
.vs/
.vscode/
.DS_Store

build/
build-em/
build-debug/
build-release/
build-static/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/
Comment on lines +9 to +15
Collaborator

Suggested change
build-em/
build-debug/
build-release/
build-static/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/
build-*/


models/*

/main
/quantize

arm_neon.h
compile_commands.json
Dockerfile
17 changes: 17 additions & 0 deletions Dockerfile
@@ -0,0 +1,17 @@
ARG UBUNTU_VERSION=22.04

FROM ubuntu:$UBUNTU_VERSION

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip

RUN pip install --upgrade pip setuptools wheel \
&& pip install torch torchvision torchaudio sentencepiece numpy
Collaborator


I believe the Python install is only needed to convert the model files; maybe that could be moved to a separate Dockerfile so this one stays smaller? (I can also see it making sense to keep things simple and have a single Dockerfile for everything, though.)

Contributor (author)


Thanks for your feedback.

Split into two stages (build & runtime).
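A minimal sketch of what such a two-stage split could look like. The stage names and the idea of copying only the compiled binary into the runtime image are assumptions for illustration; the exact layout in the PR may differ:

```dockerfile
ARG UBUNTU_VERSION=22.04

# Build stage: compilers (and any Python tooling) only live here
FROM ubuntu:$UBUNTU_VERSION AS build

RUN apt-get update && \
    apt-get install -y build-essential

WORKDIR /app
COPY . .
RUN make

# Runtime stage: starts from a clean base image and copies in
# only the compiled binary, so build dependencies are left behind
FROM ubuntu:$UBUNTU_VERSION AS runtime

COPY --from=build /app/main /main

ENTRYPOINT [ "/main" ]
```

With this layout `docker build` produces a final image containing just the base OS and the `main` binary, which keeps the image considerably smaller than one carrying `build-essential` and the Python stack.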

Collaborator


As far as I can tell the Makefile doesn’t even use Python: https://github.com/ggerganov/llama.cpp/blob/master/Makefile

Contributor (author)


Sorry, I misunderstood you the first time.

I have split this into two Dockerfiles: one for the "tools" and another for the main binary.

To test:

  1. Build the new images:
     tools:
       docker build -f .devops/tools.Dockerfile -t llamacpp-converter .
     main:
       docker build -f .devops/main.Dockerfile -t llamacpp-main .

  2. Usage:
     convert the 7B pth model into ggml:
       docker run -v models:/models llamacpp-converter "/models/7B/" 1
     execute the main process:
       docker run -v models:/models llamacpp-main -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -t 8 -n 512


WORKDIR /app

COPY . .

RUN make

ENTRYPOINT [ "/app/main" ]