
[Minor fix] Include flash_attn in docker image #3254

Closed
wants to merge 1 commit

Conversation

tdoublep (Member) commented Mar 7, 2024

Supports #3255.

To resolve it, we just need to make sure we copy the contents of thirdparty_files into the image.
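The change itself would be a one-line addition to the final stage of the Dockerfile. A sketch of the idea (the exact path is an assumption; the actual diff isn't reproduced here):

# sketch, not the exact diff: carry over the third-party deps that
# setup.py downloads into vllm/thirdparty_files during the build stage
COPY --from=build /workspace/vllm/thirdparty_files /workspace/vllm/thirdparty_files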

tdoublep changed the title from "Include flash_attn in docker image" to "[Minor fix] Include flash_attn in docker image" on Mar 7, 2024
WoosukKwon (Collaborator) commented Mar 7, 2024

Do we really need this fix? It seems our CI successfully builds the image and runs vLLM with the main branch.

tdoublep (Member, Author) commented Mar 7, 2024

@WoosukKwon Just looking at the final stage of the Dockerfile:

#################### OPENAI API SERVER ####################
# openai api server alternative
FROM vllm-base AS vllm-openai
# install additional dependencies for openai api server
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install accelerate

COPY --from=build /workspace/vllm/*.so /workspace/vllm/
COPY vllm vllm

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]

I can't see how thirdparty_files would end up in there, unless somehow one has already built vllm outside of the docker build context (which would cause COPY vllm vllm to pull it in). Maybe the CI is doing something like that?
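For illustration, this is the host-side scenario described above (hypothetical commands): an in-place build run outside Docker drops its artifacts, including thirdparty_files, into the vllm/ source directory, where COPY vllm vllm would then sweep them up:

$ python3 setup.py build_ext --inplace   # run on the host, not inside Docker
$ ls vllm/ | grep thirdparty
thirdparty_files
# a subsequent `docker build` would then pull this directory into the
# image via `COPY vllm vllm`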

tdoublep (Member, Author) commented Mar 7, 2024

To double-check, I re-built the image with no caching:

$ git log -n1
commit 2daf23ab0cf00da157b1255faddcf0a269283d36 (HEAD -> main, vllm/main)
Author: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Date:   Thu Mar 7 01:45:50 2024 -0800

    Separate attention backends (#3005)

$ docker build . -t vllm-main --build-arg="max_jobs=40" --no-cache                                                                                   
[+] Building 618.6s (30/30) FINISHED                                                                                                                                       
 => [internal] load .dockerignore                                                                                                                                     0.0s 
 => => transferring context: 50B                                                                                                                                      0.0s 
 => [internal] load build definition from Dockerfile                                                                                                                  0.0s 
 => => transferring dockerfile: 3.67kB                                                                                                                                0.0s 
 => [internal] load metadata for docker.io/nvidia/cuda:12.1.0-runtime-ubuntu22.04                                                                                     0.4s 
 => [internal] load metadata for docker.io/nvidia/cuda:12.1.0-devel-ubuntu22.04                                                                                       0.4s 
 => [internal] load build context                                                                                                                                     0.0s 
 => => transferring context: 11.92kB                                                                                                                                  0.0s
 => CACHED [vllm-base 1/5] FROM docker.io/nvidia/cuda:12.1.0-runtime-ubuntu22.04@sha256:402700b179eb764da6d60d99fe106aa16c36874f7d7fb3e122251ff6aea8b2f7              0.0s
 => CACHED [dev 1/8] FROM docker.io/nvidia/cuda:12.1.0-devel-ubuntu22.04@sha256:e3a8f7b933e77ecee74731198a2a5483e965b585cea2660675cf4bb152237e9b                      0.0s
 => [vllm-base 2/5] RUN apt-get update -y     && apt-get install -y python3-pip                                                                                      34.5s
 => [dev 2/8] RUN apt-get update -y     && apt-get install -y python3-pip git                                                                                        18.5s
 => [dev 3/8] RUN ldconfig /usr/local/cuda-12.1/compat/                                                                                                               0.6s
 => [dev 4/8] WORKDIR /workspace                                                                                                                                      0.0s
 => [dev 5/8] COPY requirements.txt requirements.txt                                                                                                                  0.0s
 => [dev 6/8] RUN --mount=type=cache,target=/root/.cache/pip     pip install -r requirements.txt                                                                    153.0s
 => [vllm-base 3/5] WORKDIR /workspace                                                                                                                                0.0s
 => [vllm-base 4/5] COPY requirements.txt requirements.txt                                                                                                            0.0s
 => [vllm-base 5/5] RUN --mount=type=cache,target=/root/.cache/pip     pip install -r requirements.txt                                                              137.6s
 => [dev 7/8] COPY requirements-dev.txt requirements-dev.txt                                                                                                          0.0s 
 => [vllm-openai 1/3] RUN --mount=type=cache,target=/root/.cache/pip     pip install accelerate                                                                       3.0s 
 => [dev 8/8] RUN --mount=type=cache,target=/root/.cache/pip     pip install -r requirements-dev.txt                                                                 10.9s 
 => [build 1/8] COPY requirements-build.txt requirements-build.txt                                                                                                    0.0s 
 => [build 2/8] RUN --mount=type=cache,target=/root/.cache/pip     pip install -r requirements-build.txt                                                              2.2s 
 => [build 3/8] COPY csrc csrc                                                                                                                                        0.0s 
 => [build 4/8] COPY setup.py setup.py                                                                                                                                0.0s 
 => [build 5/8] COPY requirements.txt requirements.txt                                                                                                                0.0s 
 => [build 6/8] COPY pyproject.toml pyproject.toml                                                                                                                    0.0s 
 => [build 7/8] COPY vllm/__init__.py vllm/__init__.py                                                                                                                0.0s 
 => [build 8/8] RUN python3 setup.py build_ext --inplace                                                                                                            368.9s 
 => [vllm-openai 2/3] COPY --from=build /workspace/vllm/*.so /workspace/vllm/                                                                                         1.0s 
 => [vllm-openai 3/3] COPY vllm vllm                                                                                                                                  0.0s 
 => exporting to image                                                                                                                                               61.1s 
 => => exporting layers                                                                                                                                              61.1s 
 => => writing image sha256:2e2e352648123eaadfb0c4014dd1b691e880340e0505f11e78b6a2b5f0effc88                                                                          0.0s 
 => => naming to docker.io/library/vllm-main                                                                                                                          0.0s 

Then I try to import the flash attention backend inside the container:

$ docker run -it --entrypoint python3 vllm-main -c "from vllm.model_executor.layers.attention.backends.flash_attn import FlashAttentionBackend"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/workspace/vllm/model_executor/layers/attention/backends/flash_attn.py", line 5, in <module>
    from flash_attn import flash_attn_func
ModuleNotFoundError: No module named 'flash_attn'

and confirm that the thirdparty_files dir indeed does not exist in the image:

$ docker run -it --entrypoint bash vllm-main -c "ls vllm/thirdparty_files"
ls: cannot access 'vllm/thirdparty_files': No such file or directory

tdoublep (Member, Author) commented Mar 7, 2024

It's also possible the CI wouldn't catch this if it is running on older GPUs (e.g., V100), since the import that fails only happens when a newer GPU (e.g., Ampere) is detected.
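For context, the gating works roughly like this (a sketch, not vLLM's exact code): the backend selection checks the device's compute capability and only attempts the flash_attn import on Ampere-or-newer GPUs, so on a V100 (capability 7.0) the failing import is never reached:

# sketch of the capability-gated import described above (not vLLM's exact code)
import torch

major, _ = torch.cuda.get_device_capability()
if major >= 8:  # Ampere (SM 8.0) or newer
    # only reached on newer GPUs; raises ModuleNotFoundError when the
    # flash_attn package was never installed into the image
    from flash_attn import flash_attn_func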

tdoublep (Member, Author) commented Mar 8, 2024

I am closing this, since it is no longer relevant now that #3269 has removed the flash attention dependency.

tdoublep closed this on Mar 8, 2024