vLLM Release 0.3.0 fails to install on AMD Instinct MI300X with ROCm 6.0.2 #2865

@kannan-scalers-ai

Description

Steps to reproduce:

  1. Clone the vllm repo and switch to tag v0.3.0
  2. Build Dockerfile.rocm following the instructions in "Option 3: Build from source with docker" under Installation with ROCm
    • Build arguments were left at their defaults for the first test.

The build fails while installing vllm.

Build command:

docker build -f Dockerfile.rocm -t vllm-rocm .

The error output:

...
 conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
18.57 WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
18.57     PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.1.1+git011de5c)
18.57     Python  3.9.18 (you have 3.9.18)
18.57   Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
18.57   Memory-efficient attention, SwiGLU, sparse and more won't be available.
18.57   Set XFORMERS_MORE_DETAILS=1 for more details
20.52 WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
20.52     PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.1.1+git011de5c)
20.52     Python  3.9.18 (you have 3.9.18)
20.52   Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
20.52   Memory-efficient attention, SwiGLU, sparse and more won't be available.
20.52   Set XFORMERS_MORE_DETAILS=1 for more details
22.58 WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
22.58     PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.1.1+git011de5c)
22.58     Python  3.9.18 (you have 3.9.18)
22.58   Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
22.58   Memory-efficient attention, SwiGLU, sparse and more won't be available.
22.58   Set XFORMERS_MORE_DETAILS=1 for more details
23.37 XFORMERS_FMHA_FLASH_PATH = /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/flash.py
23.37 XFORMERS_FMHA_COMMON_PATH = /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/common.py
23.37 6 out of 6 hunks FAILED
23.37 Applying patch to /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/flash.py
23.37 patching file /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/flash.py
23.37 Successfully patch /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/flash.py
23.37 1 out of 1 hunk FAILED
23.37 Applying patch to /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/common.py
23.37 patching file /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/common.py
23.37 Successfully patch /opt/conda/envs/py_3.9/lib/python3.9/site-packages/xformers/ops/fmha/common.py
25.15 No CUDA runtime is found, using CUDA_HOME='/usr'
25.17 Traceback (most recent call last):
25.17   File "/app/vllm/setup.py", line 295, in <module>
25.17     raise RuntimeError(
25.17 RuntimeError: Only the following arch is supported: {'gfx908', 'gfx90a', 'gfx1100', 'gfx906', 'gfx1030'}amdgpu_arch_found: gfx941
------
Dockerfile.rocm:78
--------------------
  77 |
  78 | >>> RUN cd /app \
  79 | >>>     && cd vllm \
  80 | >>>     && pip install -U -r requirements-rocm.txt \
  81 | >>>     && bash patch_xformers.rocm.sh \
  82 | >>>     && python3 setup.py install \
  83 | >>>     && cd ..
  84 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd /app     && cd vllm     && pip install -U -r requirements-rocm.txt     && bash patch_xformers.rocm.sh     && python3 setup.py install     && cd .." did not complete successfully: exit code: 1

Issues:

  1. setup.py at tag v0.3.0 does not appear to support gfx941 or gfx942 (MI300 series)
  2. The build argument FX_GFX_ARCHS documented in "Option 3: Build from source with docker" (Installation with ROCm) is ignored by Dockerfile.rocm; the Dockerfile instead expects FA_GFX_ARCHS
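For context on issue 1, the traceback shows the failure comes from a hard-coded allow-list check in setup.py (line 295). A minimal sketch of that gate, assuming the set printed in the error message is the full allow-list; the function name `validate_rocm_arch` and the `MI300_ARCHS` addition are illustrative, not vLLM's actual code:

```python
# The arch set printed in the RuntimeError above, as of tag v0.3.0.
ROCM_SUPPORTED_ARCHS = {"gfx906", "gfx908", "gfx90a", "gfx1030", "gfx1100"}
# Hypothetical addition covering the MI300 series (gfx941 is what the
# build detected on MI300X; gfx942 per issue 1 above).
MI300_ARCHS = {"gfx941", "gfx942"}

def validate_rocm_arch(amdgpu_arch_found, supported=ROCM_SUPPORTED_ARCHS):
    """Raise, as setup.py does, when the detected arch is not allow-listed."""
    if amdgpu_arch_found not in supported:
        raise RuntimeError(
            f"Only the following arch is supported: {supported}"
            f"amdgpu_arch_found: {amdgpu_arch_found}"
        )

validate_rocm_arch("gfx90a")  # passes on v0.3.0
validate_rocm_arch("gfx941", ROCM_SUPPORTED_ARCHS | MI300_ARCHS)  # passes only with the addition
```

For issue 2, passing the arch list as `--build-arg FA_GFX_ARCHS=...` (the name Dockerfile.rocm actually declares) instead of FX_GFX_ARCHS should let the build argument take effect, though that alone will not get past the setup.py check above.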
