[Installation]: I want to install with CPU follow the guide on windows (wsl2, ubuntu) but the wsl crash. #8493

Closed
1 task done
nicholasvan opened this issue Sep 15, 2024 · 3 comments
Labels
installation Installation problems

Comments

@nicholasvan

Your current environment

Hello, I apologize for the interruption. I am a newcomer and currently following a tutorial for installation. I have encountered some issues, and I have looked into some other people's questions, but I haven't found any that resolve my problem. I hope I can receive some assistance.

I was following the guide to install vLLM for CPU (because my GPU only has 8 GB of memory, which is not sufficient to run Llama 3.1 8B), and after the cmake step I ran this command in WSL Ubuntu:

VLLM_TARGET_DEVICE=cpu python setup.py install

Eventually WSL crashed and dropped me back to PowerShell:

(myenv) wangkun@DESKTOP-5HLONVC:/mnt/c/Users/wangk/ubuntu/vllm$ VLLM_TARGET_DEVICE=cpu python setup.py install
running install
/mnt/c/Users/wangk/ubuntu/myenv/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()
/mnt/c/Users/wangk/ubuntu/myenv/lib/python3.12/site-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()
running bdist_egg
running egg_info
writing vllm.egg-info/PKG-INFO
writing dependency_links to vllm.egg-info/dependency_links.txt
writing entry points to vllm.egg-info/entry_points.txt
writing requirements to vllm.egg-info/requires.txt
writing top-level names to vllm.egg-info/top_level.txt
reading manifest file 'vllm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'vllm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying vllm/block.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/commit_id.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/config.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/connections.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/envs.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/logger.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/outputs.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/pooling_params.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/sampling_params.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/scalar_type.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/scripts.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/sequence.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/tracing.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/utils.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/version.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/_core_ext.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/_custom_ops.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/_ipex_ops.py -> build/lib.linux-x86_64-cpython-312/vllm
copying vllm/__init__.py -> build/lib.linux-x86_64-cpython-312/vllm
312/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/marlin.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/modelopt.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/neuron_quant.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/qqq.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization
copying vllm/model_executor/layers/quantization/schema.py -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/quantization
... and many other copying ...
copying vllm/model_executor/layers/fused_moe/configs/E=8,N=8192,device_name=NVIDIA_H100_80GB_HBM3,dtype=fp8_w8a8.json -> build/lib.linux-x86_64-cpython-312/vllm/model_executor/layers/fused_moe/configs
running build_ext
-- Build type: RelWithDebInfo
-- Target device: cpu
-- Found python matching: /mnt/c/Users/wangk/ubuntu/myenv/bin/python.
CMake Warning at /mnt/c/Users/wangk/ubuntu/myenv/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  /mnt/c/Users/wangk/ubuntu/myenv/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:120 (append_torchlib_if_found)
  CMakeLists.txt:70 (find_package)


-- Enabling core extension.
CMake Warning at cmake/cpu_extension.cmake:73 (message):
  vLLM CPU backend using AVX2 ISA
Call Stack (most recent call first):
  CMakeLists.txt:111 (include)


-- CPU extension compile flags: -fopenmp;-DVLLM_CPU_EXTENSION;-mavx2
-- Enabling C extension.
-- Configuring done (10.0s)
-- Generating done (0.1s)
-- Build files have been written to: /mnt/c/Users/wangk/ubuntu/vllm/build/temp.linux-x86_64-cpython-312
[0/8] Building CXX object CMakeFiles/_C.dir/csrc/cpu/torch_bindings.cpp.o
ninja: build stopped: interrupted by user.
Terminated
(myenv) wangkun@DESKTOP-5HLONVC:/mnt/c/Users/wangk/ubuntu/vllm$
PS C:\Users\wangk>
The output of `python collect_env.py`

Collecting environment information...
INFO 09-15 15:12:35 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-15 15:12:35 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
PyTorch version: 2.4.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-17ubuntu1) 12.3.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Jul 31 2024, 17:43:48) [GCC 13.2.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2080
Nvidia driver version: 560.70
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 5 5600 6-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
BogoMIPS: 8150.08
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 3 MiB (6 instances)
L3 cache: 32 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0+cpu
[pip3] torchvision==0.19.0+cpu
[pip3] transformers==4.44.2
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.1.post2@3724d5f6b59d9859e5b47c047535bb8edc124eab
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X                                      N/A

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

How you are installing vllm

VLLM_TARGET_DEVICE=cpu python setup.py install

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@nicholasvan nicholasvan added the installation Installation problems label Sep 15, 2024
@Isotr0py
Collaborator

Isotr0py commented Sep 15, 2024

It seems you are running out of memory during compilation: by default, WSL2 only uses 50% of your device's available memory.

You can limit the number of parallel compilation jobs, e.g. `export MAX_JOBS=2`, or increase the memory available to WSL in `.wslconfig`.
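As a rough sketch of the first suggestion, the snippet below picks a `MAX_JOBS` value from the memory WSL currently reports instead of hard-coding 2. The 2 GiB-per-job budget and the variable names `mem_kib`, `cores`, and `jobs` are assumptions for illustration, not values taken from vLLM; adjust them for your machine.

```shell
# Sketch only: derive MAX_JOBS from the memory visible inside WSL.
# ASSUMPTION: about 2 GiB of RAM per parallel compile job (not a vLLM figure).
mem_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null || echo 0)
mem_kib=${mem_kib:-0}
cores=$(nproc 2>/dev/null || echo 1)

# 2 GiB = 2 * 1024 * 1024 KiB
jobs=$(( mem_kib / (2 * 1024 * 1024) ))

# Clamp to the range [1, number of cores].
if [ "$jobs" -lt 1 ]; then jobs=1; fi
if [ "$jobs" -gt "$cores" ]; then jobs=$cores; fi

export MAX_JOBS=$jobs
echo "MAX_JOBS=$MAX_JOBS"

# Then rebuild with the same command as in the report above:
# VLLM_TARGET_DEVICE=cpu python setup.py install
```

For the second suggestion, the memory cap is set with a `memory=` key (for example `memory=12GB`, an illustrative value) under the `[wsl2]` section of `.wslconfig` in the Windows user profile directory. Running `wsl --shutdown` from PowerShell restarts WSL so the new limit takes effect.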

@nicholasvan
Author

Thanks very much, it works!

@youkaichao
Member

Added the lesson in #8550.
