Conversation

@AlpinDale
Contributor

PR #3005 introduced an issue where the Python environment can't find pip under certain conditions. This PR uses ensurepip to bootstrap pip into the existing environment.

Resolves #3265
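
For reference, a minimal sketch of what bootstrapping pip with ensurepip can look like; the exact hook used in setup.py isn't shown here, so treat this as an assumption rather than the PR's literal code:

```python
# Sketch only: bootstrap pip into the current interpreter's environment
# if it is missing, then use it via `python -m pip`.
import importlib.util
import subprocess
import sys

if importlib.util.find_spec("pip") is None:
    # ensurepip ships with CPython and installs the bundled pip wheel.
    subprocess.check_call([sys.executable, "-m", "ensurepip", "--upgrade"])

subprocess.check_call([sys.executable, "-m", "pip", "--version"])
```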

@mgoin
Member

mgoin commented Mar 8, 2024

@AlpinDale thank you for this! This almost worked out of the box for me, but I got an error: distutils.errors.DistutilsFileError: cannot copy tree '/home/michael/code/vllm/vllm/thirdparty_files': not a directory. After creating that directory with mkdir vllm/thirdparty_files, it built!
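
For anyone hitting the same error, the workaround boils down to making sure the staging directory exists before pip copies files into it. A hypothetical guard in setup.py (the directory name comes from the error message above; the rest is an assumption):

```python
# Sketch only: make sure the third-party staging directory exists before
# the build tries to copy flash-attn files into it.
import os

THIRDPARTY_DIR = os.path.join("vllm", "thirdparty_files")
os.makedirs(THIRDPARTY_DIR, exist_ok=True)  # no-op if it already exists
```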

Co-authored-by: Michael Goin <michael@neuralmagic.com>
@AlpinDale
Contributor Author

Thanks for the quick fix @mgoin

@mgoin
Member

mgoin commented Mar 8, 2024

I spoke too soon; it seems like the build succeeds, but in actuality flash-attn just fails to install:

Successfully installed vllm-0.3.3+cu123

CUDA_VISIBLE_DEVICES=7 python -c 'from vllm import LLM;LLM("facebook/opt-125m", tensor_parallel_size=1)'
INFO 03-08 02:10:34 llm_engine.py:88] Initializing an LLM engine (v0.3.3) with config: model='facebook/opt-125m', tokenizer='facebook/opt-125m', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/michael/code/vllm/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 412, in from_engine_args
    engine = cls(*engine_configs,
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 142, in __init__
    self._init_workers()
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 200, in _init_workers
    self._run_workers("load_model")
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 1086, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/home/michael/code/vllm/vllm/worker/worker.py", line 99, in load_model
    self.model_runner.load_model()
  File "/home/michael/code/vllm/vllm/worker/model_runner.py", line 89, in load_model
    self.model = get_model(self.model_config,
  File "/home/michael/code/vllm/vllm/model_executor/utils.py", line 52, in get_model
    return get_model_fn(model_config, device_config, **kwargs)
  File "/home/michael/code/vllm/vllm/model_executor/model_loader.py", line 79, in get_model
    model = model_class(model_config.hf_config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 293, in __init__
    self.model = OPTModel(config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 271, in __init__
    self.decoder = OPTDecoder(config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 234, in __init__
    self.layers = nn.ModuleList([
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 235, in <listcomp>
    OPTDecoderLayer(config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 121, in __init__
    self.self_attn = OPTAttention(
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 92, in __init__
    self.attn = Attention(self.num_heads,
  File "/home/michael/code/vllm/vllm/model_executor/layers/attention/attention.py", line 37, in __init__
    from vllm.model_executor.layers.attention.backends.flash_attn import FlashAttentionBackend
  File "/home/michael/code/vllm/vllm/model_executor/layers/attention/backends/flash_attn.py", line 5, in <module>
    from flash_attn import flash_attn_func
ModuleNotFoundError: No module named 'flash_attn'

@AlpinDale
Contributor Author

That's odd. I wonder if there's a way to specify in requirements.txt that a module should be installed without its external dependencies. That should be the only reason we need this special handling for flash-attention.
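
For context, what's being asked for is easy to express on the command line; doing it per-package inside requirements.txt is the open question. A hedged sketch of the manual equivalent, using standard pip flags (not vLLM's actual install code):

```python
# Sketch only: install flash-attn without resolving its dependencies,
# reusing the torch that vLLM's own requirements already provide.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install", "flash-attn",
    "--no-deps",              # skip flash-attn's dependency resolution
    "--no-build-isolation",   # build against the already-installed torch
])
```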

@AlpinDale
Contributor Author

Looks like there's no way to do this reliably. @WoosukKwon, can we instead import the flash attention forward kernels directly in vLLM? I'm unsure why they're needed in the first place; I noticed zero performance improvement with Flash Attention 2 in place of xFormers.
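
As a sketch of the alternative being suggested (not vLLM's actual backend selection), flash-attn could be treated as optional, falling back to the xFormers backend when the import fails:

```python
# Sketch only: treat flash-attn as an optional backend and fall back to
# xFormers when it is not installed. Backend names are illustrative.
def select_attention_backend() -> str:
    try:
        from flash_attn import flash_attn_func  # noqa: F401
        return "flash-attn"
    except ImportError:
        return "xformers"
```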

@AlpinDale
Contributor Author

Closing in favor of #3269, which is a better solution.

@AlpinDale AlpinDale closed this Mar 8, 2024
@AlpinDale AlpinDale deleted the fix/setup-pip-error branch March 8, 2024 17:49