Conversation

@AlpinDale
Contributor

PR #3005 introduced an issue where the Python environment can't find pip under certain conditions. This PR uses ensurepip to bootstrap pip into the existing environment.

Resolves #3265
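
For reference, a minimal sketch of what bootstrapping pip with ensurepip can look like; the exact hook used in setup.py isn't shown here, so treat this as an assumption rather than the PR's literal code:

```python
# Sketch only: bootstrap pip into the current interpreter's environment
# if it is missing, then use it via `python -m pip`.
import importlib.util
import subprocess
import sys

if importlib.util.find_spec("pip") is None:
    # ensurepip ships with CPython and installs the bundled pip wheel.
    subprocess.check_call([sys.executable, "-m", "ensurepip", "--upgrade"])

subprocess.check_call([sys.executable, "-m", "pip", "--version"])
```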

@mgoin
Member

mgoin commented Mar 8, 2024

@AlpinDale thank you for this! This almost worked out of the box for me, but I got an error: distutils.errors.DistutilsFileError: cannot copy tree '/home/michael/code/vllm/vllm/thirdparty_files': not a directory. After creating that directory with mkdir vllm/thirdparty_files, it built!
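
For anyone hitting the same error, the workaround boils down to making sure the staging directory exists before pip copies files into it. A hypothetical guard in setup.py (the directory name comes from the error message above; the rest is an assumption):

```python
# Sketch only: make sure the third-party staging directory exists before
# the build tries to copy flash-attn files into it.
import os

THIRDPARTY_DIR = os.path.join("vllm", "thirdparty_files")
os.makedirs(THIRDPARTY_DIR, exist_ok=True)  # no-op if it already exists
```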

Co-authored-by: Michael Goin <michael@neuralmagic.com>
@AlpinDale
Contributor Author

Thanks for the quick fix @mgoin

@mgoin
Member

mgoin commented Mar 8, 2024

I spoke too soon; it seems like the build succeeds, but in actuality flash-attn just fails to install:

Successfully installed vllm-0.3.3+cu123

CUDA_VISIBLE_DEVICES=7 python -c 'from vllm import LLM;LLM("facebook/opt-125m", tensor_parallel_size=1)'
INFO 03-08 02:10:34 llm_engine.py:88] Initializing an LLM engine (v0.3.3) with config: model='facebook/opt-125m', tokenizer='facebook/opt-125m', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/michael/code/vllm/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 412, in from_engine_args
    engine = cls(*engine_configs,
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 142, in __init__
    self._init_workers()
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 200, in _init_workers
    self._run_workers("load_model")
  File "/home/michael/code/vllm/vllm/engine/llm_engine.py", line 1086, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/home/michael/code/vllm/vllm/worker/worker.py", line 99, in load_model
    self.model_runner.load_model()
  File "/home/michael/code/vllm/vllm/worker/model_runner.py", line 89, in load_model
    self.model = get_model(self.model_config,
  File "/home/michael/code/vllm/vllm/model_executor/utils.py", line 52, in get_model
    return get_model_fn(model_config, device_config, **kwargs)
  File "/home/michael/code/vllm/vllm/model_executor/model_loader.py", line 79, in get_model
    model = model_class(model_config.hf_config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 293, in __init__
    self.model = OPTModel(config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 271, in __init__
    self.decoder = OPTDecoder(config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 234, in __init__
    self.layers = nn.ModuleList([
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 235, in <listcomp>
    OPTDecoderLayer(config, linear_method)
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 121, in __init__
    self.self_attn = OPTAttention(
  File "/home/michael/code/vllm/vllm/model_executor/models/opt.py", line 92, in __init__
    self.attn = Attention(self.num_heads,
  File "/home/michael/code/vllm/vllm/model_executor/layers/attention/attention.py", line 37, in __init__
    from vllm.model_executor.layers.attention.backends.flash_attn import FlashAttentionBackend
  File "/home/michael/code/vllm/vllm/model_executor/layers/attention/backends/flash_attn.py", line 5, in <module>
    from flash_attn import flash_attn_func
ModuleNotFoundError: No module named 'flash_attn'

@AlpinDale
Contributor Author

That's odd. I wonder if there's a way to specify in requirements.txt that a module should be installed without its external dependencies. That should be the only reason we need this special handling for flash-attention.
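
For context, what's being asked for is easy to express on the command line; doing it per-package inside requirements.txt is the open question. A hedged sketch of the manual equivalent, using standard pip flags (not vLLM's actual install code):

```python
# Sketch only: install flash-attn without resolving its dependencies,
# reusing the torch that vLLM's own requirements already provide.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install", "flash-attn",
    "--no-deps",              # skip flash-attn's dependency resolution
    "--no-build-isolation",   # build against the already-installed torch
])
```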

@AlpinDale
Contributor Author

Looks like there's no way to do this reliably. @WoosukKwon, can we instead import the flash attention forward kernels directly in vLLM? I'm unsure why they're needed in the first place; I noticed zero performance improvement with Flash Attention 2 in place of xFormers.
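
As a sketch of the alternative being suggested (not vLLM's actual backend selection), flash-attn could be treated as optional, falling back to the xFormers backend when the import fails:

```python
# Sketch only: treat flash-attn as an optional backend and fall back to
# xFormers when it is not installed. Backend names are illustrative.
def select_attention_backend() -> str:
    try:
        from flash_attn import flash_attn_func  # noqa: F401
        return "flash-attn"
    except ImportError:
        return "xformers"
```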

@AlpinDale
Contributor Author

Closing in favor of #3269, which is a better solution.

@AlpinDale AlpinDale closed this Mar 8, 2024
@AlpinDale AlpinDale deleted the fix/setup-pip-error branch March 8, 2024 17:49