
Finetuned Flan-T5 #434

Closed
patrafter1999 opened this issue Jul 12, 2023 · 3 comments
Labels
duplicate This issue or pull request already exists new model Requests to new models

Comments

@patrafter1999

Hi vllm team,

I know you guys are extremely busy with many action items. vLLM is becoming a must-have for running LLMs.

I plan to use a finetuned FLAN-T5 model. My question is:

  • Do you support FLAN-T5-like models?
  • How do we use fine-tuned models, as opposed to off-the-shelf HF models?

Thanks a lot for your kind answers.

@WoosukKwon WoosukKwon added the new model Requests to new models label Jul 13, 2023
@WoosukKwon
Collaborator

Hi @patrafter1999, thanks for your interest in vLLM and good question!

Do you support FLAN-T5-like models?

Currently, we do not support encoder-decoder models like T5. It's on our roadmap.

How do we use fine-tuned models, as opposed to off-the-shelf HF models?

You can use the same API for the fine-tuned models saved on your local disk (if the model architecture is supported). For example, the following should work:

from vllm import LLM
llm = LLM(model="path/to/local/model")

Note that if your model was fine-tuned with LoRA, you should merge the LoRA weights into the base model weights before using vLLM. Currently, we do not natively support inference with LoRA adapters.
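The merge itself is just adding the low-rank update into the base weight. A minimal NumPy sketch of the idea (the shapes are illustrative, and `scaling = lora_alpha / r` follows the common LoRA convention; in practice a library such as PEFT can perform this merge for you):

```python
import numpy as np

# Toy base weight and LoRA factors (shapes are illustrative).
d_out, d_in, r = 8, 8, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # LoRA down-projection
B = rng.standard_normal((d_out, r))      # LoRA up-projection
lora_alpha = 4
scaling = lora_alpha / r

# Merged weight: W' = W + scaling * (B @ A)
W_merged = W + scaling * (B @ A)

# The merged matrix reproduces base-plus-adapter on any input,
# so inference no longer needs the adapter at all.
x = rng.standard_normal(d_in)
y_adapter = W @ x + scaling * (B @ (A @ x))
assert np.allclose(W_merged @ x, y_adapter)
```

After merging every adapted layer this way and saving the result as a regular HF checkpoint, the merged directory can be passed to `LLM(model=...)` like any other local model.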

@patrafter1999
Author

patrafter1999 commented Aug 25, 2023

Hi @WoosukKwon,

Thanks a lot for your kind answer. I have fine-tuned a SantaCoder model, which is supported by vLLM. I tried the following on Databricks.

!pip uninstall -y torch
!pip install vllm

from vllm import LLM, SamplingParams
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model=model_dir)

And I get this error.

ImportError                               Traceback (most recent call last)
File <command-3772037547246569>:3
      1 from vllm import LLM, SamplingParams
      2 sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
----> 3 llm = LLM(model=model_dir)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/entrypoints/llm.py:66, in LLM.__init__(self, model, tokenizer, tokenizer_mode, trust_remote_code, tensor_parallel_size, dtype, seed, **kwargs)
     55     kwargs["disable_log_stats"] = True
     56 engine_args = EngineArgs(
     57     model=model,
     58     tokenizer=tokenizer,
   (...)
     64     **kwargs,
     65 )
---> 66 self.llm_engine = LLMEngine.from_engine_args(engine_args)
     67 self.request_counter = Counter()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/engine/llm_engine.py:220, in LLMEngine.from_engine_args(cls, engine_args)
    217 distributed_init_method, placement_group = initialize_cluster(
    218     parallel_config)
    219 # Create the LLM engine.
--> 220 engine = cls(*engine_configs,
    221              distributed_init_method,
    222              placement_group,
    223              log_stats=not engine_args.disable_log_stats)
    224 return engine

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/engine/llm_engine.py:101, in LLMEngine.__init__(self, model_config, cache_config, parallel_config, scheduler_config, distributed_init_method, placement_group, log_stats)
     99     self._init_workers_ray(placement_group)
    100 else:
--> 101     self._init_workers(distributed_init_method)
    103 # Profile the memory usage and initialize the cache.
    104 self._init_cache()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/engine/llm_engine.py:119, in LLMEngine._init_workers(self, distributed_init_method)
    116 def _init_workers(self, distributed_init_method: str):
    117     # Lazy import the Worker to avoid importing torch.cuda/xformers
    118     # before CUDA_VISIBLE_DEVICES is set in the Worker
--> 119     from vllm.worker.worker import Worker  # pylint: disable=import-outside-toplevel
    121     assert self.parallel_config.world_size == 1, (
    122         "Ray is required if parallel_config.world_size > 1.")
    124     self.workers: List[Worker] = []

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/worker/worker.py:10
      6 import torch.distributed
      8 from vllm.config import (CacheConfig, ModelConfig, ParallelConfig,
      9                          SchedulerConfig)
---> 10 from vllm.model_executor import get_model, InputMetadata, set_random_seed
     11 from vllm.model_executor.parallel_utils.parallel_state import (
     12     initialize_model_parallel)
     13 from vllm.sampling_params import SamplingParams

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/model_executor/__init__.py:2
      1 from vllm.model_executor.input_metadata import InputMetadata
----> 2 from vllm.model_executor.model_loader import get_model
      3 from vllm.model_executor.utils import set_random_seed
      5 __all__ = [
      6     "InputMetadata",
      7     "get_model",
      8     "set_random_seed",
      9 ]

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/model_executor/model_loader.py:9
      6 from transformers import PretrainedConfig
      8 from vllm.config import ModelConfig
----> 9 from vllm.model_executor.models import *  # pylint: disable=wildcard-import
     10 from vllm.model_executor.weight_utils import initialize_dummy_weights
     12 # TODO(woosuk): Lazy-load the model classes.

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/model_executor/models/__init__.py:1
----> 1 from vllm.model_executor.models.baichuan import (BaiChuanForCausalLM,
      2                                                  BaichuanForCausalLM)
      3 from vllm.model_executor.models.bloom import BloomForCausalLM
      4 from vllm.model_executor.models.falcon import FalconForCausalLM

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/model_executor/models/baichuan.py:33
     31 from vllm.sequence import SequenceOutputs
     32 from vllm.model_executor.input_metadata import InputMetadata
---> 33 from vllm.model_executor.layers.activation import SiluAndMul
     34 from vllm.model_executor.layers.layernorm import RMSNorm
     35 from vllm.model_executor.layers.attention import PagedAttentionWithRoPE, PagedAttentionWithALiBi

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/model_executor/layers/activation.py:5
      2 import torch
      3 import torch.nn as nn
----> 5 from vllm import activation_ops
      7 _ACTIVATION_REGISTRY = {
      8     "gelu": nn.GELU(),
      9     # NOTE: The following GELU functions may introduce small rounding errors.
   (...)
     13     "relu": nn.ReLU(),
     14 }
     17 def get_act_fn(act_fn: str) -> nn.Module:

ImportError: /local_disk0/.ephemeral_nfs/envs/pythonEnv-f1468507-237a-46f1-93a1-62f93684021f/lib/python3.10/site-packages/vllm/activation_ops.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail19maybe_wrap_dim_slowEllb

The fine-tuned parameter and tokenizer files are in that model_dir folder. What could be the cause of this error?

Thanks.
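An `undefined symbol` error from a compiled extension like `activation_ops...so` typically means the PyTorch that the extension was built against differs from the PyTorch actually installed; uninstalling torch and then installing vLLM in a live notebook can leave such a mismatched pair, and the already-running kernel keeps the old import loaded until it is restarted. As a first diagnostic, a hedged sketch (the helper name `torch_pin_vs_installed` is mine, not a vLLM API) that compares the torch requirement declared by an installed package with the torch version actually present:

```python
from importlib.metadata import PackageNotFoundError, requires, version

def torch_pin_vs_installed(package: str = "vllm"):
    """Return (torch requirements declared by `package`, installed torch version).

    Either element may be empty/None if the package or torch is absent.
    """
    try:
        # Keep only requirement strings whose distribution name is exactly "torch".
        pins = [r for r in (requires(package) or [])
                if r.split(" ")[0].split(">=")[0].split("==")[0].strip() == "torch"]
    except PackageNotFoundError:
        pins = []
    try:
        installed = version("torch")
    except PackageNotFoundError:
        installed = None
    return pins, installed

if __name__ == "__main__":
    pins, installed = torch_pin_vs_installed("vllm")
    print("vllm declares:", pins)
    print("torch installed:", installed)
```

If the versions disagree, reinstalling vLLM and torch together in a fresh environment (and restarting the Python process afterwards, since an already-imported torch stays loaded) is usually the fix.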

@hmellor
Collaborator

hmellor commented Mar 8, 2024

Closing as duplicate of #187

@hmellor hmellor closed this as completed Mar 8, 2024
@hmellor hmellor added the duplicate This issue or pull request already exists label Mar 8, 2024
3 participants