add download models from www.modelscope.cn (#2830)
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
liuyhwangyh and mulin.lyh authored Dec 24, 2023
1 parent 1ffdaee commit c77b0a2
Showing 3 changed files with 26 additions and 0 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -109,6 +109,16 @@ LLama 2, Vicuna, Alpaca, Baize, ChatGLM, Dolly, Falcon, FastChat-T5, GPT4ALL, Gu

See a complete list of supported models and instructions to add a new model [here](docs/model_support.md).

#### Use Models from ModelScope
You can use models from www.modelscope.cn by setting the environment variable `FASTCHAT_USE_MODELSCOPE`.
```
export FASTCHAT_USE_MODELSCOPE=True
```
Example:
```
FASTCHAT_USE_MODELSCOPE=True python3 -m fastchat.serve.cli --model-path qwen/Qwen-7B-Chat --revision v1.1.9
```
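
The variable is read when the model actually loads, so it can also be set from Python before FastChat loads the model. A minimal sketch (illustrative, not the only way to set it):
```
import os

# Must be set before the model is loaded; the ModelScope check in
# fastchat.model.model_adapter reads the environment at load time.
os.environ["FASTCHAT_USE_MODELSCOPE"] = "True"
```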

#### Single GPU
The command below requires around 14GB of GPU memory for Vicuna-7B and 28GB of GPU memory for Vicuna-13B.
See the ["Not Enough Memory" section](#not-enough-memory) below if you do not have enough memory.
13 changes: 13 additions & 0 deletions fastchat/model/model_adapter.py
@@ -319,6 +319,19 @@ def load_model(
    if dtype is not None:  # Overwrite dtype if it is provided in the arguments.
        kwargs["torch_dtype"] = dtype

    if os.environ.get("FASTCHAT_USE_MODELSCOPE", "False").lower() == "true":
        # Download the model from the ModelScope hub.
        # Lazy import so that modelscope is not required for normal use.
        try:
            from modelscope.hub.snapshot_download import snapshot_download

            model_path = snapshot_download(model_id=model_path, revision=revision)
        except ImportError as e:
            warnings.warn(
                "Using models from www.modelscope.cn requires `pip install modelscope`"
            )
            raise e

    # Load model
    model, tokenizer = adapter.load_model(model_path, kwargs)

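For reference, the download step added above can be exercised on its own. A minimal sketch, assuming `modelscope` is installed; the model id and revision reuse the README example and are illustrative:
```
from modelscope.hub.snapshot_download import snapshot_download

# Resolves a ModelScope model id to a local cache directory; load_model()
# then continues with this local path as if it were a local model.
local_path = snapshot_download(model_id="qwen/Qwen-7B-Chat", revision="v1.1.9")
print(local_path)
```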
3 changes: 3 additions & 0 deletions fastchat/serve/model_worker.py
@@ -49,6 +49,7 @@ def __init__(
        device: str,
        num_gpus: int,
        max_gpu_memory: str,
        revision: Optional[str] = None,
        dtype: Optional[torch.dtype] = None,
        load_8bit: bool = False,
        cpu_offloading: bool = False,
@@ -76,6 +77,7 @@ def __init__(
        logger.info(f"Loading the model {self.model_names} on worker {worker_id} ...")
        self.model, self.tokenizer = load_model(
            model_path,
            revision=revision,
            device=device,
            num_gpus=num_gpus,
            max_gpu_memory=max_gpu_memory,
@@ -345,6 +347,7 @@ def create_model_worker():
        args.model_path,
        args.model_names,
        args.limit_worker_concurrency,
        revision=args.revision,
        no_register=args.no_register,
        device=args.device,
        num_gpus=args.num_gpus,
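Taken together, the worker change simply threads `--revision` through to `load_model`, which performs the ModelScope download when the environment variable is set. A minimal end-to-end sketch (argument values are illustrative):
```
import os

os.environ["FASTCHAT_USE_MODELSCOPE"] = "True"  # select the ModelScope hub

from fastchat.model.model_adapter import load_model

# Mirrors what ModelWorker.__init__ now does with its new `revision`
# argument: model_path is swapped for a ModelScope cache directory
# inside load_model() before the adapter loads the weights.
model, tokenizer = load_model(
    "qwen/Qwen-7B-Chat",
    revision="v1.1.9",
    device="cuda",
    num_gpus=1,
    max_gpu_memory=None,
)
```
From the command line, the equivalent is `FASTCHAT_USE_MODELSCOPE=True python3 -m fastchat.serve.model_worker --model-path qwen/Qwen-7B-Chat --revision v1.1.9`.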
