Commit a6a7b11

Author: xusenlin

Add scripts for vllm server
1 parent 1293a64 commit a6a7b11

File tree: 1 file changed, +41 −2 lines

docs/VLLM_SCRIPT.md
@@ -22,7 +22,7 @@ docker build -f docker/Dockerfile.vllm -t llm-api:vllm .
  + `tokenizer-mode` (optional): the `tokenizer` mode, defaults to `auto`

- + `tensor_parallel_size` (optional): number of `GPU`s, defaults to `1`
+ + `tensor-parallel-size` (optional): number of `GPU`s, defaults to `1`

  + `embedding_name` (optional): path to the embedding model files; `moka-ai/m3e-base` or `BAAI/bge-large-zh` are recommended
@@ -47,5 +47,44 @@ docker run -it -d --gpus all --ipc=host --net=host -p 80:80 --name=qwen \
      --model_name qwen \
      --model Qwen/Qwen-7B-Chat \
      --trust-remote-code \
-     --tokenizer-mode slow
+     --tokenizer-mode slow \
+     --dtype half
  ```
+
+ ### InternLM
+
+ internlm-chat-7b:
+
+ ```shell
+ docker run -it -d --gpus all --ipc=host --net=host -p 80:80 --name=internlm \
+     --ulimit memlock=-1 --ulimit stack=67108864 \
+     -v `pwd`:/workspace \
+     llm-api:vllm \
+     python api/vllm_server.py \
+     --port 80 \
+     --allow-credentials \
+     --model_name internlm \
+     --model internlm/internlm-chat-7b \
+     --trust-remote-code \
+     --tokenizer-mode slow \
+     --dtype half
+ ```
+
+ ### Baichuan-13b-chat
+
+ baichuan-inc/Baichuan-13B-Chat:
+
+ ```shell
+ docker run -it -d --gpus all --ipc=host --net=host -p 80:80 --name=baichuan-13b-chat \
+     --ulimit memlock=-1 --ulimit stack=67108864 \
+     -v `pwd`:/workspace \
+     llm-api:vllm \
+     python api/vllm_server.py \
+     --port 80 \
+     --allow-credentials \
+     --model_name baichuan-13b-chat \
+     --model baichuan-inc/Baichuan-13B-Chat \
+     --trust-remote-code \
+     --tokenizer-mode slow \
+     --dtype half
  ```
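Each of the commands in this diff starts a container serving on port 80. As a minimal smoke-test sketch, the request body for a chat completion can be assembled and validated locally before sending it with `curl`; note that the `/v1/chat/completions` route and the payload shape are assumptions based on the server being OpenAI-compatible, which this diff does not confirm.

```shell
# Hypothetical smoke test for the containers started above.
# Assumption: api/vllm_server.py exposes an OpenAI-compatible
# /v1/chat/completions endpoint (not confirmed by this diff).
BODY='{"model": "qwen", "messages": [{"role": "user", "content": "Hello"}]}'

# Check the payload is well-formed JSON before touching the network.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"

# Once the qwen container is up, send it (uncomment to run):
# curl http://localhost:80/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```

The `model` field here mirrors the `--model_name` flag passed to `vllm_server.py`; swap in `internlm` or `baichuan-13b-chat` for the other containers.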
