    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

## Deploying an OpenAI-Compatible Backend

### Server-Side API Deployment

Run the following command on the server to start the model service:
```shell
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/your_model \
    --served-model-name "llama3-cn" \
    --max-model-len=1024 \
    --api-key="xxx-abc-123"
```
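
Once the service is up, it is worth confirming that it is reachable and that the key is accepted before wiring up a real client. A minimal sketch using the OpenAI Python client against the server's `/v1/models` endpoint (the `<server_ip>:<port>` placeholders are yours to fill in):

```python
from openai import OpenAI

# Point the client at the vLLM server; the key must match --api-key above.
client = OpenAI(base_url="http://<server_ip>:<port>/v1", api_key="xxx-abc-123")

# List the models the server exposes; expect to see "llama3-cn".
for model in client.models.list():
    print(model.id)
```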

Notes: `--api-key` sets a token that clients must present when connecting, and `--max-model-len` caps the maximum sequence length (prompt plus generated tokens) the model handles in a single request. By default, vLLM uses the chat template shipped in the model's `tokenizer_config.json`; you can also supply your own with `--chat-template` (the template must be written as a `.jinja` file).
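
For illustration, here is a minimal sketch of what such a template file could contain, written out from Python so the example is self-contained. The tags below follow the Llama 3 header/`<|eot_id|>` message format; the file name `template.jinja` and the exact special tokens are assumptions, so match them to your model:

```python
# A hypothetical minimal chat template in the Llama 3 style; adjust the
# special tokens to whatever your model's tokenizer actually uses.
TEMPLATE = (
    "{% for message in messages %}"
    "<|start_header_id|>{{ message['role'] }}<|end_header_id|>\n\n"
    "{{ message['content'] }}<|eot_id|>"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{% endif %}"
)

# Write the template to disk so it can be passed to the server.
with open("template.jinja", "w") as f:
    f.write(TEMPLATE)
```

The server would then be launched with `--chat-template template.jinja` appended to the command above.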

### Testing the API from a Client

From a terminal shell:
```shell
curl http://<server_ip>:<port>/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -H 'Authorization: Bearer xxx-abc-123' \
    -d '{
        "model": "llama3-cn",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a joke"}
        ]
    }'
```

Python code:
```python
from openai import OpenAI

client = OpenAI(base_url="http://<server_ip>:<port>/v1", api_key="xxx-abc-123")

completion = client.chat.completions.create(
    model="llama3-cn",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke"}
    ],
    temperature=0.7,
    stop=["<|eot_id|>"],
)

print(completion.choices[0].message)
```
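
The same endpoint also supports streaming through the standard OpenAI client, so tokens can be printed as they are generated. A short sketch (not part of the original example; same placeholder host, port, and key as above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://<server_ip>:<port>/v1", api_key="xxx-abc-123")

# stream=True makes the server send incremental deltas instead of a
# single final message.
stream = client.chat.completions.create(
    model="llama3-cn",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
)
for chunk in stream:
    # delta.content is None for role-only and final chunks; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```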

Appendix:

The full list of command-line arguments supported by vLLM deployment is given below; adjust them as needed:
| Argument | Description | Default |
|----------|-------------|---------|