Commit dba68f9
[Doc] Unify structured outputs examples (#18196)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
1 parent a3319f4 commit dba68f9

8 files changed (+397 -491 lines)

docs/features/reasoning_outputs.md

Lines changed: 0 additions & 45 deletions

@@ -142,51 +142,6 @@ for chunk in stream:
 
 Remember to check whether the `reasoning_content` exists in the response before accessing it. You can check out the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).
 
-## Structured output
-
-The reasoning content is also available in the structured output. A structured output engine such as `xgrammar` will use the reasoning content to generate the structured output. It is only supported in the v0 engine now.
-
-```bash
-vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek_r1
-```
-
-The following is an example client:
-
-```python
-from openai import OpenAI
-from pydantic import BaseModel
-
-# Modify OpenAI's API key and API base to use vLLM's API server.
-openai_api_key = "EMPTY"
-openai_api_base = "http://localhost:8000/v1"
-
-client = OpenAI(
-    api_key=openai_api_key,
-    base_url=openai_api_base,
-)
-
-models = client.models.list()
-model = models.data[0].id
-
-class People(BaseModel):
-    name: str
-    age: int
-
-json_schema = People.model_json_schema()
-
-prompt = "Generate a JSON with the name and age of one random person."
-completion = client.chat.completions.create(
-    model=model,
-    messages=[{
-        "role": "user",
-        "content": prompt,
-    }],
-    extra_body={"guided_json": json_schema},
-)
-print("reasoning_content: ", completion.choices[0].message.reasoning_content)
-print("content: ", completion.choices[0].message.content)
-```
-
 ## Tool Calling
 
 The reasoning content is also available when both tool calling and the reasoning parser are enabled. Additionally, tool calling only parses functions from the `content` field, not from the `reasoning_content`.
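The doc above recommends checking whether `reasoning_content` exists before accessing it. A minimal, stdlib-only sketch of such a guard (the `split_reasoning` helper and `FakeMessage` stand-in are illustrative, not vLLM or OpenAI SDK API):

```python
# Hedged sketch: reasoning_content may be absent or None depending on the
# reasoning parser and model, so guard before using it. `message` stands in
# for completion.choices[0].message.
def split_reasoning(message) -> tuple[str, str]:
    # getattr with a default handles both missing attributes and None values.
    reasoning = getattr(message, "reasoning_content", None) or ""
    content = getattr(message, "content", None) or ""
    return reasoning, content


class FakeMessage:
    # Simulates a response where the parser produced no reasoning content.
    reasoning_content = None
    content = "x = -29/8"


print(split_reasoning(FakeMessage()))  # ('', 'x = -29/8')
```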

docs/features/structured_outputs.md

Lines changed: 63 additions & 17 deletions
@@ -39,9 +39,10 @@ client = OpenAI(
     base_url="http://localhost:8000/v1",
     api_key="-",
 )
+model = client.models.list().data[0].id
 
 completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
     messages=[
         {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
     ],
@@ -54,7 +55,7 @@ The next example shows how to use the `guided_regex`. The idea is to generate an
 
 ```python
 completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
     messages=[
         {
             "role": "user",
@@ -92,26 +93,32 @@ class CarDescription(BaseModel):
 json_schema = CarDescription.model_json_schema()
 
 completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
     messages=[
         {
             "role": "user",
             "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
         }
     ],
-    extra_body={"guided_json": json_schema},
+    response_format={
+        "type": "json_schema",
+        "json_schema": {
+            "name": "car-description",
+            "schema": CarDescription.model_json_schema(),
+        },
+    },
 )
 print(completion.choices[0].message.content)
 ```
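The hunk above swaps vLLM's `guided_json` extra for the OpenAI-style `response_format` payload. A hedged, offline sketch of the payload the new code builds (pydantic assumed available; no server involved):

```python
from pydantic import BaseModel


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: str


# Same shape the updated example passes as response_format=...
payload = {
    "type": "json_schema",
    "json_schema": {
        "name": "car-description",
        "schema": CarDescription.model_json_schema(),
    },
}

# The generated JSON schema lists the model's fields, all required.
schema = payload["json_schema"]["schema"]
print(sorted(schema["properties"]))
print(schema["required"])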
 
 !!! tip
     While not strictly necessary, normally it's better to indicate in the prompt the
-    JSON schema and how the fields should be populated. This can improve the
+    JSON schema and how the fields should be populated. This can improve the
     results notably in most cases.
 
 Finally we have the `guided_grammar` option, which is probably the most
 difficult to use, but it's really powerful. It allows us to define complete
-languages like SQL queries. It works by using a context-free EBNF grammar.
+languages like SQL queries. It works by using a context-free EBNF grammar.
 As an example, we can use it to define a specific format of simplified SQL queries:
 
 ```python
@@ -130,7 +137,7 @@ simplified_sql_grammar = """
 """
 
 completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
     messages=[
         {
             "role": "user",
@@ -142,7 +149,48 @@ completion = client.chat.completions.create(
 print(completion.choices[0].message.content)
 ```
 
-Full example: <gh-file:examples/online_serving/openai_chat_completion_structured_outputs.py>
+See also: [full example](../../examples/online_serving/structured_outputs)
+
+## Reasoning Outputs
+
+You can also use structured outputs with <project:#reasoning-outputs> for reasoning models.
+
+```bash
+vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1
+```
+
+Note that you can use reasoning with any provided structured outputs feature. The following uses one with a JSON schema:
+
+```python
+from pydantic import BaseModel
+
+
+class People(BaseModel):
+    name: str
+    age: int
+
+
+completion = client.chat.completions.create(
+    model=model,
+    messages=[
+        {
+            "role": "user",
+            "content": "Generate a JSON with the name and age of one random person.",
+        }
+    ],
+    response_format={
+        "type": "json_schema",
+        "json_schema": {
+            "name": "people",
+            "schema": People.model_json_schema(),
+        },
+    },
+)
+print("reasoning_content: ", completion.choices[0].message.reasoning_content)
+print("content: ", completion.choices[0].message.content)
+```
+
+See also: [full example](../../examples/online_serving/structured_outputs)
 
 ## Experimental Automatic Parsing (OpenAI API)
 
@@ -163,14 +211,14 @@ class Info(BaseModel):
     age: int
 
 client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
+model = client.models.list().data[0].id
 completion = client.beta.chat.completions.parse(
-    model="meta-llama/Llama-3.1-8B-Instruct",
+    model=model,
     messages=[
         {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
     ],
     response_format=Info,
-    extra_body=dict(guided_decoding_backend="outlines"),
 )
 
 message = completion.choices[0].message
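The `client.beta.chat.completions.parse` helper used above validates the server's JSON reply against the supplied Pydantic model before exposing it as `message.parsed`. A hedged, offline stand-in for that validation step (no server; `raw` is a hand-written stand-in for `message.content`):

```python
from pydantic import BaseModel


class Info(BaseModel):
    name: str
    age: int


# Stand-in for the JSON string a structured-output response would contain.
raw = '{"name": "Cameron", "age": 28}'

# Roughly what .parse() does client-side: validate the JSON into the model.
info = Info.model_validate_json(raw)
print(f"{info.name}, {info.age}")  # Cameron, 28
```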
@@ -203,15 +251,13 @@ class MathResponse(BaseModel):
     steps: list[Step]
     final_answer: str
 
-client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
 completion = client.beta.chat.completions.parse(
-    model="meta-llama/Llama-3.1-8B-Instruct",
+    model=model,
     messages=[
         {"role": "system", "content": "You are a helpful expert math tutor."},
         {"role": "user", "content": "Solve 8x + 31 = 2."},
     ],
     response_format=MathResponse,
-    extra_body=dict(guided_decoding_backend="outlines"),
 )
 
 message = completion.choices[0].message
@@ -232,11 +278,11 @@ Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equa
 Answer: x = -29/8
 ```
 
-An example of using `structural_tag` can be found here: <gh-file:examples/online_serving/openai_chat_completion_structured_outputs_structural_tag.py>
+An example of using `structural_tag` can be found here: <gh-file:examples/online_serving/structured_outputs>
 
 ## Offline Inference
 
-Offline inference allows for the same types of guided decoding.
+Offline inference allows for the same types of structured outputs.
 To use it, we'll need to configure the guided decoding using the class `GuidedDecodingParams` inside `SamplingParams`.
 The main available options inside `GuidedDecodingParams` are:
 
@@ -247,7 +293,7 @@ The main available options inside `GuidedDecodingParams` are:
 - `structural_tag`
 
 These parameters can be used in the same way as the parameters from the Online
-Serving examples above. One example for the usage of the `choice` parameter is
+Serving examples above. One example for the usage of the `choice` parameter is
 shown below:
 
 ```python
@@ -265,4 +311,4 @@ outputs = llm.generate(
 print(outputs[0].outputs[0].text)
 ```
 
-Full example: <gh-file:examples/offline_inference/structured_outputs.py>
+See also: [full example](../../examples/online_serving/structured_outputs)
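The offline `choice` option constrains generation to a fixed label set. A hedged, stdlib-only stand-in that only emulates the guarantee after the fact (`check_choice` is hypothetical; vLLM itself enforces the constraint at the logits level during decoding, not by post-hoc validation):

```python
# Labels matching the sentiment-classification example earlier in the doc.
allowed = ["Positive", "Negative"]


def check_choice(text: str) -> str:
    # With guided_choice, the engine can only ever emit one of the allowed
    # strings; this check just makes that invariant explicit.
    if text not in allowed:
        raise ValueError(f"{text!r} is not one of {allowed}")
    return text


print(check_choice("Positive"))  # Positive
```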
