
[Doc] Unify structured outputs examples #18196


Merged
merged 27 commits on Jun 12, 2025

Changes from all commits (27 commits)
17b0ae2
chore: unify examples for structured outputs
aarnphm May 15, 2025
88fe01e
chore: cleanup reasoning
aarnphm May 15, 2025
82acea3
chore: synchronize logics
aarnphm May 15, 2025
33d50ee
chore: simplify script
aarnphm May 15, 2025
236e467
chore: refactor to dir
aarnphm May 16, 2025
1d3395f
feat: support remote setup
aarnphm May 16, 2025
61884f1
chore: cleanup naming
aarnphm May 16, 2025
bc5d2e7
fix: typing
aarnphm May 16, 2025
6ebc310
fix: pre-commit
aarnphm May 16, 2025
127183c
chore: reduce diff
aarnphm May 16, 2025
8059a2c
merge: branch 'main' of github.com:vllm-project/vllm into chore/unify…
aarnphm May 23, 2025
504c834
chore: reduce diff
aarnphm May 23, 2025
c26afcc
chore: run new ruff format for these examples
aarnphm May 23, 2025
9ab17af
fix make sure docs build
aarnphm May 23, 2025
871784e
fix: set a fixed version for now
aarnphm May 23, 2025
bf62ec8
chore: update project link
aarnphm May 23, 2025
d1c7e82
fix: update absolute link
aarnphm May 23, 2025
7204687
fix: correct path
aarnphm May 23, 2025
93978e4
chore: add alias to relative links
aarnphm May 23, 2025
2c8df98
Merge branch 'vllm-project:main' into chore/unify-examples
aarnphm May 26, 2025
43dc417
chore: remove pre-commit
aarnphm May 27, 2025
db54b23
merge: branch 'main' of github.com:vllm-project/vllm into chore/unify…
aarnphm May 27, 2025
e6f20f7
chore: simplify examples
aarnphm May 27, 2025
9e8ef79
fix: correct naming
aarnphm May 27, 2025
0d8faf1
chore: remove unused workspaces entries
aarnphm May 27, 2025
c7b1058
merge: branch 'main' of github.com:vllm-project/vllm into chore/unify…
aarnphm Jun 12, 2025
fe529a9
chore: merge change from main
aarnphm Jun 12, 2025
45 changes: 0 additions & 45 deletions docs/features/reasoning_outputs.md
@@ -142,51 +142,6 @@ for chunk in stream:

Remember to check whether the `reasoning_content` exists in the response before accessing it. You can check out the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).
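For instance, a minimal guard (an illustrative sketch, assuming `completion` is a chat completion response from a server started with a reasoning parser) could look like this:

```python
# Minimal sketch: guard against a missing reasoning_content field.
# `completion` is assumed to come from client.chat.completions.create(...).
message = completion.choices[0].message
reasoning = getattr(message, "reasoning_content", None)
if reasoning is not None:
    print("reasoning_content:", reasoning)
print("content:", message.content)
```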

## Structured output

The reasoning content is also available in structured outputs. Structured output engines such as `xgrammar` will use the reasoning content to generate the structured output. This is currently supported only in the v0 engine.

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek_r1
```

The following is an example client:

```python
from openai import OpenAI
from pydantic import BaseModel

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

class People(BaseModel):
    name: str
    age: int

json_schema = People.model_json_schema()

prompt = "Generate a JSON with the name and age of one random person."
completion = client.chat.completions.create(
    model=model,
    messages=[{
        "role": "user",
        "content": prompt,
    }],
    extra_body={"guided_json": json_schema},
)
print("reasoning_content: ", completion.choices[0].message.reasoning_content)
print("content: ", completion.choices[0].message.content)
```

## Tool Calling

The reasoning content is also available when both tool calling and the reasoning parser are enabled. Additionally, tool calling only parses functions from the `content` field, not from the `reasoning_content`.
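As a rough sketch of how the two combine (the tool definition, server flags, and prompt below are illustrative assumptions, not part of this diff):

```python
# Hypothetical sketch: tool calling combined with a reasoning model.
# Assumes a server started with a reasoning parser and tool calling enabled,
# e.g. --reasoning-parser deepseek_r1 --enable-auto-tool-choice --tool-call-parser hermes
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not part of this diff
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "What is the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)
message = completion.choices[0].message
print("reasoning_content:", message.reasoning_content)
print("tool_calls:", message.tool_calls)  # parsed from `content`, not reasoning_content
```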
80 changes: 63 additions & 17 deletions docs/features/structured_outputs.md
@@ -39,9 +39,10 @@ client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="-",
)
+model = client.models.list().data[0].id

completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
@@ -54,7 +55,7 @@ The next example shows how to use the `guided_regex`. The idea is to generate an

```python
completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
    messages=[
        {
            "role": "user",
@@ -92,26 +93,32 @@ class CarDescription(BaseModel):
json_schema = CarDescription.model_json_schema()

completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's",
        }
    ],
-    extra_body={"guided_json": json_schema},
+    response_format={
+        "type": "json_schema",
+        "json_schema": {
+            "name": "car-description",
+            "schema": CarDescription.model_json_schema(),
+        },
+    },
)
print(completion.choices[0].message.content)
```
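Because the output is constrained to the schema, the response body can be validated straight back into the Pydantic model; a small follow-up sketch (assuming the `completion` from above):

```python
# Parse the schema-constrained output back into the Pydantic model.
car = CarDescription.model_validate_json(completion.choices[0].message.content)
print(car.brand, car.model, car.car_type)
```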

!!! tip
    While not strictly necessary, normally it's better to indicate in the prompt the
    JSON schema and how the fields should be populated. This can improve the
    results notably in most cases.

Finally we have the `guided_grammar` option, which is probably the most
difficult to use, but it's really powerful. It allows us to define complete
languages, like SQL queries, using a context-free EBNF grammar.
As an example, we can use it to define a specific format of simplified SQL queries:

```python
@@ -130,7 +137,7 @@
simplified_sql_grammar = """
"""

completion = client.chat.completions.create(
-    model="Qwen/Qwen2.5-3B-Instruct",
+    model=model,
    messages=[
        {
            "role": "user",
Expand All @@ -142,7 +149,48 @@ completion = client.chat.completions.create(
print(completion.choices[0].message.content)
```
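The grammar itself is collapsed in this diff; purely as an illustration (not necessarily the exact grammar from the file), a simplified SQL grammar in EBNF might look like:

```python
# Illustrative only -- the actual grammar used in the docs is collapsed above.
simplified_sql_grammar = """
    root ::= select_statement
    select_statement ::= "SELECT " column " from " table " where " condition
    column ::= "col_1 " | "col_2 "
    table ::= "table_1 " | "table_2 "
    condition ::= column "= " number
    number ::= "1 " | "2 "
"""
```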

-Full example: <gh-file:examples/online_serving/openai_chat_completion_structured_outputs.py>
+See also: [full example](../../examples/online_serving/structured_outputs)

## Reasoning Outputs

You can also use structured outputs with <project:#reasoning-outputs> for reasoning models.

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --reasoning-parser deepseek_r1
```

Note that you can combine reasoning with any of the structured outputs features above. The following example uses a JSON schema:

```python
from pydantic import BaseModel


class People(BaseModel):
    name: str
    age: int


completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Generate a JSON with the name and age of one random person.",
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "people",
            "schema": People.model_json_schema(),
        },
    },
)
print("reasoning_content: ", completion.choices[0].message.reasoning_content)
print("content: ", completion.choices[0].message.content)
```
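Other constraints combine with reasoning in the same way; for example, a regex constraint via `extra_body` (an illustrative sketch reusing the `client` and `model` from above):

```python
# Illustrative: reasoning combined with a regex constraint.
completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France? Answer with the city name only.",
        }
    ],
    extra_body={"guided_regex": "[A-Z][a-z]+"},
)
print("reasoning_content: ", completion.choices[0].message.reasoning_content)
print("content: ", completion.choices[0].message.content)
```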

See also: [full example](../../examples/online_serving/structured_outputs)

## Experimental Automatic Parsing (OpenAI API)

@@ -163,14 +211,14 @@ class Info(BaseModel):
    age: int

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
+model = client.models.list().data[0].id
completion = client.beta.chat.completions.parse(
-    model="meta-llama/Llama-3.1-8B-Instruct",
+    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
-    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
@@ -203,15 +251,13 @@ class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

-client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
completion = client.beta.chat.completions.parse(
-    model="meta-llama/Llama-3.1-8B-Instruct",
+    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful expert math tutor."},
        {"role": "user", "content": "Solve 8x + 31 = 2."},
    ],
    response_format=MathResponse,
-    extra_body=dict(guided_decoding_backend="outlines"),
)

message = completion.choices[0].message
@@ -232,11 +278,11 @@ Step #2: explanation="Next, let's isolate 'x' by dividing both sides of the equa
Answer: x = -29/8
```
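The loop that produces this output is collapsed in the diff; a hypothetical reconstruction, reading the `message.parsed` object that `client.beta.chat.completions.parse` populates:

```python
# Hypothetical reconstruction -- the equivalent code is collapsed in this diff.
message = completion.choices[0].message
if message.parsed:
    for i, step in enumerate(message.parsed.steps):
        print(f"Step #{i}:", step)
    print("Answer:", message.parsed.final_answer)
```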

-An example of using `structural_tag` can be found here: <gh-file:examples/online_serving/openai_chat_completion_structured_outputs_structural_tag.py>
+An example of using `structural_tag` can be found here: <gh-file:examples/online_serving/structured_outputs>

## Offline Inference

-Offline inference allows for the same types of guided decoding.
+Offline inference allows for the same types of structured outputs.
To use it, we'll need to configure guided decoding using the `GuidedDecodingParams` class inside `SamplingParams`.
The main available options inside `GuidedDecodingParams` are:

@@ -247,7 +293,7 @@ The main available options inside `GuidedDecodingParams` are:
- `structural_tag`

These parameters can be used in the same way as the parameters from the Online
Serving examples above. One example for the usage of the `choice` parameter is
shown below:

```python
@@ -265,4 +311,4 @@ outputs = llm.generate(
print(outputs[0].outputs[0].text)
```
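The other options work analogously; as a sketch, constraining offline generation to a JSON schema (the `Person` model and prompt are illustrative assumptions, not part of this diff):

```python
# Illustrative: offline structured output constrained to a JSON schema.
from pydantic import BaseModel

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams


class Person(BaseModel):
    name: str
    age: int


guided = GuidedDecodingParams(json=Person.model_json_schema())
sampling_params = SamplingParams(guided_decoding=guided)

llm = LLM(model="Qwen/Qwen2.5-3B-Instruct")
outputs = llm.generate(
    prompts="Generate a JSON with the name and age of one random person.",
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```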

-Full example: <gh-file:examples/offline_inference/structured_outputs.py>
+See also: [full example](../../examples/online_serving/structured_outputs)