
[RFC]: Unification of frontend parser #17817

Open
@aarnphm

Description


motivation

#11522 (with a draft implementation at #11554)
aims to simplify the logic of the tool parser interface. However, it does not cover reasoning models, where we want to parse
tokens generated within the thinking budget. Our current solution relies on a separate reasoning parser, which will soon run into the same
issue described in #11522 when dealing with very long thinking budgets. Additionally, the current tool-calling implementations are relatively
fragile and do not scale well as more tool formats are added.

This RFC builds on ideas from that RFC and unifies the tool-calling and reasoning parser logic into a more robust
foundation going forward, especially for v0.10.x.

proposed change

The workflow can be seen as follows:

  • Define the function/tool-calling format for supported models (provided by the LLMEngine)
  • Construct structural tags from that tool/function-calling format
  • Perform constrained decoding with a supported backend (xgrammar/guidance)
  • Use a parser to convert the string response into structured objects
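
As a sketch of step 2, the structural tag handed to the constrained-decoding backend might look like the following. This is an illustrative assumption in the style of xgrammar's structural-tag format; the `<tool_call>` delimiters and the `get_weather` schema are made up for the example, not part of the RFC:

```python
import json

# Hypothetical JSON schema for a single tool's arguments (illustrative).
get_weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

# Structural tag: decoding is unconstrained until a trigger string appears,
# then the content between begin/end is constrained to the schema.
structural_tag = {
    "type": "structural_tag",
    "structures": [
        {
            "begin": "<tool_call>",
            "schema": get_weather_schema,
            "end": "</tool_call>",
        }
    ],
    "triggers": ["<tool_call>"],
}

print(json.dumps(structural_tag, indent=2))
```

The key property is that free-form text and constrained tool-call segments can be interleaved in one generation, which is what lets a single mechanism serve both reasoning text and tool calls.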

From vLLM perspective:

┌───────┐
│Prompt │
└───┬───┘
    │
    ▼
┌────────────────────────────────┐
│ vLLM (OpenAI‑compatible FE)    │
└───┬───────────────────┬────────┘
    │ [tool / func‑call │ reasoning_fmt]
    ▼                   │
┌──────────┐            │
│  Parser  │◀───────────┘
└───┬──────┘
    │
    ▼
┌────────────┐
│ LLM Engine │
└───┬────────┘
    │
    │
    ▼
┌────────┐
│ Parser │
└───┬────┘
    │
    ▼
┌────────────────────────────┐
│ vLLM (OpenAI‑compatible FE)│
└───┬────────────────────────┘
    │ 
    ▼ 
┌───────┐
│Output │  
└───────┘

Aim:

  • A simplified, unified interface called vllm.Parser

There is a compatibility matrix we need to consider:

| features              | function/tool calling | structured outputs | reasoning |
|-----------------------|-----------------------|--------------------|-----------|
| function/tool calling | -                     |                    |           |
| structured outputs    |                       | -                  |           |
| reasoning             |                       |                    | -         |

NOTE: For reasoning logic, there are forced and non-forced modes (non-forced mode was recently introduced by the Qwen3 series of models)
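
To make the forced/non-forced distinction concrete, here is a minimal sketch of splitting a completion into reasoning and content. It assumes DeepSeek-R1/Qwen3-style `<think>...</think>` delimiters; in forced mode the opening `<think>` is injected by the chat template, so the model output begins inside the reasoning block. The function name and delimiters are illustrative, not the proposed API:

```python
def split_reasoning(text: str, forced: bool) -> tuple[str, str]:
    """Split generated text into (reasoning, content).

    forced=True: chat template already emitted the opening <think>,
    so `text` starts inside the reasoning block.
    forced=False: the model may or may not emit <think> itself.
    """
    open_tag, close_tag = "<think>", "</think>"
    if not forced:
        if not text.startswith(open_tag):
            return "", text  # model chose not to reason
        text = text[len(open_tag):]
    reasoning, sep, content = text.partition(close_tag)
    if not sep:
        # Thinking budget exhausted before the close tag appeared --
        # exactly the long-thinking-budget failure mode noted above.
        return reasoning, ""
    return reasoning, content.lstrip("\n")

print(split_reasoning("plan...</think>Answer", forced=True))
print(split_reasoning("<think>plan...</think>Answer", forced=False))
```

A streaming variant would need to hold partial matches of the close tag across deltas, which is part of what motivates pushing this into a unified, backend-aware parser rather than ad-hoc string handling.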

An ad-hoc implementation of the parser would be:

class Parser:
    # Class-level capability flags, set via __init_subclass__ keyword arguments.
    tool: bool = False
    reasoning: bool = False

    # Batch and streaming hooks; result types come from the structural-tag backend.
    def parse_tool_call(self, structural_tag: StructuralTagResult) -> ToolCallResult: ...

    def parse_tool_call_stream(self, structural_tag: StructuralTagResult) -> DeltaToolCallResult: ...

    def parse_reasoning(self, structural_tag: StructuralTagResult) -> ReasoningResult: ...

    def parse_reasoning_stream(self, structural_tag: StructuralTagResult) -> DeltaReasoningResult: ...

class Llama3JSON(Parser, tool=True, name="llama3-json"): ...
class Pythonic(Parser, tool=True, name="pythonic"): ...

class DeepSeek(Parser, tool=True, reasoning=True, name="deepseek_r1"): ...
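
The RFC does not spell out how the `tool=True, name="..."` class keywords are wired up; one plausible mechanism is `__init_subclass__`-based registration, so that serving code can look parsers up by name. A minimal sketch under that assumption (the `registry` attribute is hypothetical):

```python
class Parser:
    # Hypothetical name -> class registry, populated at subclass definition time.
    registry: dict[str, type["Parser"]] = {}
    tool: bool = False
    reasoning: bool = False

    def __init_subclass__(cls, *, tool=False, reasoning=False, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.tool = tool
        cls.reasoning = reasoning
        if name is not None:  # only named subclasses are registered
            Parser.registry[name] = cls

class Llama3JSON(Parser, tool=True, name="llama3-json"): ...
class DeepSeek(Parser, tool=True, reasoning=True, name="deepseek_r1"): ...

parser_cls = Parser.registry["deepseek_r1"]
print(parser_cls.__name__, parser_cls.tool, parser_cls.reasoning)  # DeepSeek True True
```

This keeps registration declarative (defining the class is enough) and lets the frontend select a parser from a request or CLI flag by name.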

serving_chat.py:

Feedback period

TBD with respect to implementation; we will need to wait for the xgrammar team to add this support.

CC List

@mgoin @russellb @robertgshaw2-redhat @mmoskal

Any Other Thing

  • We should probably move all of the tool/chat templates under vllm/tools
