Merged
Commits
56 commits
ab9d63f
LP docs draft
afeldman-nm Aug 14, 2025
47b6329
wip
afeldman-nm Aug 19, 2025
c380d3a
Merge branch 'main' into lp_ext_docs
afeldman-nm Aug 19, 2025
75d6355
custom args
afeldman-nm Aug 19, 2025
505ac7d
wip
afeldman-nm Aug 19, 2025
9892dda
wip
afeldman-nm Aug 19, 2025
d0c2bb5
Merge branch 'main' into lp_ext_docs
afeldman-nm Aug 20, 2025
ebfb31f
design wip
afeldman-nm Aug 20, 2025
29ff326
more design
afeldman-nm Aug 21, 2025
6bbcddf
Merge branch 'main' into lp_ext_docs
afeldman-nm Aug 21, 2025
687ec93
examples
afeldman-nm Aug 21, 2025
f1d1ce3
Merge branch 'lp_ext_docs' of https://github.com/neuralmagic/vllm int…
afeldman-nm Aug 21, 2025
15f9ec7
fixed type annotation
afeldman-nm Aug 21, 2025
5f2d48b
refactor
afeldman-nm Aug 21, 2025
6ab2285
typo
afeldman-nm Aug 21, 2025
67a3e97
Merge branch 'main' into lp_ext_docs
afeldman-nm Aug 21, 2025
0121cde
fixes
afeldman-nm Aug 21, 2025
a4ca8d6
Merge branch 'main' into lp_ext_docs
afeldman-nm Aug 21, 2025
8cd78cd
Merge branch 'lp_ext_docs' of https://github.com/neuralmagic/vllm int…
afeldman-nm Aug 21, 2025
61dd26a
lint
afeldman-nm Aug 21, 2025
68e3789
fixes
afeldman-nm Aug 21, 2025
088e948
fix
afeldman-nm Aug 21, 2025
40185e0
cap
afeldman-nm Aug 21, 2025
d085a20
Update examples/offline_inference/logits_processor.py
afeldman-nm Aug 25, 2025
e9a5ad0
wip
afeldman-nm Aug 25, 2025
7951391
Merge branch 'lp_ext_docs' of https://github.com/neuralmagic/vllm int…
afeldman-nm Aug 25, 2025
6771862
Merge branch 'main' into lp_ext_docs
afeldman-nm Aug 26, 2025
d85a3d0
merge
afeldman-nm Sep 4, 2025
9de11fc
reorder
afeldman-nm Sep 4, 2025
71c1e48
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 4, 2025
f3fc293
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 5, 2025
bb2b302
feedback
afeldman-nm Sep 5, 2025
b777982
AsyncLLM
afeldman-nm Sep 5, 2025
91c79b7
reorder
afeldman-nm Sep 5, 2025
32b7c1e
reorg
afeldman-nm Sep 5, 2025
1b4e4e2
AsyncLLM example
afeldman-nm Sep 5, 2025
95f9ba7
warnings
afeldman-nm Sep 5, 2025
1c5fcd4
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 9, 2025
f3e173d
wrapped lps
afeldman-nm Sep 9, 2025
22275ec
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 9, 2025
b7e5912
refactor
afeldman-nm Sep 9, 2025
c70376d
lint failures
afeldman-nm Sep 9, 2025
5e755fe
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 9, 2025
bfc6ac5
retrigger checks
afeldman-nm Sep 9, 2025
0635564
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 10, 2025
80eed32
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 11, 2025
08cdd73
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 11, 2025
d945db0
Merge branch 'lp_ext_docs' of https://github.com/neuralmagic/vllm int…
afeldman-nm Sep 11, 2025
cf1d209
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 12, 2025
f4e53ed
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 12, 2025
1e091d0
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 15, 2025
6e38ec8
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 17, 2025
6c5f58d
disclaimer
afeldman-nm Sep 17, 2025
758e6b1
more disclaimer
afeldman-nm Sep 17, 2025
6b5148e
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 17, 2025
62b1d7c
Merge branch 'main' into lp_ext_docs
afeldman-nm Sep 17, 2025
559 changes: 559 additions & 0 deletions docs/design/logits_processors.md

Large diffs are not rendered by default.

46 changes: 46 additions & 0 deletions docs/features/custom_arguments.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Custom Arguments

You can use vLLM *custom arguments* to pass arguments that are not part of the vLLM `SamplingParams` and REST API specifications. Adding or removing a custom argument does not require recompiling vLLM, since custom arguments are passed in as a dictionary.

Custom arguments can be useful if, for example, you want to use a [custom logits processor](./custom_logitsprocs.md) without modifying the vLLM source code.

## Offline Custom Arguments

Custom arguments passed to `SamplingParams.extra_args` as a `dict` will be visible to any code which has access to `SamplingParams`:

``` python
SamplingParams(extra_args={"your_custom_arg_name": 67})
```

This allows arguments which are not already part of `SamplingParams` to be passed into `LLM` as part of a request.
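For illustration, here is a minimal sketch of how downstream code might read such an argument. The `read_custom_arg` helper is hypothetical, and the `SamplingParams` stand-in below models only the `extra_args` field of the real `vllm.SamplingParams`:

``` python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SamplingParams:
    """Stand-in showing only the extra_args field of vllm.SamplingParams."""

    extra_args: Optional[dict[str, Any]] = None


def read_custom_arg(params: SamplingParams, name: str, default: Any = None) -> Any:
    """Fetch a custom argument, tolerating requests that set no extra_args."""
    return (params.extra_args or {}).get(name, default)


params = SamplingParams(extra_args={"your_custom_arg_name": 67})
print(read_custom_arg(params, "your_custom_arg_name"))  # -> 67

# Requests without extra_args fall back to the supplied default
print(read_custom_arg(SamplingParams(), "your_custom_arg_name", 0))  # -> 0
```

Guarding with `params.extra_args or {}` keeps consuming code robust for requests that never set any custom arguments.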

## Online Custom Arguments

The vLLM REST API allows custom arguments to be passed to the vLLM server via `vllm_xargs`. The example below integrates custom arguments into a vLLM REST API request:

``` bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-1.5B-Instruct",
...
"vllm_xargs": {"your_custom_arg": 67}
}'
```

Furthermore, OpenAI SDK users can access `vllm_xargs` via the `extra_body` argument:

``` python
batch = await client.completions.create(
model="Qwen/Qwen2.5-1.5B-Instruct",
...,
extra_body={
"vllm_xargs": {
"your_custom_arg": 67
}
}
)
```

!!! note
`vllm_xargs` is assigned to `SamplingParams.extra_args` under the hood, so code which uses `SamplingParams.extra_args` is compatible with both offline and online scenarios.
445 changes: 445 additions & 0 deletions docs/features/custom_logitsprocs.md

Large diffs are not rendered by default.

10 changes: 4 additions & 6 deletions examples/offline_inference/logits_processor/custom.py
@@ -56,7 +56,6 @@ def __init__(
         self.req_info: dict[int, int] = {}

     def is_argmax_invariant(self) -> bool:
-        """Never impacts greedy sampling"""
         return False

     def update_state(self, batch_update: Optional[BatchUpdate]):
@@ -75,13 +74,12 @@ def apply(self, logits: torch.Tensor) -> torch.Tensor:
             return logits

         # Save target values before modification
-        rows_list = list(self.req_info.keys())
         cols = torch.tensor(
-            [self.req_info[i] for i in rows_list],
-            dtype=torch.long,
-            device=logits.device,
+            list(self.req_info.values()), dtype=torch.long, device=logits.device
         )
+        rows = torch.tensor(
+            list(self.req_info.keys()), dtype=torch.long, device=logits.device
+        )
-        rows = torch.tensor(rows_list, dtype=torch.long, device=logits.device)
         values_to_keep = logits[rows, cols].clone()

         # Mask all but target tokens
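The simplification above leans on a language guarantee worth noting: since Python 3.7, `dict` preserves insertion order, so `keys()` and `values()` iterate in matching order and the separately built row and column tensors stay pairwise aligned. A plain-Python sketch (with a hypothetical `req_info` mapping batch row index to target token id):

``` python
# Hypothetical req_info mapping batch row index -> target token id
req_info = {3: 101, 0: 7, 5: 42}

rows = list(req_info.keys())
cols = list(req_info.values())

# keys() and values() yield entries in the same (insertion) order,
# so zipping them back together recovers the original pairs.
assert list(zip(rows, cols)) == list(req_info.items())
print(rows, cols)  # [3, 0, 5] [101, 7, 42]
```

This is why the intermediate `rows_list` variable could be dropped without risking row/column misalignment.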
7 changes: 4 additions & 3 deletions tests/v1/logits_processors/utils.py
@@ -69,11 +69,12 @@ def apply(self, logits: torch.Tensor) -> torch.Tensor:
             return logits

         # Save target values before modification
-        rows_list = list(self.req_info.keys())
-        cols = torch.tensor([self.req_info[i] for i in rows_list],
+        cols = torch.tensor(list(self.req_info.values()),
                             dtype=torch.long,
                             device=logits.device)
+        rows = torch.tensor(list(self.req_info.keys()),
+                            dtype=torch.long,
+                            device=logits.device)
-        rows = torch.tensor(rows_list, dtype=torch.long, device=logits.device)
         values_to_keep = logits[rows, cols].clone()

         # Mask all but target tokens
6 changes: 3 additions & 3 deletions vllm/v1/sample/logits_processor/interface.py
@@ -21,6 +21,9 @@ class MoveDirectionality(Enum):
     SWAP = auto()


+# Batch indices of any removed requests.
+RemovedRequest = int
+
 # (index, params, prompt_tok_ids, output_tok_ids) tuples for new
 # requests added to the batch.
 AddedRequest = tuple[int, SamplingParams, list[int], list[int]]
@@ -29,9 +32,6 @@ class MoveDirectionality(Enum):
 # one-way moves or two-way swaps of requests in batch
 MovedRequest = tuple[int, int, MoveDirectionality]

-# Batch indices of any removed requests.
-RemovedRequest = int
-

 @dataclass(frozen=True)
 class BatchUpdate:
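To make the three aliases concrete, here is a rough, stdlib-only sketch of the bookkeeping a logits processor's `update_state` might perform. The dict-based `req_info` is modeled on the example processors in this PR, `params` is reduced to a plain dict for illustration, and the exact semantics live in `vllm/v1/sample/logits_processor`:

``` python
from enum import Enum, auto


class MoveDirectionality(Enum):
    """Stand-in for the enum in interface.py."""
    UNIDIRECTIONAL = auto()
    SWAP = auto()


# Per-request state: batch row index -> target token id (hypothetical)
req_info: dict[int, int] = {0: 7, 1: 9}

# RemovedRequest entries: forget state for vacated batch indices
for index in [1]:
    req_info.pop(index, None)

# AddedRequest entries: record state for new requests at their indices
# (params is a plain dict here; in vLLM it would be SamplingParams)
for index, params, prompt_tok_ids, output_tok_ids in [(1, {"target": 42}, [1, 2], [])]:
    req_info[index] = params["target"]

# MovedRequest entries: relocate state, honoring one-way moves vs swaps
for from_index, to_index, direction in [(0, 2, MoveDirectionality.UNIDIRECTIONAL)]:
    if direction is MoveDirectionality.SWAP:
        req_info[from_index], req_info[to_index] = (
            req_info.get(to_index), req_info.get(from_index))
    elif from_index in req_info:
        req_info[to_index] = req_info.pop(from_index)

print(req_info)  # {1: 42, 2: 7}
```

Applying removals, then additions, then moves mirrors the order in which the aliases are handled by the example processors; the SWAP branch here is simplified and does not clean up `None` slots.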
8 changes: 4 additions & 4 deletions vllm/v1/sample/logits_processor/state.py
@@ -36,18 +36,18 @@ class BatchUpdateBuilder:

     _removed: list[RemovedRequest]
     _is_removed_sorted: bool
-    moved: list[MovedRequest]
     added: list[AddedRequest]
+    moved: list[MovedRequest]

     def __init__(
         self,
         removed: Optional[list[RemovedRequest]] = None,
-        moved: Optional[list[MovedRequest]] = None,
         added: Optional[list[AddedRequest]] = None,
+        moved: Optional[list[MovedRequest]] = None,
     ) -> None:
         self._removed = removed or []
-        self.moved = moved or []
         self.added = added or []
+        self.moved = moved or []
         self._is_removed_sorted = False

     # Used to track changes in the pooling case
@@ -107,8 +107,8 @@ def reset(self) -> bool:
         """Returns True if there were any changes to the batch."""
         self._is_removed_sorted = False
         self._removed.clear()
-        self.moved.clear()
         self.added.clear()
+        self.moved.clear()
         batch_changed = self.batch_changed
         self.batch_changed = False
         return batch_changed
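For intuition, the builder's `reset()` contract (clear all pending updates, then report whether anything had changed) can be sketched with a stdlib-only stand-in. This is a simplified model, not the real class, which among other things sorts `_removed` lazily:

``` python
class BatchUpdateBuilder:
    """Simplified stand-in illustrating the reset() contract."""

    def __init__(self) -> None:
        self._removed: list[int] = []
        self.added: list[tuple] = []
        self.moved: list[tuple] = []
        self._is_removed_sorted = False
        self.batch_changed = False

    def removed_append(self, index: int) -> None:
        """Record a removed batch index and flag the batch as changed."""
        self._removed.append(index)
        self.batch_changed = True

    def reset(self) -> bool:
        """Returns True if there were any changes to the batch."""
        self._is_removed_sorted = False
        self._removed.clear()
        self.added.clear()
        self.moved.clear()
        batch_changed = self.batch_changed
        self.batch_changed = False
        return batch_changed


builder = BatchUpdateBuilder()
builder.removed_append(3)
print(builder.reset())  # True: the batch had changed
print(builder.reset())  # False: state was cleared by the first reset
```

Snapshotting `batch_changed` before clearing it lets callers learn whether a `BatchUpdate` needs to be published while leaving the builder ready for the next step.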