Commit 8c85305

[Docs] Enable fail_on_warning for the docs build in CI (#25580)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
1 parent: f84a472

File tree: 20 files changed, +81 −87 lines

.readthedocs.yaml

Lines changed: 1 addition & 0 deletions

@@ -13,6 +13,7 @@ build:
 
 mkdocs:
   configuration: mkdocs.yaml
+  fail_on_warning: true
 
 # Optionally declare the Python requirements required to build your docs
 python:

docs/features/nixl_connector_usage.md

Lines changed: 4 additions & 4 deletions

@@ -9,7 +9,7 @@ NixlConnector is a high-performance KV cache transfer connector for vLLM's disag
 Install the NIXL library: `uv pip install nixl`, as a quick start.
 
 - Refer to [NIXL official repository](https://github.com/ai-dynamo/nixl) for more installation instructions
-- The specified required NIXL version can be found in [requirements/kv_connectors.txt](../../requirements/kv_connectors.txt) and other relevant config files
+- The specified required NIXL version can be found in [requirements/kv_connectors.txt](gh-file:requirements/kv_connectors.txt) and other relevant config files
 
 ### Transport Configuration
 

@@ -154,6 +154,6 @@ python tests/v1/kv_connector/nixl_integration/toy_proxy_server.py \
 
 Refer to these example scripts in the vLLM repository:
 
-- [run_accuracy_test.sh](../../tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh)
-- [toy_proxy_server.py](../../tests/v1/kv_connector/nixl_integration/toy_proxy_server.py)
-- [test_accuracy.py](../../tests/v1/kv_connector/nixl_integration/test_accuracy.py)
+- [run_accuracy_test.sh](gh-file:tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh)
+- [toy_proxy_server.py](gh-file:tests/v1/kv_connector/nixl_integration/toy_proxy_server.py)
+- [test_accuracy.py](gh-file:tests/v1/kv_connector/nixl_integration/test_accuracy.py)

docs/mkdocs/hooks/generate_argparse.py

Lines changed: 3 additions & 2 deletions

@@ -32,8 +32,9 @@ def auto_mock(module, attr, max_mocks=50):
     for _ in range(max_mocks):
         try:
             # First treat attr as an attr, then as a submodule
-            return getattr(importlib.import_module(module), attr,
-                           importlib.import_module(f"{module}.{attr}"))
+            with patch("importlib.metadata.version", return_value="0.0.0"):
+                return getattr(importlib.import_module(module), attr,
+                               importlib.import_module(f"{module}.{attr}"))
         except importlib.metadata.PackageNotFoundError as e:
             raise e
         except ModuleNotFoundError as e:

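For context, a minimal standalone sketch of the pattern this change applies: patching `importlib.metadata.version` during import so that modules which look up a package version at import time do not raise `PackageNotFoundError` when that package is only mocked in the docs build environment. The helper name and module below are purely illustrative, not vLLM code.

```python
# Illustrative sketch only (not vLLM code): the same pattern auto_mock now uses.
import importlib
from unittest.mock import patch


def import_with_stub_version(module_name: str):
    """Import a module while importlib.metadata.version() returns a placeholder,
    so version lookups performed at import time cannot raise
    PackageNotFoundError for packages that are only mocked."""
    with patch("importlib.metadata.version", return_value="0.0.0"):
        return importlib.import_module(module_name)


if __name__ == "__main__":
    # "json" is just a stand-in; a real use case is a module that queries
    # its own distribution version on import.
    print(import_with_stub_version("json").__name__)
```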
docs/models/generative_models.md

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ vLLM provides first-class support for generative models, which covers most of LL
 
 In vLLM, generative models implement the [VllmModelForTextGeneration][vllm.model_executor.models.VllmModelForTextGeneration] interface.
 Based on the final hidden states of the input, these models output log probabilities of the tokens to generate,
-which are then passed through [Sampler][vllm.model_executor.layers.sampler.Sampler] to obtain the final text.
+which are then passed through [Sampler][vllm.v1.sample.sampler.Sampler] to obtain the final text.
 
 ## Configuration
 

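For context, the flow this doc page describes (model forward pass, per-token log probabilities, then the sampler producing text) is what `LLM.generate` drives end to end. A minimal usage sketch, with the model name chosen purely as an example:

```python
# Minimal sketch of vLLM's offline generative API; the model name is only an example.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The capital of France is"], params)
for output in outputs:
    # The sampler has already turned per-token log probabilities into text.
    print(output.outputs[0].text)
```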
docs/models/supported_models.md

Lines changed: 1 addition & 1 deletion

@@ -29,7 +29,7 @@ _*Vision-language models currently accept only image inputs. Support for video i
 
 If the Transformers model implementation follows all the steps in [writing a custom model](#writing-custom-models) then, when used with the Transformers backend, it will be compatible with the following features of vLLM:
 
-- All the features listed in the [compatibility matrix](../features/compatibility_matrix.md#feature-x-feature)
+- All the features listed in the [compatibility matrix](../features/README.md#feature-x-feature)
 - Any combination of the following vLLM parallelisation schemes:
     - Pipeline parallel
     - Tensor parallel

docs/usage/README.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # Using vLLM
 
-First, vLLM must be [installed](../getting_started/installation) for your chosen device in either a Python or Docker environment.
+First, vLLM must be [installed](../getting_started/installation/) for your chosen device in either a Python or Docker environment.
 
 Then, vLLM supports the following usage patterns:
 

examples/online_serving/dashboards/grafana/README.md

Lines changed: 2 additions & 2 deletions

@@ -11,9 +11,9 @@ vLLM performance and metrics.
 
 ## Dashboard Descriptions
 
-- **[performance_statistics.json](./performance_statistics.json)**: Tracks performance metrics including latency and
+- **performance_statistics.json**: Tracks performance metrics including latency and
   throughput for your vLLM service.
-- **[query_statistics.json](./query_statistics.json)**: Tracks query performance, request volume, and key
+- **query_statistics.json**: Tracks query performance, request volume, and key
   performance indicators for your vLLM service.
 
 ## Deployment Options

examples/online_serving/dashboards/perses/README.md

Lines changed: 2 additions & 2 deletions

@@ -21,9 +21,9 @@ deployment methods:
 
 ## Dashboard Descriptions
 
-- **[performance_statistics.yaml](./performance_statistics.yaml)**: Performance metrics with aggregated latency
+- **performance_statistics.yaml**: Performance metrics with aggregated latency
   statistics
-- **[query_statistics.yaml](./query_statistics.yaml)**: Query performance and deployment metrics
+- **query_statistics.yaml**: Query performance and deployment metrics
 
 ## Deployment Options
 

vllm/attention/ops/common.py

Lines changed: 19 additions & 17 deletions

@@ -18,12 +18,14 @@ def _correct_attn_cp_out_kernel(outputs_ptr, new_output_ptr, lses_ptr,
     final attention output.
 
     Args:
-        output: [ B, H, D ]
-        lses : [ N, B, H ]
-        cp, batch, q_heads, v_head_dim
-    Return:
-        output: [ B, H, D ]
-        lse : [ B, H ]
+        outputs_ptr (triton.PointerType):
+            Pointer to input tensor of shape [ B, H, D ]
+        lses_ptr (triton.PointerType):
+            Pointer to input tensor of shape [ N, B, H ]
+        new_output_ptr (triton.PointerType):
+            Pointer to output tensor of shape [ B, H, D ]
+        vlse_ptr (triton.PointerType):
+            Pointer to output tensor of shape [ B, H ]
     """
     batch_idx = tl.program_id(axis=0).to(tl.int64)
     head_idx = tl.program_id(axis=1).to(tl.int64)

@@ -81,19 +83,19 @@ def call_kernel(self, kernel, grid, *regular_args, **const_args):
         self.inner_kernel[grid](*regular_args)
 
 
-def correct_attn_out(out: torch.Tensor, lses: torch.Tensor, cp_rank: int,
-                     ctx: CPTritonContext):
-    """
-    Apply the all-gathered lses to correct each local rank's attention
-    output. we still need perform a cross-rank reduction to obtain the
-    final attention output.
+def correct_attn_out(
+        out: torch.Tensor, lses: torch.Tensor, cp_rank: int,
+        ctx: CPTritonContext) -> tuple[torch.Tensor, torch.Tensor]:
+    """Correct the attention output using the all-gathered lses.
 
     Args:
-        output: [ B, H, D ]
-        lses : [ N, B, H ]
-    Return:
-        output: [ B, H, D ]
-        lse : [ B, H ]
+        out: Tensor of shape [ B, H, D ]
+        lses: Tensor of shape [ N, B, H ]
+        cp_rank: Current rank in the context-parallel group
+        ctx: Triton context to avoid recompilation
+
+    Returns:
+        Tuple of (out, lse) with corrected attention and final log-sum-exp.
     """
     if ctx is None:
         ctx = CPTritonContext()

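For readers of the updated docstrings, here is a minimal PyTorch sketch of the correction they describe: each rank rescales its partial attention output by exp(lse_rank − lse_global) so that a later cross-rank sum reproduces attention over the full sequence. This is an illustration of the log-sum-exp math only, not the Triton kernel above, and the function name is hypothetical.

```python
# Illustrative reference only (not the Triton kernel above); assumes each
# context-parallel rank produced a partial output and its LSEs were all-gathered.
import torch


def correct_attn_out_reference(
        out: torch.Tensor, lses: torch.Tensor,
        cp_rank: int) -> tuple[torch.Tensor, torch.Tensor]:
    """out: [B, H, D] partial attention output of this rank.
    lses: [N, B, H] log-sum-exp values gathered from all N ranks."""
    lse = torch.logsumexp(lses, dim=0)       # [B, H] global log-sum-exp
    scale = torch.exp(lses[cp_rank] - lse)   # [B, H] weight of this rank's partial sum
    corrected = out * scale.unsqueeze(-1)    # [B, H, D]
    return corrected, lse
```

Summing the corrected outputs across ranks (for example with an all-reduce) then yields the attention output computed over the full sequence.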
vllm/inputs/data.py

Lines changed: 2 additions & 2 deletions

@@ -287,8 +287,8 @@ class EncoderDecoderInputs(TypedDict):
 
 SingletonInputs = Union[TokenInputs, EmbedsInputs, "MultiModalInputs"]
 """
-A processed [`SingletonPrompt`][vllm.inputs.data.SingletonPrompt] which can be
-passed to [`vllm.sequence.Sequence`][].
+A processed [`SingletonPrompt`][vllm.inputs.data.SingletonPrompt] which can be
+passed to [`Sequence`][collections.abc.Sequence].
 """
 
 ProcessorInputs = Union[DecoderOnlyInputs, EncoderDecoderInputs]
