Commit b298a32

add doc
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
1 parent a269173 commit b298a32

File tree

1 file changed: +35 −0 lines


docs/configuration/conserving_memory.md

@@ -122,6 +122,41 @@

```python
llm = LLM(model="google/gemma-3-27b-it",
          limit_mm_per_prompt={"image": 0})
```

### Configurable options

`limit_mm_per_prompt` also accepts configurable options per modality. In the configurable form, you still specify `count`, and you may optionally provide size hints that control how vLLM profiles and reserves memory for your multi-modal inputs. This lets you tune memory for the actual media you expect, instead of the model's absolute maxima.
Configurable options by modality:

- `image`: `{"count": int, "width": int, "height": int}`
- `video`: `{"count": int, "num_frames": int, "width": int, "height": int}`
- `audio`: `{"count": int, "length": int}`
Examples:

```python
from vllm import LLM

# Up to 5 images per prompt, profile with 512x512.
# Up to 1 video per prompt, profile with 32 frames at 640x640.
llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    limit_mm_per_prompt={
        "image": {"count": 5, "width": 512, "height": 512},
        "video": {"count": 1, "num_frames": 32, "width": 640, "height": 640},
    },
)
```
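The same scheme extends to audio. As a sketch (the values below are illustrative, not taken from the source), a limits dict covering all three modalities could look like this:

```python
# Illustrative values only; `length` is the audio profiling hint
# from the per-modality options listed above.
limits = {
    "image": {"count": 5, "width": 512, "height": 512},
    "video": {"count": 1, "num_frames": 32, "width": 640, "height": 640},
    "audio": {"count": 2, "length": 30000},
}

# Every modality entry carries a "count"; the size keys are optional hints.
assert all("count" in v for v in limits.values())
```

This dict would then be passed as `LLM(..., limit_mm_per_prompt=limits)`.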
Notes:

- Backward compatible and mixed formats: passing an integer works as before and is interpreted as `{"count": <int>}`.
  For example, `limit_mm_per_prompt={"image": 5}` is equivalent to `limit_mm_per_prompt={"image": {"count": 5}}`.
  The two forms can also be mixed: `limit_mm_per_prompt={"image": 5, "video": {"count": 1, "num_frames": 32, "width": 640, "height": 640}}` is equivalent to `limit_mm_per_prompt={"image": {"count": 5}, "video": {"count": 1, "num_frames": 32, "width": 640, "height": 640}}`.
- The size hints affect memory profiling only: they shape the dummy inputs used to compute reserved activation sizes, but they do not change how inputs are actually processed at inference time.
- If a hint exceeds what the model can accept, vLLM clamps it to the model's effective maximum and may log a warning.
- TODO: Encoder cache sizing and actual input processing are not affected by these size hints; this should be addressed later.
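The backward-compatibility rule above can be sketched as a small normalization step (a hypothetical helper for illustration, not vLLM's internal code):

```python
# Hypothetical helper: normalize the integer shorthand into the dict form.
def normalize_limit(value):
    """An int N is shorthand for {"count": N}; dicts pass through unchanged."""
    if isinstance(value, int):
        return {"count": value}
    return dict(value)

# Mixed int/dict input, as allowed by limit_mm_per_prompt.
mixed = {"image": 5, "video": {"count": 1, "num_frames": 32}}
normalized = {k: normalize_limit(v) for k, v in mixed.items()}

assert normalized["image"] == {"count": 5}
assert normalized["video"]["num_frames"] == 32
```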
## Multi-modal processor arguments

For certain models, you can adjust the multi-modal processor arguments to
