Commit 5cc54f7

[Doc] Fix batch-level DP example (#23325)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
1 parent 0c6e40b commit 5cc54f7

File tree

1 file changed (+6, -5 lines)


docs/configuration/optimization.md

Lines changed: 6 additions & 5 deletions
````diff
@@ -153,13 +153,14 @@ from vllm import LLM

 llm = LLM(
     model="Qwen/Qwen2.5-VL-72B-Instruct",
-    # Create two EngineCore instances, one per DP rank
-    data_parallel_size=2,
-    # Within each EngineCore instance:
-    # The vision encoder uses TP=4 (not DP=2) to shard the input data
-    # The language decoder uses TP=4 to shard the weights as usual
     tensor_parallel_size=4,
+    # When mm_encoder_tp_mode="data",
+    # the vision encoder uses TP=4 (not DP=1) to shard the input data,
+    # so the TP size becomes the effective DP size.
+    # Note that this is independent of the DP size for the language decoder, which is used in the expert parallel setting.
     mm_encoder_tp_mode="data",
+    # The language decoder uses TP=4 to shard the weights regardless
+    # of the setting of mm_encoder_tp_mode
 )
 ```
````
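To make the corrected comments concrete, here is a small illustrative sketch (not vLLM internals) of the relationship the doc describes: with `mm_encoder_tp_mode="data"` the vision encoder reuses the TP group to shard the input batch, so the TP size becomes its effective DP size, while the language decoder still shards weights across TP ranks. The helper function and the `"weights"` mode fallback behavior are assumptions for illustration only.

```python
# Illustrative sketch only -- this helper is hypothetical, not vLLM code.
def encoder_parallelism(tensor_parallel_size: int, mm_encoder_tp_mode: str) -> dict:
    """Effective parallelism seen by the multimodal encoder.

    Assumption for illustration: in "data" mode the encoder shards the
    input batch across the TP group instead of sharding weights, so the
    TP size becomes the effective DP size. Otherwise it shards weights
    like the language decoder.
    """
    if mm_encoder_tp_mode == "data":
        return {"encoder_tp": 1, "encoder_dp": tensor_parallel_size}
    return {"encoder_tp": tensor_parallel_size, "encoder_dp": 1}

# Matches the example in the diff: tensor_parallel_size=4, mode "data"
print(encoder_parallelism(4, "data"))  # encoder input sharded 4 ways
```

This also shows why the old comment was wrong: no separate `data_parallel_size=2` is needed for the encoder, since the existing TP group of 4 already provides the data sharding.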
