
Commit 35398dd

ycoolsumitd2 authored and committed
[Doc] Fix VLM prompt placeholder sample bug (vllm-project#9170)
Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
1 parent adff221 · commit 35398dd


docs/source/models/vlm.rst

Lines changed: 7 additions & 7 deletions
@@ -25,7 +25,7 @@ The :class:`~vllm.LLM` class can be instantiated in much the same way as languag
 To pass an image to the model, note the following in :class:`vllm.inputs.PromptType`:
 
 * ``prompt``: The prompt should follow the format that is documented on HuggingFace.
-* ``multi_modal_data``: This is a dictionary that follows the schema defined in :class:`vllm.multimodal.MultiModalDataDict`.
+* ``multi_modal_data``: This is a dictionary that follows the schema defined in :class:`vllm.multimodal.MultiModalDataDict`.
 
 .. code-block:: python
 
@@ -34,7 +34,7 @@ To pass an image to the model, note the following in :class:`vllm.inputs.PromptT
 
     # Load the image using PIL.Image
     image = PIL.Image.open(...)
-
+
     # Single prompt inference
     outputs = llm.generate({
         "prompt": prompt,
@@ -68,7 +68,7 @@ To pass an image to the model, note the following in :class:`vllm.inputs.PromptT
         "prompt": prompt,
         "multi_modal_data": mm_data,
     })
-
+
     for o in outputs:
         generated_text = o.outputs[0].text
         print(generated_text)
@@ -116,7 +116,7 @@ Instead of passing in a single image, you can pass in a list of images.
 .. code-block:: python
 
     # Refer to the HuggingFace repo for the correct format to use
-    prompt = "<|user|>\n<image_1>\n<image_2>\nWhat is the content of each image?<|end|>\n<|assistant|>\n"
+    prompt = "<|user|>\n<|image_1|>\n<|image_2|>\nWhat is the content of each image?<|end|>\n<|assistant|>\n"
 
     # Load the images using PIL.Image
     image1 = PIL.Image.open(...)
@@ -135,11 +135,11 @@ Instead of passing in a single image, you can pass in a list of images.
 
 A code example can be found in `examples/offline_inference_vision_language_multi_image.py <https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py>`_.
 
-Multi-image input can be extended to perform video captioning. We show this with `Qwen2-VL <https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct>`_ as it supports videos:
+Multi-image input can be extended to perform video captioning. We show this with `Qwen2-VL <https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct>`_ as it supports videos:
 
 .. code-block:: python
 
-    # Specify the maximum number of frames per video to be 4. This can be changed.
+    # Specify the maximum number of frames per video to be 4. This can be changed.
     llm = LLM("Qwen/Qwen2-VL-2B-Instruct", limit_mm_per_prompt={"image": 4})
 
     # Create the request payload.
@@ -157,7 +157,7 @@ Multi-image input can be extended to perform video captioning. We show this with
 
     # Perform inference and log output.
     outputs = llm.chat([message])
-
+
     for o in outputs:
         generated_text = o.outputs[0].text
         print(generated_text)
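
For context, a minimal end-to-end sketch of the corrected multi-image prompt follows. Only the line-119 placeholder change above is substantive; the other changed lines appear to differ only in trailing whitespace, which the web view does not display. The model name, file paths, and engine arguments below are illustrative assumptions, not taken from the commit; the ``<|user|>``/``<|image_1|>`` template in the fixed prompt matches the Phi-3-vision family.

    from PIL import Image
    from vllm import LLM

    # Assumed model: the <|user|>/<|image_1|> chat template in the fixed prompt
    # matches the Phi-3-vision family; any model using that template would do.
    llm = LLM(
        model="microsoft/Phi-3-vision-128k-instruct",
        trust_remote_code=True,
        limit_mm_per_prompt={"image": 2},  # allow two image items per prompt
    )

    # The corrected placeholders: <|image_1|> / <|image_2|>, not <image_1> / <image_2>.
    prompt = ("<|user|>\n<|image_1|>\n<|image_2|>\n"
              "What is the content of each image?<|end|>\n<|assistant|>\n")

    # Placeholder paths; load the two images with PIL.
    image1 = Image.open("image1.jpg")
    image2 = Image.open("image2.jpg")

    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": {"image": [image1, image2]},
    })

    for o in outputs:
        print(o.outputs[0].text)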

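The last two hunks touch the video-captioning example, whose request payload is elided from the diff context. A hedged sketch of one way to build such a payload, assuming frames are attached as base64 ``image_url`` parts of an OpenAI-style chat message (the ``to_data_url`` helper and frame paths are invented for this sketch):

    import base64
    from io import BytesIO

    from PIL import Image
    from vllm import LLM

    # From the diff: cap the number of image items (video frames) per prompt at 4.
    llm = LLM("Qwen/Qwen2-VL-2B-Instruct", limit_mm_per_prompt={"image": 4})

    def to_data_url(frame: Image.Image) -> str:
        # Encode a PIL frame as a base64 JPEG data URL (helper invented for this sketch).
        buf = BytesIO()
        frame.convert("RGB").save(buf, format="JPEG")
        return "data:image/jpeg;base64," + base64.b64encode(buf.getvalue()).decode()

    # Placeholder frame loading: sample at most 4 frames from the video.
    frames = [Image.open(f"frame_{i}.jpg") for i in range(4)]

    # OpenAI-style chat message: one text part plus one image_url part per frame.
    message = {
        "role": "user",
        "content": [{"type": "text",
                     "text": "Describe this set of frames as part of the same video."}]
                   + [{"type": "image_url",
                       "image_url": {"url": to_data_url(f)}} for f in frames],
    }

    # Perform inference, as in the diff's final hunk.
    outputs = llm.chat([message])
    for o in outputs:
        print(o.outputs[0].text)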