Add tarsier model support #18985


Merged 10 commits into vllm-project:main on Jun 3, 2025

Conversation

princepride
Contributor

@princepride princepride commented May 31, 2025

Add Tarsier model support: #9707

FIX #9707

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation (Improvements or additions to documentation) and multi-modality (Related to multi-modality, #4194) labels May 31, 2025
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@princepride princepride changed the title [New model support]add tarsier model support Add tarsier model support May 31, 2025
@DarkLight1337
Member

PTAL at the failing tests

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@princepride princepride requested a review from DarkLight1337 June 1, 2025 23:28
@princepride
Contributor Author

@DarkLight1337 It seems that the failing test belongs to some other model.

@DarkLight1337
Member

cc @Isotr0py do you have time to help validate this model?

@princepride
Contributor Author

from pathlib import Path
from vllm import LLM, SamplingParams
from PIL import Image
from vllm.multimodal.video import VideoMediaIO, ImageMediaIO

def extract_frames_from_video(video_filepath: str, num_frames: int):
    image_io = ImageMediaIO()
    video_io = VideoMediaIO(image_io=image_io, num_frames=num_frames)
    frames = video_io.load_file(Path(video_filepath))
    return frames

if __name__ == "__main__":
    EXAMPLE_IMAGE_PATH = "kitty.jpg"
    EXAMPLE_VIDEO_PATH = "kitchen.mp4"
    MAX_VIDEO_FRAMES = 4
    llm = LLM(model="omni-research/Tarsier-7b", trust_remote_code=True)
    sampling_params = SamplingParams(temperature=0.1, top_p=0.9, max_tokens=500)

    # Scenario 1: Pure text test
    print("\n--- Pure Text Test ---")
    vllm_inputs_text_only = {"prompt": "USER: Please introduce yourself. ASSISTANT:"}
    outputs = llm.generate(vllm_inputs_text_only, sampling_params)
    for output_item in outputs:
        print(f"Generated: {output_item.outputs[0].text}\n" + "-" * 20)

    # Scenario 2: Text and single image test
    print("\n--- Text and Single Image Test ---")
    vllm_inputs_single_image = {
        "prompt": "USER: <image>\nPlease describe the image. ASSISTANT:",
        "multi_modal_data": {"image": [Image.open(EXAMPLE_IMAGE_PATH).convert('RGB')]}
    }
    outputs = llm.generate(vllm_inputs_single_image, sampling_params) # Direct generation
    for output_item in outputs:
        print(f"Generated: {output_item.outputs[0].text}\n" + "-" * 20)

    # Scenario 3: Text and video (multiple frames) test
    print("\n--- Text and video (multiple frames) test ---")
    vllm_inputs_video = {
        "prompt": f"USER: {'<image>'*MAX_VIDEO_FRAMES}\nPlease describe the video. ASSISTANT:",
        "multi_modal_data": {"image": extract_frames_from_video(EXAMPLE_VIDEO_PATH, MAX_VIDEO_FRAMES)}
    }
    outputs = llm.generate(vllm_inputs_video, sampling_params) # Direct generation
    for output_item in outputs:
        print(f"Generated: {output_item.outputs[0].text}\n" + "-" * 20)

    print("\nAll tests completed.")

Here is my simple test code; you can refer to it.
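As an aside, the evenly spaced frame sampling implied by the num_frames argument above can be sketched as a standalone helper. This is a hypothetical illustration of the sampling idea only; vLLM's VideoMediaIO handles the actual index selection and decoding internally.

```python
def sample_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Pick num_frames indices evenly spaced across a video.

    Hypothetical helper for illustration; not part of vLLM.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Clamp so we never request more frames than the video has.
    num_frames = min(num_frames, total_frames)
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]
```

For example, sampling 4 frames from a 100-frame clip selects indices 0, 25, 50, and 75.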

@Isotr0py
Collaborator

Isotr0py commented Jun 2, 2025

--- Pure Text Test ---
Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 195.30it/s]
Processed prompts: 100%|█████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.05s/it, est. speed input: 12.42 toks/s, output: 23.88 toks/s]
Generated:  I am Vicuna, a language model trained by researchers from Large Model Systems Organization (LMSYS).
--------------------

--- Text and Single Image Test ---
Adding requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.07s/it]
Processed prompts: 100%|█████████████████████████████████████████████████| 1/1 [00:08<00:00,  8.95s/it, est. speed input: 69.08 toks/s, output: 27.50 toks/s]
Generated: The image captures a vibrant street scene in Chinatown, Melbourne, Australia. Dominating the foreground is a red octagonal stop sign, standing resolute on the sidewalk. It's a familiar sight, a universal symbol instructing drivers to halt their vehicles. Just beyond the sign, the street unfolds in a lively display of culture and commerce. A red lantern hangs from the side of a building, its color matching the stop sign and adding to the festive atmosphere. The building itself is a mix of traditional and modern architecture, with a green awning providing a pop of color against the urban landscape. The street is bustling with activity. People are seen walking on the sidewalk, adding a dynamic element to the scene. Cars are parked along the street, their metallic bodies gleaming under the sunlight. Above it all, the sky stretches out in a clear blue expanse, dotted here and there with trees that add a touch of nature to the urban setting. The image is a snapshot of everyday life in Melbourne, capturing the city's vibrant street scenes and multicultural atmosphere.
--------------------

--- Text and video (multiple frames) test ---
Adding requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 19.73it/s]
Processed prompts: 100%|████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.89s/it, est. speed input: 837.87 toks/s, output: 18.00 toks/s]
Generated:  A young child is sitting on a bed, holding and interacting with a book. The child flips through the pages of the book, occasionally looking down at it. The background shows a bed with a blanket and some clothes scattered on it.
--------------------

According to the model outputs, the model implementation should be fine.

Collaborator

@Isotr0py Isotr0py left a comment


Can you also update the supported models documentation and vllm/entrypoints/chat_utils.py?

def _placeholder_str(self, modality: ModalityStr,
                     current_count: int) -> Optional[str]:
    # TODO: Let user specify how to insert image tokens into prompt
    # (similar to chat template)

… entrypoints placeholder
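For readers unfamiliar with this hook: _placeholder_str maps a model type and modality to the placeholder token inserted into chat prompts. Below is a minimal standalone sketch of the kind of branch the review asks for, assuming Tarsier reuses the LLaVA-style "<image>" token as in the test script above. The dispatch structure and model names here are illustrative assumptions, not the exact vLLM code.

```python
from typing import Optional

def placeholder_str(model_type: str, modality: str) -> Optional[str]:
    # Illustrative dispatch: return the prompt placeholder for a given
    # model/modality pair. Tarsier is assumed here to use "<image>",
    # repeated once per frame when a video is passed as multiple images.
    if modality == "image":
        if model_type in ("llava", "tarsier"):
            return "<image>"
    # Unknown model/modality combinations yield no placeholder.
    return None
```

In the real chat_utils.py the method also receives current_count so placeholders can be numbered per message; the sketch omits that detail.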

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@princepride princepride requested a review from hmellor as a code owner June 2, 2025 09:47
@mergify mergify bot added the frontend label Jun 2, 2025
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@princepride princepride requested a review from Isotr0py June 2, 2025 12:38
Collaborator

@Isotr0py Isotr0py left a comment


Processor tests passed on my side locally as well. So this PR should be good to go!

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@Isotr0py Isotr0py added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 2, 2025
@princepride princepride requested a review from Isotr0py June 2, 2025 22:40
@princepride
Contributor Author

@Isotr0py Sorry to bother you to review this again. This morning I noticed that all 75 checks had passed, but an "Update branch" button appeared. I clicked it, and the whole process restarted. Please forgive me for being a rookie: do I need to click the "Update branch" button after all checks have passed?

@Isotr0py
Collaborator

Isotr0py commented Jun 3, 2025

Hmmm, I remember we disabled this button before... No need to update; I think this restriction will be disabled again, and then we can merge this directly.

@princepride
Contributor Author

@DarkLight1337 Can you merge it? Thank you.

@Isotr0py Isotr0py merged commit 1282bd8 into vllm-project:main Jun 3, 2025
67 checks passed
@DarkLight1337 DarkLight1337 mentioned this pull request Jun 3, 2025
1 task
Labels
documentation: Improvements or additions to documentation
frontend
multi-modality: Related to multi-modality (#4194)
ready: ONLY add when PR is ready to merge/full CI is needed
Development

Successfully merging this pull request may close these issues.

[New Model]: Tarsier
3 participants