Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
[ICCV 2025] official repo of "MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs"
A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model, enabling image QA, multi-image pointing, video QA, and temporal tracking. Users upload images or videos and provide natural language prompts.
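As a rough sketch of what such a demo might look like, the snippet below wires multi-file upload and a text prompt into a Gradio interface with a placeholder inference function; the `answer` function, component layout, and model-loading details are assumptions for illustration, not the repository's actual code.

```python
# Minimal Gradio sketch (hypothetical): multi-image upload + text prompt.
import gradio as gr

def answer(files, prompt):
    # Placeholder inference; the real demo would run the Molmo2-8B model
    # over the uploaded images/videos together with the text prompt.
    n = len(files or [])
    return f"(stub) would answer {prompt!r} over {n} uploaded file(s)"

demo = gr.Interface(
    fn=answer,
    inputs=[
        gr.File(file_count="multiple", label="Images or video"),
        gr.Textbox(label="Prompt"),
    ],
    outputs=gr.Textbox(label="Answer"),
    title="Multi-image QA demo (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```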