Welcome to the Skywork-R1V repository! Here, you'll find the model weights and inference code for our state-of-the-art open-source multimodal reasoning model, enabling advanced visual and logical thinking.
Mar 18, 2025: We are thrilled to introduce Skywork R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities, pushing the boundaries of AI-driven vision and logical inference! 🚀
- Visual Chain-of-Thought: Enables multi-step logical reasoning on visual inputs, breaking down complex image-based problems into manageable steps.
- Mathematical & Scientific Analysis: Capable of solving visual math problems and interpreting scientific/medical imagery with high precision.
- Cross-Modal Understanding: Seamlessly integrates text and images for richer, context-aware comprehension.
Evaluation results of state-of-the-art LLMs and VLMs
Model | Size | Vision | Reasoning: MATH-500 (pass@1) | Reasoning: AIME 2024 (pass@1) | Reasoning: GPQA (pass@1) | Vision: MathVista(mini) (pass@1) | Vision: MMMU(Val) (pass@1)
---|---|---|---|---|---|---|---
Qwen2.5-72B-Instruct | 72B | ❌ | 80.0 | 23.3 | 49.0 | - | -
Deepseek V3 | 671B | ❌ | 90.2 | 39.2 | 59.1 | - | -
Deepseek R1 | 671B | ❌ | 97.3 | 79.8 | 71.5 | - | -
Claude 3.5 Sonnet | - | ✅ | 78.3 | 16.0 | 65.0 | 65.3 | 66.4
GPT-4o | - | ✅ | 74.6 | 9.3 | 49.9 | 63.8 | 69.1
Kimi k1.5 | - | ✅ | 96.2 | 77.5 | - | 74.9 | 70.0
Qwen2.5-VL-72B-Instruct | 72B | ✅ | - | - | - | 74.8 | 70.2
LLaVA-Onevision-72B | 72B | ✅ | - | - | - | 67.5 | 56.8
InternVL2-Llama3-76B | 76B | ✅ | - | - | - | 65.5 | 62.7
InternVL2.5-78B | 78B | ✅ | - | - | - | 72.3 | 70.1
Skywork-R1V-38B | 38B | ✅ | 94.0 | 72.0 | 61.6 | 67.5 | 69.0
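
All scores in the table are reported as pass@1. The README does not state how many samples per problem or which decoding settings were used, so the following is only a minimal sketch of the commonly used unbiased pass@k estimator, under the assumption that each problem is sampled `n` times and `c` of those samples are correct. The function names `pass_at_k` and `mean_pass_at_1` are illustrative and are not part of this repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no correct answer.
    # math.comb returns 0 when n - c < k, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems, as a percentage.

    `results` holds one (n_samples, n_correct) pair per problem.
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

With a single sample per problem, pass@1 reduces to plain accuracy: the share of problems answered correctly on the first attempt.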
Comparison with Larger-Scale Open-Source and Closed-Source Models
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference
conda create -n r1-v python=3.10
conda activate r1-v
bash setup.sh
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
--model_path path \
--image_paths image1_path \
--question "your question"
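
To call the script above from Python (for example, to loop over several questions), the command can be assembled programmatically. The sketch below uses only the three flags shown above. It assumes, without confirmation from this README, that `--image_paths` accepts several space-separated paths. The helper names `build_inference_command` and `run_inference` are hypothetical and are not part of this repository.

```python
import os
import subprocess
import sys


def build_inference_command(model_path, image_paths, question,
                            script="inference_with_transformers.py"):
    """Return the argv list for one inference run (nothing is executed here)."""
    if not image_paths:
        raise ValueError("at least one image path is required")
    return [
        sys.executable, script,
        "--model_path", str(model_path),
        # Assumption: multiple images are passed as separate arguments.
        "--image_paths", *[str(p) for p in image_paths],
        "--question", question,
    ]


def run_inference(model_path, image_paths, question, gpus="0,1"):
    """Run the script with CUDA_VISIBLE_DEVICES set, mirroring the shell command."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    cmd = build_inference_command(model_path, image_paths, question)
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(cmd, env=env, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the question contains spaces or quotes. Call `run_inference` from the `inference` directory so that the script path resolves.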
This code repository is licensed under the MIT License.
✅ Commercial use permitted
✅ Modification allowed
✅ Distribution allowed
❌ No liability
If you use Skywork-R1V in your research, please cite:
@article{skywork2025r1v,
title = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
author = {Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
year = {2025},
journal = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/report/Skywork_R1V.pdf},
url = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}