
shanhaiengine/Skywork-R1V

 
 


Skywork-R1V: Pioneering Multimodal Reasoning with CoT

Welcome to the Skywork-R1V repository! Here you'll find the model weights and inference code for our open-source multimodal reasoning model, which performs chain-of-thought reasoning over both visual and textual inputs.

🔥News

Mar 18, 2025: We are thrilled to introduce Skywork R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities! 🚀


Features

  • Visual Chain-of-Thought: Enables multi-step logical reasoning on visual inputs, breaking down complex image-based problems into manageable steps.
  • Mathematical & Scientific Analysis: Capable of solving visual math problems and interpreting scientific/medical imagery with high precision.
  • Cross-Modal Understanding: Seamlessly integrates text and images for richer, context-aware comprehension.

Evaluation

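Evaluation results of state-of-the-art LLMs and VLMs

All scores are pass@1. MATH-500, AIME 2024, and GPQA are reasoning benchmarks; MathVista (mini) and MMMU (Val) are vision benchmarks.

| Model                   | Size | MATH-500 | AIME 2024 | GPQA | MathVista (mini) | MMMU (Val) |
|-------------------------|------|----------|-----------|------|------------------|------------|
| Qwen2.5-72B-Instruct    | 72B  | 80.0     | 23.3      | 49.0 | -                | -          |
| Deepseek V3             | 671B | 90.2     | 39.2      | 59.1 | -                | -          |
| Deepseek R1             | 671B | 97.3     | 79.8      | 71.5 | -                | -          |
| Claude 3.5 Sonnet       | -    | 78.3     | 16.0      | 65.0 | 65.3             | 66.4       |
| GPT-4o                  | -    | 74.6     | 9.3       | 49.9 | 63.8             | 69.1       |
| Kimi k1.5               | -    | 96.2     | 77.5      | -    | 74.9             | 70.0       |
| Qwen2.5-VL-72B-Instruct | 72B  | -        | -         | -    | 74.8             | 70.2       |
| LLaVA-Onevision-72B     | 72B  | -        | -         | -    | 67.5             | 56.8       |
| InternVL2-Llama3-76B    | 76B  | -        | -         | -    | 65.5             | 62.7       |
| InternVL2.5-78B         | 78B  | -        | -         | -    | 72.3             | 70.1       |
| Skywork-R1V-38B         | 38B  | 94.0     | 72.0      | 61.6 | 67.5             | 69.0       |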
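
Every score in this table is reported as pass@1. As a rough illustration of what that number means, the sketch below computes pass@1 as the percentage of problems whose single sampled answer is graded correct. The function name `pass_at_1` and the example grades are hypothetical and are not taken from this repository's evaluation code, which may sample or grade differently.

```python
from typing import Sequence


def pass_at_1(is_correct: Sequence[bool]) -> float:
    """Return pass@1 as a percentage.

    Assumes exactly one sampled answer per problem, each already graded
    as correct (True) or incorrect (False).
    """
    if not is_correct:
        raise ValueError("need at least one graded problem")
    return 100.0 * sum(is_correct) / len(is_correct)


# Hypothetical grading results for five problems (illustrative only).
graded = [True, True, False, True, False]
print(f"pass@1 = {pass_at_1(graded):.1f}")
```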



[Figure: Comparison with Larger-Scale Open-Source and Closed-Source Models (skywork_r1v_eval)]

How to Run Locally

1. Clone the Repository

git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference

2. Set Up the Environment

conda create -n r1-v python=3.10
conda activate r1-v
bash setup.sh

3. Run the Inference Script

CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
    --model_path path \
    --image_paths image1_path \
    --question "your question"
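
If you would rather launch the script from Python (for example, to loop over several questions), the command above could be wrapped roughly as follows. This is a minimal sketch: the helper names `build_inference_command` and `run_inference` are hypothetical, only the three flags shown above are used, and it passes a single image path because how the script handles multiple paths is not documented here.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def build_inference_command(
    model_path: str,
    image_path: str,
    question: str,
    gpus: str = "0,1",
) -> Tuple[List[str], Dict[str, str]]:
    """Build the argument list and environment for the inference script.

    Mirrors the shell command above; no flags beyond those shown are assumed.
    """
    argv = [
        sys.executable,
        "inference_with_transformers.py",
        "--model_path", model_path,
        "--image_paths", image_path,
        "--question", question,
    ]
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    return argv, env


def run_inference(model_path: str, image_path: str, question: str) -> None:
    """Run the script; call this from inside the inference directory."""
    argv, env = build_inference_command(model_path, image_path, question)
    subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    # Placeholder values, as in the command above; replace before running.
    argv, _ = build_inference_command("path", "image1_path", "your question")
    print(argv[1:])
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the question contains spaces or quotation marks.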

License

This code repository is licensed under the MIT License.

  • ✅ Commercial use permitted
  • ✅ Modification allowed
  • ✅ Distribution allowed
  • ❌ No liability

Citation

If you use Skywork-R1V in your research, please cite:

@article{skywork2025r1v,
  title     = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
  author    = {Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
  year      = {2025},
  journal   = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/report/Skywork_R1V.pdf},
  url       = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}
