Skywork-R1V: Pioneering Multimodal Reasoning with CoT

Welcome to the Skywork-R1V repository! Here, you'll find the model weights and inference code for our state-of-the-art open-source multimodal reasoning model, built for advanced visual and logical reasoning.

🔥News

Mar 18, 2025: We are thrilled to introduce Skywork R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities, pushing the boundaries of AI-driven vision and logical inference! 🚀

Features

Visual Chain-of-Thought: Enables multi-step logical reasoning on visual inputs, breaking down complex image-based problems into manageable steps.
Mathematical & Scientific Analysis: Capable of solving visual math problems and interpreting scientific/medical imagery with high precision.
Cross-Modal Understanding: Seamlessly integrates text and images for richer, context-aware comprehension.

Evaluation

Evaluation results of state-of-the-art LLMs and VLMs

Model	Size	Vision support	Reasoning: MATH-500 (pass@1)	Reasoning: AIME 2024 (pass@1)	Reasoning: GPQA (pass@1)	Vision: MathVista(mini) (pass@1)	Vision: MMMU(Val) (pass@1)
Qwen2.5-72B-Instruct	72B	❌	80.0	23.3	49.0	-	-
Deepseek V3	671B	❌	90.2	39.2	59.1	-	-
Deepseek R1	671B	❌	97.3	79.8	71.5	-	-
Claude 3.5 Sonnet	-	✅	78.3	16.0	65.0	65.3	66.4
GPT-4o	-	✅	74.6	9.3	49.9	63.8	69.1
Kimi k1.5	-	✅	96.2	77.5	-	74.9	70.0
Qwen2.5-VL-72B-Instruct	72B	✅	-	-	-	74.8	70.2
LLaVA-Onevision-72B	72B	✅	-	-	-	67.5	56.8
InternVL2-Llama3-76B	76B	✅	-	-	-	65.5	62.7
InternVL2.5-78B	78B	✅	-	-	-	72.3	70.1
Skywork-R1V-38B	38B	✅	94.0	72.0	61.6	67.5	69.0
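
All scores in the table are reported as pass@1. The README does not describe the sampling setup behind these numbers, so the following is only a minimal sketch of the standard unbiased pass@k estimator, assuming n sampled answers per problem of which c are correct. With k = 1 it reduces to the fraction of correct samples, averaged over problems. The function names and the toy data are illustrative and not part of the Skywork-R1V code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is then exactly 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 (in percent) over (n_samples, n_correct) pairs."""
    scores = [pass_at_k(n, c, k=1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical toy data: 4 problems, 5 samples each (not real evaluation logs).
toy_results = [(5, 5), (5, 0), (5, 3), (5, 4)]
print(f"pass@1 = {benchmark_pass_at_1(toy_results):.1f}")
```

If only one answer is sampled per problem (n = 1), the same function simply returns plain accuracy.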

Comparison with Larger-Scale Open-Source and Closed-Source Models
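
The same comparison can be read directly off the evaluation table. The sketch below copies the table's pass@1 scores into a dictionary (entries shown as "-" are left out) and counts how many listed models Skywork-R1V-38B scores strictly above on each benchmark. The `SCORES` layout and the helper name `models_outscored` are illustrative assumptions, not part of the repository.

```python
# pass@1 scores (percent) copied from the evaluation table; "-" entries are omitted.
SCORES = {
    "MATH-500": {
        "Qwen2.5-72B-Instruct": 80.0, "Deepseek V3": 90.2, "Deepseek R1": 97.3,
        "Claude 3.5 Sonnet": 78.3, "GPT-4o": 74.6, "Kimi k1.5": 96.2,
        "Skywork-R1V-38B": 94.0,
    },
    "AIME 2024": {
        "Qwen2.5-72B-Instruct": 23.3, "Deepseek V3": 39.2, "Deepseek R1": 79.8,
        "Claude 3.5 Sonnet": 16.0, "GPT-4o": 9.3, "Kimi k1.5": 77.5,
        "Skywork-R1V-38B": 72.0,
    },
    "GPQA": {
        "Qwen2.5-72B-Instruct": 49.0, "Deepseek V3": 59.1, "Deepseek R1": 71.5,
        "Claude 3.5 Sonnet": 65.0, "GPT-4o": 49.9, "Skywork-R1V-38B": 61.6,
    },
    "MathVista(mini)": {
        "Claude 3.5 Sonnet": 65.3, "GPT-4o": 63.8, "Kimi k1.5": 74.9,
        "Qwen2.5-VL-72B-Instruct": 74.8, "LLaVA-Onevision-72B": 67.5,
        "InternVL2-Llama3-76B": 65.5, "InternVL2.5-78B": 72.3,
        "Skywork-R1V-38B": 67.5,
    },
    "MMMU(Val)": {
        "Claude 3.5 Sonnet": 66.4, "GPT-4o": 69.1, "Kimi k1.5": 70.0,
        "Qwen2.5-VL-72B-Instruct": 70.2, "LLaVA-Onevision-72B": 56.8,
        "InternVL2-Llama3-76B": 62.7, "InternVL2.5-78B": 70.1,
        "Skywork-R1V-38B": 69.0,
    },
}


def models_outscored(model: str, benchmark: str) -> list[str]:
    """Names of models with a strictly lower score than `model` on `benchmark`."""
    scores = SCORES[benchmark]
    return sorted(name for name, s in scores.items() if s < scores[model])


for benchmark in SCORES:
    beaten = models_outscored("Skywork-R1V-38B", benchmark)
    total = len(SCORES[benchmark]) - 1  # exclude Skywork-R1V-38B itself
    print(f"{benchmark}: ahead of {len(beaten)} of {total} listed models")
```

Ties, such as the shared MathVista(mini) score with LLaVA-Onevision-72B, are not counted as wins because the comparison is strict.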

How to Run Locally

1. Clone the Repository

git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference

2. Set Up the Environment

conda create -n r1-v python=3.10
conda activate r1-v
bash setup.sh

3. Run the Inference Script

CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
    --model_path path \
    --image_paths image1_path \
    --question "your question"
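
For scripted or batch use, the same invocation can be assembled from Python. This is a minimal sketch that uses only the three flags shown above. The helper names `build_inference_command` and `run_inference` are hypothetical, and passing more than one path after `--image_paths` is an assumption suggested by the plural flag name, since the README only shows a single image.

```python
import os
import shlex
import subprocess
import sys


def build_inference_command(model_path: str, image_paths: list[str], question: str) -> list[str]:
    """Assemble the argv list for inference_with_transformers.py."""
    if not image_paths:
        raise ValueError("at least one image path is required")
    return [
        sys.executable, "inference_with_transformers.py",
        "--model_path", model_path,
        # Assumption: several paths may follow --image_paths; the README shows only one.
        "--image_paths", *image_paths,
        "--question", question,
    ]


def run_inference(cmd: list[str], gpus: str = "0,1") -> int:
    """Run the command with CUDA_VISIBLE_DEVICES set, mirroring the shell example."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpus}
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    cmd = build_inference_command("path", ["image1_path"], "your question")
    print(shlex.join(cmd))  # inspect the command before launching it
    # run_inference(cmd)    # uncomment when running from the inference/ directory
```

Passing the arguments as a list avoids shell quoting problems when the question contains spaces or quotation marks.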

License

This code repository is licensed under the MIT License.

✅ Commercial use permitted
✅ Modification allowed
✅ Distribution allowed
❌ No liability

Citation

If you use Skywork-R1V in your research, please cite:

@article{skywork2025r1v,
  title     = {Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
  author    = {Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
  year      = {2025},
  journal   = {https://github.com/SkyworkAI/Skywork-R1V/blob/main/report/Skywork_R1V.pdf},
  url       = {https://huggingface.co/Skywork/Skywork-R1V-38B}
}
