Skip to content
/ UI-R1 Public

Code for "UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning"

License

Notifications You must be signed in to change notification settings

lll6gg/UI-R1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

🔥 Overview

We propose UI-R1, the first framework to explore how rule-based RL can enhance the reasoning capabilities of multimodal large language models (MLLMs) for GUI action prediction tasks. Logo

Experimental results demonstrate that our proposed UI-R1-3B achieves significant improvements over the base model (i.e. Qwen2.5-VL-3B) on both in-domain (ID) and out-of-domain (OOD) tasks, with average accuracy gains of 22.1% on ScreenSpot, 6.0% on ScreenSpot-Pro, and 12.7% on AndroidControl. Furthermore, UI-R1-3B delivers competitive performance compared to larger models (e.g., OS-Atlas-7B) trained via supervised fine-tuning (SFT) on 76K samples.

Logo

Setup

conda create -n ui-r1 python=3.10
conda activate ui-r1
bash setup.sh

Data

Our training mobile data is a subset from AndroidControl and ScreenSpot.

You can also prepare your training or inference data like:

images/:
	image1.png
	image2.png
test.json:
[
	{
	"img_filename": "image1.png",
        "bbox": [
            825,
            72,
            1673,
            149
        ],
        "instruction": "search bar"
     },
     {
	"img_filename": "image2.png",
        "bbox": [
            123,
            732,
            334,
            812
        ],
        "instruction": "check weather"
     }
]

where bbox : [x1,y1,x2,y2] is the coordinate of the left top and the right bottom of the ground truth bbox

Inference

We provide an example here

cd evaluation/
bash test.sh

Please fill the MODEL_PATH, IMG_PATH, TEST_JSON with your real checkpoint path and data path.

Training

cd src/script/
bash train.sh

🗞️ News

  • 2025-04-02: We release the datasets of the UI-R1-3B model.
  • 2025-03-30: We release the checkpoints of the UI-R1-3B model.
  • 2025-03-30: We release the UI-R1 repository.
  • 2025-03-27: We release our paper.

⭐️ Citation

If you find this project useful, welcome to cite us.

@article{lu2025ui,
  title={UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning},
  author={Lu, Zhengxi and Chai, Yuxiang and Guo, Yaxuan and Yin, Xi and Liu, Liang and Wang, Hao and Xiong, Guanjing and Li, Hongsheng},
  journal={arXiv preprint arXiv:2503.21620},
  year={2025}
}

🤝 Acknowledgements

We sincerely thank projects R1-V, Open-R1, and Open-r1-multimodal, VLM-R1 for providing their open-source resources.

About

Code for "UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning"

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published