Yuqi-Zhou/GUI-G1

GUI-G1

The model checkpoint and training datasets are under review due to company policies and will be released soon. Thank you for your patience.

News

  • [2025/9/16] The GUI-G1-3B-v1 model from our paper is released.
  • [2025/5/22] Our code is released.
  • [2025/5/22] Our paper is released.

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

This repository is based on VLM-R1, with improvements and adaptations to the template, the reward functions, and the GRPO objective. Building on that framework, we introduce GUI-G1, a vision-language model (VLM) fine-tuned for GUI grounding.


🔧 Major Modifications of GUI-G1

  • Introduced a Fast Thinking Template that requires no model reasoning, accelerating training and inference
  • Utilized diverse reward functions (Hit, IoU, Box) to prevent reward hacking and achieve multi-objective optimization
  • Removed length correction from the GRPO objective and added a difficulty coefficient to enhance model robustness
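
To make the reward bullet above concrete, here is a minimal sketch of how a Hit reward and an IoU reward might be computed for one predicted box. It is an illustration under assumptions, not the repository's implementation: the function names, the `(x1, y1, x2, y2)` box format, the center-in-box definition of a hit, and the equal weights are all hypothetical. The Box reward is left out because its definition is not given here.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def hit_reward(pred: Box, gt: Box) -> float:
    """Return 1.0 if the center of the predicted box lies inside the target box."""
    cx = (pred[0] + pred[2]) / 2.0
    cy = (pred[1] + pred[3]) / 2.0
    inside = gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]
    return 1.0 if inside else 0.0


def iou_reward(pred: Box, gt: Box) -> float:
    """Return the intersection-over-union of the predicted and target boxes."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0


def combined_reward(pred: Box, gt: Box, w_hit: float = 1.0, w_iou: float = 1.0) -> float:
    """Weighted sum of the two rewards; the weights are hypothetical."""
    return w_hit * hit_reward(pred, gt) + w_iou * iou_reward(pred, gt)


if __name__ == "__main__":
    target = (4.0, 4.0, 6.0, 6.0)
    oversized = (0.0, 0.0, 10.0, 10.0)  # covers the target but is far too large
    print(hit_reward(oversized, target), iou_reward(oversized, target))
```

In this sketch an oversized prediction still earns the full Hit reward because its center falls inside the target, while its IoU stays low. This is one plausible reading of why combining several rewards makes a single reward harder to exploit.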

Setup

conda create -n myproject python=3.10
conda activate myproject
bash setup.sh

Follow the steps below to prepare data and train the model:

  1. [Data preparation instructions customized for your setup]
  2. [Reference to your configuration files or modified scripts]
  3. Use the following command to launch training:
bash src/open-r1-multimodal/run_scripts/run.sh

Results

Model            ScreenSpot   ScreenSpot-Pro
InfiGUI-R1-3B    87.5         35.7
GUI-G1-3B        90.3         37.1

Acknowledgements

This repository builds upon the great work of VLM-R1.

We thank the authors for their open-source contributions.


Citation

If you find our code or work useful for your research, please cite our work:

@article{zhou2025guig1,
  title        = {GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents},
  author       = {Zhou, Yuqi and Dai, Sunhao and Wang, Shuai and Zhou, Kaiwen and Jia, Qinglin and Xu, Jun},
  journal      = {arXiv preprint arXiv:2505.15810},
  year         = {2025}
}
