GUI-G1

The model checkpoint and training datasets are under review due to company policies and will be released soon. Thank you for your patience.

News

[2025/9/16] GUI-G1-3B-v1, the model from our paper, is released.
[2025/5/22] Our code is released.
[2025/5/22] Our paper is released.

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

This repository is based on VLM-R1, with several improvements and adaptations for our use case, chiefly to the template, the reward functions, and the GRPO objective.

Building on the original VLM-R1 framework, we introduce GUI-G1, a VLM fine-tuned for GUI grounding.

🔧 Major Modifications of GUI-G1

Introduced a Fast Thinking Template that requires no model reasoning, accelerating training and inference
Utilized diverse reward functions (Hit, IoU, Box) to prevent reward hacking and achieve multi-objective optimization
Removed length correction from the GRPO objective and added a difficulty coefficient to enhance model robustness
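
To make the reward signals named above more concrete, here is a minimal sketch of a Hit reward and an IoU reward. It is an illustration under stated assumptions, not the repository's implementation: boxes are assumed to be (x1, y1, x2, y2) tuples in pixel coordinates, Hit is assumed to mean that the center of the predicted box falls inside the ground-truth box, and the function names hit_reward and iou_reward are hypothetical. The Box reward is not sketched because its definition is not given here.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def hit_reward(pred: Box, gt: Box) -> float:
    """Return 1.0 if the center of the predicted box lies inside the ground-truth box."""
    cx = (pred[0] + pred[2]) / 2.0
    cy = (pred[1] + pred[3]) / 2.0
    inside = gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]
    return 1.0 if inside else 0.0


def iou_reward(pred: Box, gt: Box) -> float:
    """Return the intersection-over-union of the two boxes, a value in [0, 1]."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_gt = max(0.0, gt[2] - gt[0]) * max(0.0, gt[3] - gt[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0
```

Under these assumptions, an oversized prediction such as (0, 0, 100, 100) against a target of (45, 45, 55, 55) scores 1.0 on Hit but only 0.01 on IoU, which suggests why a single reward on its own could be exploited and why combining several signals may help.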

Setup

conda create -n myproject python=3.10
conda activate myproject
bash setup.sh

Follow the steps below to prepare data and train the model:

[Data preparation instructions customized for your setup]

[Reference to your configuration files or modified scripts]

Use the following command to launch training:

bash src/open-r1-multimodal/run_scripts/run.sh

Results

Model	ScreenSpot	ScreenSpot-Pro
InfiGUI-R1-3B	87.5	35.7
GUI-G1-3B	90.3	37.1

Acknowledgements

This repository builds upon the great work from:

VLM-R1

We thank the authors for their open-source contributions.

Citation

If you find our code or work useful for your research, please cite our work:

@article{zhou2025guig1,
  title        = {GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents},
  author       = {Zhou, Yuqi and Dai, Sunhao and Wang, Shuai and Zhou, Kaiwen and Jia, Qinglin and Xu, Jun},
  journal      = {arXiv preprint arXiv:2505.15810},
  year         = {2025}
}
