
penghao-wu/GUI_Reflection


GUI Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior


[Figure: pipeline]

Contents:

  1. Installation
  2. Data
  3. Model
  4. Evaluation
  5. Agent Inference
  6. Training
  7. Online Environment
  8. License
  9. Citation
  10. Acknowledgement

Installation

  1. Install the InternVL environment following the guidelines in the InternVL repository.
  2. Add internvl_chat to your PYTHONPATH: export PYTHONPATH=$PYTHONPATH:path_to_GUI_Reflection_repo/internvl_chat
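If you work from a notebook or a single Python process rather than a shell, the following minimal sketch has the same effect as the export above for that process only. The helper name add_internvl_chat_to_path is hypothetical and not part of the repository.

```python
import os
import sys


def add_internvl_chat_to_path(repo_root: str) -> str:
    """Append <repo_root>/internvl_chat to sys.path (hypothetical helper).

    This mirrors the PYTHONPATH export for the current interpreter only;
    scripts launched separately (e.g. via bash) still need the export.
    """
    chat_dir = os.path.join(os.path.abspath(repo_root), "internvl_chat")
    if not os.path.isdir(chat_dir):
        raise FileNotFoundError(f"internvl_chat not found under {repo_root!r}")
    if chat_dir not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(chat_dir)
    return chat_dir


# Usage (the path is a placeholder):
# add_internvl_chat_to_path("path_to_GUI_Reflection_repo")
```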

Data

The training and evaluation data of the GUI Reflection Task Suite are provided in GUI_Reflection_Task_Suite_train and GUI_Reflection_Task_Suite_Benchmark, respectively.
The offline SFT data are provided in GUI_Reflection_SFT_train.
The released data do not include the source images; please download them from the corresponding source datasets.
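After downloading the source images, it can help to check that every image referenced by the annotations is actually on disk. The sketch below assumes a JSONL annotation file in which each record has an optional "image" field holding a relative path (or a list of paths) under one image root, as in InternVL-style annotations; this schema and the helper name find_missing_images are assumptions, so adapt the field name to the released files.

```python
import json
import os
from typing import Iterable, List


def find_missing_images(annotation_lines: Iterable[str], image_root: str) -> List[str]:
    """Return referenced image paths that do not exist under image_root.

    Assumes (hypothetically) one JSON object per line with an optional
    "image" field that is either a string or a list of strings.
    """
    missing = []
    for line in annotation_lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        images = record.get("image", [])
        if isinstance(images, str):  # single-image record
            images = [images]
        for rel_path in images:
            if not os.path.exists(os.path.join(image_root, rel_path)):
                missing.append(rel_path)
    return missing


# Usage (paths are placeholders):
# with open("annotations.jsonl") as f:
#     print(find_missing_images(f, "/data/source_images")[:10])
```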

Model

Our model after the GUI pre-training stage can be found at GUI_Reflection_8b_pretrain.
Our final GUI agent model can be found at GUI_Reflection_8b_SFT.

Evaluation

We provide an evaluation script for the GUI Reflection Task Suite.
Set the required fields in internvl_chat/eval/launch_eval.sh, then launch the evaluation with:

cd internvl_chat/eval && GPUS=8 bash launch_eval.sh

Agent Inference

We implement an agent class, GUI_Reflection_Agent, in internvl_chat/gui_agent.py that runs our model as a GUI agent on GUI tasks.
Initialize the agent with gui_agent = GUI_Reflection_Agent(model_path).
Before starting a new task, reset the agent with gui_agent.reset().
To get the agent's prediction at each step, call action = gui_agent.step(image, task_goal).
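The reset/step calls above can be combined into an episode loop like the minimal sketch below. Only reset() and step(image, task_goal) come from this README; run_episode, get_screenshot, execute_action, is_done and max_steps are hypothetical names, and the format of the returned action is not documented here, so the stopping rule is left to a caller-supplied predicate.

```python
from typing import Any, Callable, List, Protocol


class StepAgent(Protocol):
    """Interface described above: reset() before a task, step() once per step."""

    def reset(self) -> None: ...

    def step(self, image: Any, task_goal: str) -> Any: ...


def run_episode(
    agent: StepAgent,
    task_goal: str,
    get_screenshot: Callable[[], Any],    # hypothetical: captures the current screen
    execute_action: Callable[[Any], None],  # hypothetical: applies an action to the device
    is_done: Callable[[Any], bool],       # hypothetical: detects a terminal action
    max_steps: int = 20,
) -> List[Any]:
    """Run one task and return the list of predicted actions."""
    agent.reset()  # clear the history left over from any previous task
    actions = []
    for _ in range(max_steps):
        image = get_screenshot()
        action = agent.step(image, task_goal)
        actions.append(action)
        if is_done(action):  # stop before executing a terminal action
            break
        execute_action(action)
    return actions
```

With the real agent this would look like run_episode(GUI_Reflection_Agent(model_path), task_goal, capture_fn, exec_fn, done_fn), where the three callables depend on your device or emulator setup.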

Training

We provide a training script for offline SFT in internvl_chat/train_scripts/offline_sft.sh.
Before training, prepare the offline SFT data and set the data and image paths in internvl_chat/train_scripts/offline_sft_data.json.
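If you prefer to set these paths programmatically, the sketch below updates the JSON file in place. It assumes the file follows InternVL's meta-file convention (a top-level object mapping each dataset name to fields such as "root" for the image directory and "annotation" for the annotation file); that layout and the helper name set_sft_paths are assumptions, so check the actual keys in offline_sft_data.json first.

```python
import json
from typing import Dict


def set_sft_paths(meta_path: str, dataset_paths: Dict[str, Dict[str, str]]) -> dict:
    """Update path fields of an (assumed) InternVL-style meta JSON file.

    dataset_paths maps a dataset name to the fields to overwrite, e.g.
    {"root": "/data/images", "annotation": "/data/sft.jsonl"}.
    Other fields are kept; unknown dataset names raise KeyError.
    """
    with open(meta_path) as f:
        meta = json.load(f)
    for name, paths in dataset_paths.items():
        if name not in meta:
            raise KeyError(f"dataset {name!r} not found in {meta_path}")
        meta[name].update(paths)
    with open(meta_path, "w") as f:
        json.dump(meta, f, indent=2)
    return meta
```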

Online Environment

The online environment for GUI Reflection is provided in gui_reflection_env.

License

This project is under the Apache-2.0 license. See LICENSE for details.

Citation

Please consider citing our paper if you find this project helpful for your research:

@article{GUI_Reflection,
  author  = {Wu, Penghao and Ma, Shengnan and Wang, Bo and Yu, Jiaheng and Lu, Lewei and Liu, Ziwei},
  title   = {GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior},
  journal = {arXiv preprint arXiv:2506.08012},
  year    = {2025}
}

Acknowledgement
