📢 [Project Page] [Blog Post] [Models]
OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
- [2024/10] Both the Interactive Region Detection Model and the Icon Functional Description Model are released! Hugging Face models
- [2024/09] OmniParser achieves the best performance on Windows Agent Arena!
Install environment:
conda create -n "omni" python==3.12
conda activate omni
pip install -r requirements.txt
Then download the model checkpoint files from https://huggingface.co/microsoft/OmniParser and put them under weights/. The default folder structure is: weights/icon_detect, weights/icon_caption_florence, weights/icon_caption_blip2.
Finally, convert the safetensors weights to a .pt file:
python weights/convert_safetensor_to_pt.py
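As a quick sanity check before running the demos, a script along these lines can confirm that the default folder structure described above is in place. This is a minimal sketch and not part of the repository: the helper name `missing_weight_dirs` is hypothetical, and it only checks that the three folders exist, not that their contents are complete.

```python
from pathlib import Path

# Default sub-folders named in the setup instructions above.
EXPECTED_SUBDIRS = ("icon_detect", "icon_caption_florence", "icon_caption_blip2")


def missing_weight_dirs(weights_root="weights"):
    """Return the expected sub-folders that are absent under weights_root.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    root = Path(weights_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_weight_dirs()
    if missing:
        print("Missing weight folders:", ", ".join(missing))
    else:
        print("All expected weight folders are present.")
```

Run it from the repository root so that the relative path weights/ resolves correctly.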
We put together a few simple examples in demo.ipynb.
To run the Gradio demo, run:
python gradio_demo.py
To run the cells in demo.ipynb and verify everything is working as expected, follow these steps:
- Open the demo.ipynb file in Jupyter Notebook or JupyterLab.
- Run each cell sequentially by clicking on the "Run" button or pressing Shift + Enter.
- Verify the output of each cell to ensure it matches the expected results.
To debug Jupyter notebooks effectively, you can use the following tools and techniques:
- Built-in Debugging Tools:
  - Use the %debug magic command to enter the interactive debugger.
  - Use the print function to display variable values and track the flow of execution.
  - Leverage the pdb module to set breakpoints and step through the code.
- Organize and Structure Your Code:
  - Break down your code into smaller, manageable cells to isolate issues more easily.
  - Use comments and markdown cells to document your code and explain the logic.
  - Ensure that each cell performs a specific task and avoid having too much code in a single cell.
- Use External Tools and Libraries:
  - Use external libraries like ipdb for an enhanced debugging experience.
  - Leverage visualization libraries like matplotlib or seaborn to plot data and identify issues visually.
  - Utilize tools like nbdime to compare and merge notebook files, which can help identify changes that introduced bugs.
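The built-in tools listed above can be combined in a small wrapper. The sketch below is illustrative only: `run_with_postmortem` is a hypothetical helper, not something shipped with this repository. It prints the traceback when a call fails and, if `debug=True`, opens pdb at the frame that raised, which is roughly what the %debug magic does after an exception in a notebook.

```python
import pdb
import traceback


def run_with_postmortem(func, *args, debug=False, **kwargs):
    """Call func(*args, **kwargs); on failure, report and optionally debug.

    Hypothetical helper for illustration. Returns None if func raised.
    """
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        # Always show where the failure happened.
        traceback.print_exc()
        if debug:
            # Interactive: inspect variables in the frame that raised.
            pdb.post_mortem(exc.__traceback__)
        return None
```

In a notebook cell, wrap the call under investigation, for example `run_with_postmortem(some_function, some_argument, debug=True)`, where `some_function` stands for whatever you are debugging. Leave `debug=False` in non-interactive runs so execution does not block waiting for debugger input.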
After running the cells in demo.ipynb, you can verify if everything is working as expected by checking the following:
- Ensure that the outputs of the cells match the expected results.
- Check for any error messages or warnings in the notebook.
- Verify that the functionality demonstrated in the notebook aligns with the intended behavior of the repository.
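The check for error messages can also be done programmatically once the notebook has been run and saved. The following is a minimal sketch under the assumption that the notebook uses the standard nbformat 4 JSON layout, where a failed cell stores an output with `output_type` set to `"error"`. The function name `find_notebook_errors` is hypothetical, and the sketch does not detect warnings, which are usually written to stderr stream outputs instead.

```python
import json


def find_notebook_errors(notebook_path):
    """Return (cell_index, error_name, error_value) for each error output.

    Reads the saved .ipynb JSON directly, so the notebook must have been
    run and saved first. Hypothetical helper for illustration.
    """
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)

    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # Markdown and raw cells have no outputs.
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors
```

Calling `find_notebook_errors("demo.ipynb")` returns an empty list when no saved cell output records an exception.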
Our technical report can be found here. If you find our work useful, please consider citing our work:
@misc{lu2024omniparserpurevisionbased,
title={OmniParser for Pure Vision Based GUI Agent},
author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
year={2024},
eprint={2408.00203},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.00203},
}