Wenjia Jiang¹,², Yangyang Zhuang¹, Chenxi Song¹, Xu Yang³, Chi Zhang¹

¹AGI Lab, Westlake University, ²Henan University, ³Southeast University

jiangwenjia@westlake.edu.cn
Recent advancements in Large Language Models (LLMs) have led to the development of intelligent LLM-based agents capable of interacting with graphical user interfaces (GUIs). These agents demonstrate strong reasoning and adaptability, enabling them to perform complex tasks that traditionally required predefined rules. However, the reliance on step-by-step reasoning in LLM-based agents often results in inefficiencies, particularly for routine tasks. In contrast, traditional rule-based systems excel in efficiency but lack the intelligence and flexibility to adapt to novel scenarios. To address this challenge, we propose a novel evolutionary framework for GUI agents that enhances operational efficiency while retaining intelligence and flexibility. Our approach incorporates a memory mechanism that records the agent's task execution history. By analyzing this history, the agent identifies repetitive action sequences and evolves high-level actions that act as shortcuts, replacing these low-level operations and improving efficiency. This allows the agent to focus on tasks requiring more complex reasoning, while simplifying routine actions. Experimental results on multiple benchmark tasks demonstrate that our approach significantly outperforms existing methods in both efficiency and accuracy. The code will be open-sourced to support further research.
Before diving into the setup, it's worth mentioning that DeepSeek can be used with an OpenAI-compatible API format. By modifying the configuration, you can use the OpenAI SDK to access the DeepSeek API.
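As a reference, here is a minimal sketch of that setup using the OpenAI Python SDK. The base URL and model name follow DeepSeek's documented OpenAI-compatible endpoint; the `DEEPSEEK_API_KEY` environment variable is an assumption, so adapt the names to whatever you set in `config.py`:

```python
# Minimal sketch: calling the DeepSeek API through the OpenAI SDK.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed environment variable
    base_url="https://api.deepseek.com",     # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```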
This project uses LangChain and LangGraph to construct the agent framework; please follow the installation instructions on their official websites. For the remaining dependencies, run `pip install -r requirements.txt`. For LLM configuration, adjust the relevant settings in the `config.py` file.
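The exact settings in `config.py` depend on the repository version; as a rough illustration, the LLM section might look like the following (all key names here are hypothetical):

```python
# Hypothetical excerpt from config.py -- the actual key names in the
# repository may differ; treat this as an illustration of what to fill in.
LLM_BASE_URL = "https://api.deepseek.com"  # any OpenAI-compatible endpoint works
LLM_API_KEY = "sk-..."                     # your API key
LLM_MODEL = "deepseek-chat"                # model the agent should call
```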
We use Neo4j as the agent's memory store, leveraging its Cypher query language for node retrieval; Pinecone is used for vector storage. Ensure the required endpoints and API keys are configured in the `config.py` file. For more information, visit Neo4j's official site and Pinecone's official site.
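Before launching the agent, it can help to verify that both stores are reachable. Below is a minimal sketch using the official `neo4j` and `pinecone` Python clients; the URI, credentials, and index name are placeholders, so substitute the values from your `config.py`:

```python
# Minimal connectivity check for the two storage backends.
from neo4j import GraphDatabase
from pinecone import Pinecone

# Placeholder URI and credentials -- use your config.py values.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your-password"))
with driver.session() as session:
    count = session.run("MATCH (n) RETURN count(n) AS n").single()["n"]
    print(f"Neo4j reachable, {count} nodes stored")
driver.close()

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("appagentx")  # hypothetical index name
print(index.describe_index_stats())
```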
To simplify deployment, we use Docker to containerize the screen recognition and feature extraction services; refer to the README in the `backend` folder for instructions on starting the container. Note that this may require Docker's GPU support; consult Docker's official documentation for configuration. This modular design makes it easy to swap in different screen parsing and feature extraction tools, significantly improving the system's extensibility.
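For reference, starting a GPU-enabled container generally looks like the following; the image name and port here are placeholders, and the backend README is authoritative:

```bash
# Placeholder image name and port -- follow the backend README for the real values.
docker run --gpus all -p 8000:8000 appagentx-backend:latest
```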
To use this project, you first need to configure ADB (Android Debug Bridge) to connect your Android device to your computer.
- **Install ADB on your PC:** Download and install Android Debug Bridge (adb), a command-line tool that enables communication between your PC and an Android device.
- **Enable USB Debugging on your Android device:** Go to Settings > Developer Options and enable USB Debugging.
- **Connect your device to the PC** using a USB cable, then verify the connection as shown below.
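Once connected, you can confirm that the device is visible to ADB:

```bash
adb devices
# Expected output lists your device, e.g.:
# List of devices attached
# emulator-5554   device
```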
If you do not have an actual Android device but still want to try AppAgentX, we recommend using the built-in emulator in Android Studio:
- Download and install Android Studio.
- Open Device Manager in Android Studio to create and launch an emulator.
- Install apps on the emulator by downloading APK files and dragging them into the emulator window.
- AppAgentX can detect and operate apps on an emulator just like on a real device.
Once your device or emulator is set up, you can start the project. We use Gradio as the front-end interface. Use one of the following commands to launch the demo:
```bash
python demo.py
```

or

```bash
gradio demo.py
```
Now, AppAgentX should be ready to use! 🚀
Below are several screenshots of AppAgentX after deployment:
Here are demonstration GIFs of AppAgentX:
- AppAgent - First LLM-based intelligent smartphone application agent
- OmniParser - Microsoft's multimodal interface parsing tool
- LangChain - Framework for building LLM-powered applications
If you find our repo helpful, please consider leaving a star or citing our paper :)
```bibtex
@misc{jiang2025appagentxevolvingguiagents,
  title={AppAgentX: Evolving GUI Agents as Proficient Smartphone Users},
  author={Wenjia Jiang and Yangyang Zhuang and Chenxi Song and Xu Yang and Chi Zhang},
  year={2025},
  eprint={2503.02268},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2503.02268},
}
```
If you have any comments or questions, feel free to contact Wenjia Jiang.