- Release SAPA-Bench.
- Release the latest evaluation code.
- ···
Smartphones offer great convenience but also collect vast amounts of personal information.
With the rise of MLLM-powered smartphone agents, automation performance has improved significantly, but at the cost of extensive access to sensitive user data.
To systematically evaluate this issue, we introduce the first large-scale benchmark (7,138 scenarios) for privacy awareness in smartphone agents. Each scenario is annotated with:
- 🔑 Privacy Type (e.g., Account Credentials)
- ⚠️ Sensitivity Level
- 📍 Location
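
The three annotations listed above could be modeled as a small record. The following is a minimal sketch, not the released data format: the class name `ScenarioAnnotation`, its field names, and the helper `count_by_privacy_type` are hypothetical, and the sensitivity and location values are placeholders rather than real labels from the benchmark.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioAnnotation:
    """Hypothetical per-scenario record; field names are illustrative only."""
    privacy_type: str       # e.g. "Account Credentials" (example from the text)
    sensitivity_level: str  # placeholder; the actual label set is not given here
    location: str           # placeholder; the actual label set is not given here


def count_by_privacy_type(annotations):
    """Count how many scenarios fall under each privacy type."""
    return Counter(a.privacy_type for a in annotations)


# Placeholder records, not real benchmark data.
examples = [
    ScenarioAnnotation("Account Credentials", "placeholder-level", "placeholder-location"),
    ScenarioAnnotation("Account Credentials", "placeholder-level", "placeholder-location"),
    ScenarioAnnotation("Other", "placeholder-level", "placeholder-location"),
]
counts = count_by_privacy_type(examples)
```

Grouping by a single field like this is one simple way to inspect how scenarios are distributed across annotation categories. The same pattern applies to the other two fields.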
We benchmarked seven mainstream smartphone agents and found:
- Privacy awareness (RA) remains below 60% for almost all agents, even with explicit hints.
- Closed-source agents generally perform better; Gemini 2.0-flash achieved the highest RA (67%).
- Privacy detection correlates strongly with sensitivity level: scenarios with higher sensitivity are more readily identified.
👉 Our results highlight the urgent need to rethink the utility–privacy tradeoff in the design of smartphone agents.
If you find SAPA-Bench useful for your research or applications, please cite it using this BibTeX entry:
@article{lin2025sapa,
title = {Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents},
author = {Lin, Zhixin and Li, Jungang and Pan, Shidong and Shi, Yibo and Yao, Yue and Xu, Dongliang},
journal = {arXiv preprint arXiv:2508.19493},
year = {2025}
}