CLI for AI agents to control iOS and Android devices, influenced by Vercel’s agent-browser.
The project is in early development and should be considered experimental. Pull requests are welcome!
- Platforms: iOS (simulator + limited device support) and Android (emulator + device).
- Core commands: open, back, home, app-switcher, press, long-press, focus, type, fill, scroll, scrollintoview, wait, alert, screenshot, close.
- Inspection commands: snapshot (accessibility tree).
- Device tooling: adb (Android), simctl/devicectl (iOS via Xcode).
- Minimal dependencies; TypeScript executed directly on Node 22+ (no build step).
npm install -g agent-device

Or use it without installing:

npx agent-device open SampleApp

agent-device <command> [args] [--json]

Examples:
agent-device open SampleApp
agent-device snapshot
agent-device snapshot -s @e7
agent-device click @e7
agent-device wait text "Camera"
agent-device alert wait 10000
agent-device back
agent-device type "hello"
agent-device screenshot --out ./screenshot.png
agent-device close SampleApp

Best practice: run snapshot immediately before interactions to avoid stale coordinates if the Simulator window moves or the UI changes.
When interacting with UI elements from a snapshot, prefer refs (e.g. click @e7) over raw coordinates. Refs are stable across runs and avoid coordinate drift.
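
The snapshot-then-ref habit described above can be wrapped in a small helper. This is a minimal sketch, not part of agent-device: `Run`, `isRef`, and `snapshotThenClick` are hypothetical names, the `@e<number>` ref pattern is inferred from the examples, and the runner is injected so the sketch works without a device attached.

```typescript
// Hypothetical helper names; the runner is injected so no CLI is spawned here.
type Run = (args: string[]) => string;

// Refs look like "@e7" in the examples above; this exact pattern is an assumption.
function isRef(target: string): boolean {
  return /^@e\d+$/.test(target);
}

// Take a fresh snapshot, then click the ref, so the ref is not stale.
function snapshotThenClick(run: Run, ref: string): string {
  if (!isRef(ref)) {
    throw new Error(`expected a snapshot ref like @e7, got: ${ref}`);
  }
  run(["snapshot"]);
  return run(["click", ref]);
}

// Demo with a fake runner that records calls instead of running agent-device.
const calls: string[][] = [];
const fakeRun: Run = (args) => {
  calls.push(args);
  return "";
};
snapshotThenClick(fakeRun, "@e7");
console.log(calls.map((c) => c.join(" ")));
```

To drive the real CLI, the runner could wrap Node's `execFileSync("agent-device", args)` from `node:child_process`.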
Coordinates:
- All coordinate-based commands (press, long-press, focus, fill) use device coordinates with origin at top-left.
- X increases to the right, Y increases downward.
iOS snapshots:
- Default backend is hybrid because it provides the best speed vs correctness trade-off: AX is fast but can miss UI details, while XCTest is slower but more complete. Hybrid uses the fast AX snapshot first, then fills empty containers (tab bars/toolbars/groups) with scoped XCTest snapshots.
- ax is the fast AX-only backend and requires enabling Accessibility for the terminal app in System Settings.
- xctest is the slower XCTest-only backend that avoids Accessibility permissions.
- You can scope snapshots to a label or identifier with -s "<label>" or to a previous ref with -s @ref.

In practice, if AX returns a Tab Bar group with no children, hybrid will run a scoped XCTest snapshot for Tab Bar and insert those nodes under the group.
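
The hybrid fill step can be pictured roughly as follows. This is an illustrative sketch only, not the project's implementation: the `UiNode` shape, the container type names, and the `ScopedSnapshot` callback are all assumptions standing in for the real AX and XCTest data.

```typescript
// Hypothetical node shape; agent-device's real snapshot types may differ.
interface UiNode {
  label: string;
  type: string;
  children: UiNode[];
}

// Assumed type names for the containers mentioned above (tab bars/toolbars/groups).
const CONTAINER_TYPES = new Set(["TabBar", "Toolbar", "Group"]);

// Stand-in for a scoped XCTest snapshot of a single label.
type ScopedSnapshot = (label: string) => UiNode[];

// Walk the AX tree; where a container has no children, fill it from the scoped snapshot.
function fillEmptyContainers(node: UiNode, scoped: ScopedSnapshot): UiNode {
  if (CONTAINER_TYPES.has(node.type) && node.children.length === 0) {
    return { ...node, children: scoped(node.label) };
  }
  return {
    ...node,
    children: node.children.map((child) => fillEmptyContainers(child, scoped)),
  };
}

// Fake AX result: a Tab Bar group with no children.
const axTree: UiNode = {
  label: "Window",
  type: "Window",
  children: [{ label: "Tab Bar", type: "TabBar", children: [] }],
};

// Fake scoped XCTest snapshot that knows the Tab Bar contents.
const fakeXctest: ScopedSnapshot = (label) =>
  label === "Tab Bar" ? [{ label: "Home", type: "Button", children: [] }] : [];

const merged = fillEmptyContainers(axTree, fakeXctest);
console.log(JSON.stringify(merged, null, 2));
```

The function returns a new tree rather than mutating the AX result, which keeps the fast snapshot available for comparison when debugging.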
Flags:
- --platform ios|android
- --device <name>
- --udid <udid> (iOS)
- --serial <serial> (Android)
- --out <path> (screenshot)
- --session <name>
- --verbose for daemon and runner logs
- --json for structured output
- --backend ax|xctest|hybrid (snapshot only; defaults to hybrid on iOS)
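
When calling the CLI from a script, it can help to build the argument list from a typed object. The sketch below uses only flag names from the list above; the `GlobalFlags` interface, the `buildArgs` function, and the platform checks are hypothetical conveniences, not part of agent-device.

```typescript
// Flag names come from the list above; the interface and function are hypothetical.
interface GlobalFlags {
  platform?: "ios" | "android";
  device?: string;
  udid?: string; // iOS
  serial?: string; // Android
  session?: string;
  verbose?: boolean;
  json?: boolean;
}

function buildArgs(command: string[], flags: GlobalFlags = {}): string[] {
  // These guards are this sketch's own choice, based on the (iOS)/(Android) notes.
  if (flags.udid && flags.platform === "android") {
    throw new Error("--udid applies to iOS only");
  }
  if (flags.serial && flags.platform === "ios") {
    throw new Error("--serial applies to Android only");
  }
  const args = [...command];
  if (flags.platform) args.push("--platform", flags.platform);
  if (flags.device) args.push("--device", flags.device);
  if (flags.udid) args.push("--udid", flags.udid);
  if (flags.serial) args.push("--serial", flags.serial);
  if (flags.session) args.push("--session", flags.session);
  if (flags.verbose) args.push("--verbose");
  if (flags.json) args.push("--json");
  return args;
}

const argv = buildArgs(["snapshot"], {
  platform: "ios",
  device: "iPhone 17 Pro",
  json: true,
});
console.log(argv);
```

Keeping the arguments as an array (rather than one string) avoids quoting problems with device names that contain spaces.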
Tracing:
- trace start [path] to begin capturing AX/XCTest logs for the session.
- trace stop [path] to stop capture and optionally move the trace log.
Sessions:
- open starts a session. Without args, it boots/activates the target device/simulator without launching an app.
- All interaction commands require an open session.
- close stops the session and releases device resources. Pass an app to close it explicitly, or omit it to just close the session.
- Use --session <name> to manage multiple sessions.
- Session logs are written to ~/.agent-device/sessions/<session>-<timestamp>.ad.
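
Because interaction commands need an open session and close releases device resources, a script may want to pair the two. A minimal sketch, assuming a hypothetical `withSession` wrapper and an injected runner (neither is provided by agent-device):

```typescript
// Hypothetical wrapper; the runner is injected so no CLI is spawned here.
type Run = (args: string[]) => void;

// Open a named session, run the body, and always close the session afterwards.
function withSession(
  run: Run,
  session: string,
  app: string,
  body: (run: Run) => void,
): void {
  // Every command in this session carries the same --session flag.
  const scoped: Run = (args) => run([...args, "--session", session]);
  scoped(["open", app]);
  try {
    body(scoped);
  } finally {
    scoped(["close", app]);
  }
}

// Demo with a fake runner that records the command lines it would execute.
const log: string[] = [];
const record: Run = (args) => {
  log.push(args.join(" "));
};
withSession(record, "demo", "SampleApp", (run) => {
  run(["snapshot"]);
});
console.log(log);
```

The `finally` block means close still runs when a command inside the body throws, so a failed step does not leave the session open.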
Snapshot defaults to the hybrid backend on iOS simulators. Use --backend ax for AX-only or --backend xctest for XCTest-only.
Find (semantic):
- find <text> <action> [value] finds by any text (label/value/identifier) using a scoped snapshot.
- find text|label|value|role|id <value> <action> [value] for specific locators.
- Actions: click (default), fill, type, focus, get text, get attrs, wait [timeout], exists.
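
The two find forms above can be assembled programmatically. The sketch below is an assumption-laden illustration: `buildFind`, `Locator`, and `Action` are hypothetical names, and it assumes two-word actions such as `get text` are passed as two separate arguments.

```typescript
// Locator and action names come from the list above; the builder itself is hypothetical.
type Locator = "text" | "label" | "value" | "role" | "id";
type Action =
  | "click" | "fill" | "type" | "focus"
  | "get text" | "get attrs" | "wait" | "exists";

function buildFind(
  query: string,
  action: Action = "click",
  opts: { locator?: Locator; value?: string } = {},
): string[] {
  // fill and type need something to enter; this check is the sketch's own choice.
  if ((action === "fill" || action === "type") && opts.value === undefined) {
    throw new Error(`${action} requires a value`);
  }
  const args = ["find"];
  if (opts.locator) args.push(opts.locator);
  args.push(query);
  // Assumption: "get text" / "get attrs" are split into two argv tokens.
  args.push(...action.split(" "));
  if (opts.value !== undefined) args.push(opts.value);
  return args;
}

console.log(buildFind("Camera"));
console.log(buildFind("email", "fill", { locator: "id", value: "user@example.com" }));
console.log(buildFind("Done", "wait", { value: "5000" }));
```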
Settings helpers (simulators):
- settings wifi on|off
- settings airplane on|off
- settings location on|off (iOS uses per‑app permission for the current session app)
- Note: iOS wifi/airplane toggles status bar indicators, not actual network state. Airplane off clears status bar overrides.
App state:
- appstate shows the foreground app/activity (Android). On iOS it uses the current session app when available, otherwise it falls back to a snapshot-based guess (AX first, XCTest if AX can’t identify).
- apps --metadata returns app list with minimal metadata.
- Start trace capture before a flaky sequence: agent-device trace start, then agent-device trace stop ./trace.log.
- The trace log includes AX snapshot stderr and XCTest runner logs for the session.
- Built-in retries cover transient runner connection failures, AX snapshot hiccups, and Android UI dumps.
- For snapshot issues, compare --backend ax vs --backend xctest and scope with -s "<label>".
- Bundle/package identifiers are accepted directly (e.g., com.apple.Preferences).
- Human-readable names are resolved when possible (e.g., Settings).
- Built-in aliases include Settings for both platforms.
- Input commands (press, type, scroll, etc.) are supported only on simulators in v1 and use the XCTest runner.
- alert and scrollintoview use the XCTest runner and are simulator-only in v1.
- Real device support (including snapshots) is on the roadmap for iOS.
pnpm test
pnpm build

Environment selectors:
- ANDROID_DEVICE=Pixel_9_Pro_XL or ANDROID_SERIAL=emulator-5554
- IOS_DEVICE="iPhone 17 Pro" or IOS_UDID=<udid>
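
One way to picture how these selectors relate to the --device, --udid, and --serial flags is the sketch below. It is an assumption, not the project's test harness: `selectorFlags` is a hypothetical function, and preferring the serial/UDID over the device name when both are set is this sketch's own choice.

```typescript
// Variable names come from the list above; the mapping to flags is an assumption.
type Env = Record<string, string | undefined>;

function selectorFlags(platform: "ios" | "android", env: Env): string[] {
  if (platform === "android") {
    const serial = env["ANDROID_SERIAL"];
    const device = env["ANDROID_DEVICE"];
    // Assumed precedence: an exact serial wins over a device name.
    if (serial) return ["--platform", "android", "--serial", serial];
    if (device) return ["--platform", "android", "--device", device];
    return ["--platform", "android"];
  }
  const udid = env["IOS_UDID"];
  const device = env["IOS_DEVICE"];
  // Assumed precedence: an exact UDID wins over a device name.
  if (udid) return ["--platform", "ios", "--udid", udid];
  if (device) return ["--platform", "ios", "--device", device];
  return ["--platform", "ios"];
}

// Demo with a literal environment instead of the real process environment.
const demoEnv: Env = {
  ANDROID_SERIAL: "emulator-5554",
  IOS_DEVICE: "iPhone 17 Pro",
};
console.log(selectorFlags("android", demoEnv));
console.log(selectorFlags("ios", demoEnv));
```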
Test screenshots are written to:
- test/screenshots/android-settings.png
- test/screenshots/ios-settings.png
See CONTRIBUTING.md.
agent-device is an open source project and will always remain free to use. Callstack is a group of React and React Native geeks. Contact us at hello@callstack.com if you need any help with these technologies or just want to say hi.