Smart-AI-Bot

AI-powered Android UI test automation platform: write test cases in plain language (or import them from YAML / Excel / xmind / Markdown), have an AI agent run them on real devices, and get visual replay reports. It doubles as a general phone-automation tool with self-learning replay.

English · 简体中文

Demo

demo-run-live.mp4

A test run in progress (2× speed), shown on the wide-screen run page: case results on the left, the live agent log (thoughts plus JSON-RPC calls such as tap_element / screenshot) in the middle, and the device screen mirrored live on the right (hardware H.264, decoded frame by frame via WebCodecs). All three panes update as the AI agent drives the device end to end from a single plain-language test case.

No USB cable, any network. The phone talks to the server over the Portal App's reverse WebSocket — the laptop and the phone don't even need to be on the same network. Run your devices anywhere (4G / 5G / corporate WiFi).

Why Smart-AI-Bot

You write:

Open Settings, find About Phone, capture the version number.
Expected: System version is shown, no error dialog.

The AI agent finds the path, taps, and verifies; every step has a screenshot and a thought trace. When a case fails, a "lesson learned" is extracted automatically, and the next time the same task runs the agent is steered away from the same mistake.

No XPath, no Appium, no recorded scripts.

How it's different

Most AI phone tools stop at "control a device with an LLM." Smart-AI-Bot is the test platform around that agent — the parts QA teams need to actually run tests every day:

| Capability | Smart-AI-Bot | AI agent libs (droidrun, Mobile-Agent, AutoGLM) | Scripted frameworks (Appium, Maestro) |
| --- | --- | --- | --- |
| Plain-language test cases | ✅ | ✅ | ❌ scripts / YAML |
| Real devices over any network, no ADB cable | ✅ reverse-WebSocket Portal | ⚠️ usually ADB / USB | ❌ local / USB |
| Test-management UI (suites, run history, run compare, pass-rate trend) | ✅ | ❌ library only | ⚠️ partial (paid cloud) |
| Learns from its own failures (guardrails re-injected) | ✅ | ❌ | ❌ |
| Self-contained shareable HTML reports | ✅ | ❌ | ⚠️ |
| Import from xmind / Excel / Markdown | ✅ | ❌ | ❌ |

Honest trade-offs: it's Android-only today (Midscene / Appium also cover iOS & web), and pure-vision robustness on Canvas/game UIs still trails Midscene's Set-of-Marks. Full breakdown → Comparison.

Features

Plain-language test cases — write in Chinese or English; import from YAML / Excel / xmind / Markdown
Dual perception — screenshot (vision) + a11y tree (semantic), fused decision
Multi-LLM — OpenAI, Anthropic, Gemini, Zhipu GLM, Groq, Ollama
Any-network device — Portal App opens a reverse WebSocket; runs over 4G / 5G / corporate WiFi without ADB
Test management UI — suites, cases, run history, step replay, run comparison, pass-rate trend
Self-contained HTML reports — single-file export with screenshots, thoughts, actions, verdicts
Planner + Subagent — complex tasks decomposed into subgoals, each with isolated context
Page-aware reasoning — current Activity class + recent-pages trail injected, so the agent recognizes "wrong screen" instead of blindly tapping
Two-shot verifier — at-action frame (catches transient toasts) + settled frame, both used for pass/fail judgment
Learn from mistakes — LessonLearned auto-extracted from past runs and re-injected as guardrails
Auto-recovery — 4-level escalation when stuck (warn → back → restart → fail)
Observability — token usage, perception/LLM/action timing per step, pass-rate trend chart
CI/CD — CLI runner, webhook notifications (Feishu / DingTalk / Slack)
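
To make the auto-recovery escalation from the list above concrete, here is a minimal Python sketch. Only the order warn → back → restart → fail comes from this README; the `Recovery` enum, the `next_recovery` helper, the one-rung-per-detection policy, and the per-level descriptions in the comments are hypothetical, and the real agent may decide differently.

```python
from enum import Enum


class Recovery(Enum):
    # The per-level descriptions are assumptions about what each rung does.
    WARN = "warn"        # tell the agent it appears to be stuck
    BACK = "back"        # navigate back one screen
    RESTART = "restart"  # relaunch the app under test
    FAIL = "fail"        # stop and mark the case as failed


# Order as listed in the README: warn → back → restart → fail.
LADDER = [Recovery.WARN, Recovery.BACK, Recovery.RESTART, Recovery.FAIL]


def next_recovery(stuck_count: int) -> Recovery:
    """Map consecutive 'stuck' detections (1-based) to a recovery level.

    Hypothetical policy: climb one rung per detection, capped at FAIL.
    """
    if stuck_count < 1:
        raise ValueError("stuck_count must be >= 1")
    return LADDER[min(stuck_count, len(LADDER)) - 1]
```

With this policy the first detection only warns, and the fourth (or any later one) fails the case.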

Full comparison and roadmap: Comparison · Roadmap

Screenshots


Devices & live screen: pair by QR or token, then watch the device mirrored live (ADB H.264 decoded via WebCodecs)
Run page: case results, the live agent log, and the device screen / recording replay alongside the run
Test Report: pass/fail counts, pass rate, token usage, run time, and per-case verdict with verifier reasoning
Step Replay: every action with screenshot, agent reasoning, and tool call (e.g. `tap_element({"index": 5})`)

The exported HTML report is fully self-contained, and its step replay can auto-play (2× speed):

demo-report-replay.mp4

Quick Start

New here? The Getting Started guide walks you from zero to your first test run with screenshots. The steps below are the condensed version.

Option 1 — Docker (recommended for deployment)

Requires Docker 20+ with Compose v2. No Python / Node install needed on the host.

git clone https://github.com/rejigtian/Smart-AI-Bot.git
cd Smart-AI-Bot
docker compose up -d

Open http://localhost:5173 and drop your LLM API keys into Settings.

The SQLite database is persisted in a Docker volume (backend-data). To override ports:

BACKEND_PORT=18000 FRONTEND_PORT=15173 docker compose up -d

Option 2 — Run from source

Prerequisites: Python 3.9+, Node.js 18+, an Android device (real or emulator).

git clone https://github.com/rejigtian/Smart-AI-Bot.git
cd Smart-AI-Bot

# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# Frontend (new terminal)
cd frontend
npm install
npm run dev

Or one command:

./start.sh

Open http://localhost:5173 and drop your LLM API keys into Settings.

Install the Portal App

Option A — scan to install (easiest)

With the backend running, open the Web UI's Devices page in a phone browser, tap 📱 Download App, and scan the QR to download and install the latest SmartAgent-<version>.apk. Allow "install from unknown sources" when prompted.

Option B — build from source

cd android
./gradlew assembleDebug   # also archived as backend/data/apk/SmartAgent-<version>.apk
adb install -r app/build/outputs/apk/debug/app-debug.apk

First launch — pair the device

Easiest — scan to connect. In the Devices page, generate a token and tap Show QR. In the Portal app tap 扫码连接 (Scan QR) and scan it — the server URL + token are filled in and it connects in one tap.

Manual. Set the Server WebSocket URL and Device Token by hand, then tap Connect.

Finally: System Settings → Accessibility → enable AgentAccessibilityService. The persistent foreground notification means you're online.

Which address? A real phone can't reach localhost — that only works for an emulator running on the same computer. On the same LAN, open the Web UI by the machine's internal IP (e.g. http://192.168.1.10:5173); the QR and pairing address then default to that internal address automatically. To use a public address or domain, configure it manually — see Deployment.
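
The address rule above can be checked mechanically before you pair a device. The sketch below is not part of Smart-AI-Bot; `classify_server_url` is a hypothetical helper that uses only the Python standard library to tell whether a URL points at loopback (emulator on the same computer only), a LAN address, or something else that needs manual configuration.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_server_url(url: str) -> str:
    """Return 'loopback', 'lan', or 'other' for the host part of `url`.

    Hostnames other than 'localhost' are not resolved (the sketch stays
    offline), so domains are reported as 'other'.
    """
    host = urlsplit(url).hostname or ""
    if host == "localhost":
        return "loopback"
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "other"  # a domain name or an empty host
    if ip.is_loopback:
        return "loopback"  # a real phone cannot reach this
    if ip.is_private:
        return "lan"       # reachable from phones on the same network
    return "other"
```

A result of 'loopback' means a real phone will not be able to connect with that address.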

Write a test case

In the Test Suites page, create a suite and add a case:

Path: Open Settings, navigate to About Phone, capture the version number
Expected: System version info is shown, no error dialog

Pick a device + model, hit Run.
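
If you keep cases in plain text before entering them in the UI, the two-field shape shown above is easy to validate. This is a sketch only: `CaseSpec` and `parse_case` are hypothetical names, and this is not the project's own parser (which lives in core/test_parser.py and may accept other formats).

```python
from dataclasses import dataclass


@dataclass
class CaseSpec:
    path: str      # what the agent should do
    expected: str  # what the verifier should see afterwards


def parse_case(text: str) -> CaseSpec:
    """Parse 'Path: ...' and 'Expected: ...' lines; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        name = key.strip().lower()
        if sep and name in ("path", "expected"):
            fields[name] = value.strip()
    missing = {"path", "expected"} - fields.keys()
    if missing:
        raise ValueError(f"missing field(s): {sorted(missing)}")
    return CaseSpec(path=fields["path"], expected=fields["expected"])
```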

CLI (CI/CD integration)

cd backend
python cli.py run --suite <id> --device <id> --json

Exit code: 0 = all passed, 1 = at least one failed.
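
A CI job can wrap that command and branch on the exit code. The sketch below assumes only what is documented above (the `run` subcommand, the `--suite`, `--device`, and `--json` flags, and exit codes 0 and 1); the helper names are hypothetical, and the structure of the `--json` output is not described here, so it is deliberately left unparsed.

```python
import subprocess
import sys


def build_command(suite_id: str, device_id: str) -> list[str]:
    """Build the documented CLI call, using the current Python interpreter."""
    return [sys.executable, "cli.py", "run",
            "--suite", suite_id, "--device", device_id, "--json"]


def verdict(exit_code: int) -> str:
    """Translate the documented exit codes; anything else counts as an error."""
    if exit_code == 0:
        return "passed"   # all cases passed
    if exit_code == 1:
        return "failed"   # at least one case failed
    return "error"        # undocumented code, e.g. a crash or bad arguments


def run_suite(suite_id: str, device_id: str, backend_dir: str = "backend") -> str:
    """Run the suite from the backend directory and return the verdict."""
    proc = subprocess.run(build_command(suite_id, device_id),
                          cwd=backend_dir, capture_output=True, text=True)
    return verdict(proc.returncode)
```

Treating unknown exit codes as "error" rather than "failed" keeps infrastructure problems separate from real test failures.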

Architecture

Browser (management UI)
  │ REST + SSE
FastAPI server
  ├── Planner (decomposes complex tasks)
  │     └── SubAgent #1..N (isolated context per subgoal)
  ├── TestCaseAgent (6-layer + VLM fallback)
  │     perception → decision → action → memory → verification → replay
  └── SQLite + webhook + CLI
        Device / Suite / Case / Run / Result / StepLog
  │
  │ WebSocket JSON-RPC
Android device (Portal App)
  tap / swipe / input / screenshot / get_ui_state

Detailed design: docs/agent-architecture.md.
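
For a feel of the server-to-device hop, the sketch below builds and reads messages in the standard JSON-RPC 2.0 envelope. It is an assumption that the Portal App uses exactly this envelope; this README only states "WebSocket JSON-RPC" and shows the `tap_element` call with `{"index": 5}`. The helper names are hypothetical, and the actual WebSocket transport is omitted.

```python
import itertools
import json

_ids = itertools.count(1)  # request ids, used to match responses to requests


def make_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request with a fresh id."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": method,
        "params": params,
    })


def parse_response(raw: str):
    """Return the result, or raise if the peer replied with an error object."""
    msg = json.loads(raw)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"device error {err.get('code')}: {err.get('message')}")
    return msg["result"]
```

For example, `make_request("tap_element", {"index": 5})` produces the text frame the server would send, and `parse_response` handles the reply.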

More Docs

| Doc | What it covers |
| --- | --- |
| Getting Started | Zero-to-first-test walkthrough: connect a device, write & run a case |
| Deployment | Docker, public-server (HTTPS/WSS) setup, backups |
| Agent Architecture | 6-layer agent + Planner / Subagent design |
| Android Portal | Portal App performance & connection stability |
| Test KB | Building the test knowledge base for your own app |
| Roadmap | Done features + priorities |
| Comparison | DroidRun / Midscene / AutoGLM technical comparison |
| Troubleshooting | Common issues: connection / screenshot / recognition |
| Changelog | Release history: what changed in each version |

Acknowledgments

This project is inspired by:

droidrun / droidrun-portal — the Portal App's reverse WebSocket and connection-stability patterns (library-level ping/pong, reconnect budget, terminal-error detection) are directly inspired by droidrun-portal.
Midscene.js — the Set-of-Marks visual annotation idea inspired our a11y element overlay. We ended up using magenta crosshairs instead of numbered bubbles to avoid confusion with in-game content.
AutoGLM — the Planner / Grounder split influenced our dual-perception fusion architecture.

Contributing

PRs and issues welcome. Common contribution paths:

New LLM provider — add a branch in agent/base.py
New Portal App action — define the tool in agent/tools.py + implement it in ws_device.py
New test case format parser — core/test_parser.py
Documentation / i18n

License

MIT — see LICENSE.
