AI-powered Android automation framework for React Native/Expo with a SENSE → THINK → ACT loop.
QUE Mobile SDK enables AI-powered mobile app control through natural language. The system captures screen state via the Android Accessibility Service, processes it through an LLM (Gemini), and executes actions programmatically.
- 🤖 AI-Powered Automation: Control Android apps with natural language commands
- 🔄 SENSE → THINK → ACT Loop: Autonomous multi-step task execution
- ⚛️ React Native/Expo: Clean TypeScript API with React hooks
- 🎯 19+ Action Types: Tap, type, swipe, voice, file operations, and more
- 🗣️ Voice Integration: Built-in text-to-speech and speech-to-text
- 📁 File System: Sandboxed storage for task tracking and results
- ⏰ Trigger System: Scheduled and notification-based automated task execution
- 🐛 Debug Mode: Visual feedback and overlay for development
- 🔌 Expo Plugin: Zero-config setup for managed workflow
```bash
npm install que-mobile-sdk
```

Or with yarn:

```bash
yarn add que-mobile-sdk
```

Ensure you have these installed:

```bash
npm install expo react react-native
```

Add the plugin to your `app.json`:
```json
{
  "expo": {
    "plugins": [
      "que-mobile-sdk"
    ]
  }
}
```

This automatically configures the required Android permissions and the Accessibility Service.
```bash
npx expo prebuild --clean
npx expo run:android
```

On your Android device:
- Go to Settings → Accessibility
- Find QUE Accessibility Service
- Enable it
```tsx
import React from 'react';
import { View, Button, Text } from 'react-native';
import { useAgent } from 'que-mobile-sdk';

function App() {
  const { execute, isRunning, result } = useAgent({
    apiKey: 'YOUR_GEMINI_API_KEY',
    maxSteps: 100,
  });

  const handleTask = async () => {
    await execute("Open Instagram and like the first post");
  };

  return (
    <View>
      <Button
        title={isRunning ? "Running..." : "Execute Task"}
        onPress={handleTask}
        disabled={isRunning}
      />
      {result && <Text>{result.message}</Text>}
    </View>
  );
}
```

`useAgent(config)` is the main hook for executing AI-powered tasks.
Parameters:
- `config.apiKey` (string, required): Gemini API key
- `config.maxSteps` (number, optional): Maximum steps per task (default: 100)
- `config.maxFailures` (number, optional): Maximum consecutive failures (default: 3)
- `config.debugMode` (boolean, optional): Enable debug visualization (default: false)
- `config.onStep` (function, optional): Callback for each step
- `config.onComplete` (function, optional): Callback on task completion
Returns:
- `execute(task: string)`: Execute a task
- `isRunning` (boolean): Whether the agent is currently running
- `result` (AgentResult | null): Task result
- `error` (string | null): Error message if failed
- `stop()`: Stop execution
- `history` (AgentHistory[]): Execution history
Example:
```tsx
const { execute, isRunning, result, error, stop } = useAgent({
  apiKey: process.env.GEMINI_API_KEY,
  maxSteps: 50,
  debugMode: true,
  onStep: (step) => console.log(`Step ${step.stepNumber}:`, step.action),
});

// Execute a task
await execute("Search for 'React Native' on Google");

// Stop if needed
if (isRunning) {
  stop();
}
```

`useVoice()` is the hook for voice interactions.
Returns:
- `speak(text: string)`: Convert text to speech
- `startListening()`: Start speech recognition
- `stopListening()`: Stop speech recognition
- `isListening` (boolean): Whether currently listening
- `isSpeaking` (boolean): Whether currently speaking
Example:
```tsx
const { speak, startListening, isListening } = useVoice();

// Speak
await speak("Hello, how can I help you?");

// Listen
const userInput = await startListening();
console.log("User said:", userInput);
```

`useFileSystem()` is the hook for file operations.
Returns:
- `writeFile(name: string, content: string)`: Write file
- `readFile(name: string)`: Read file
- `listFiles()`: List all files
Example:
```tsx
const { writeFile, readFile, listFiles } = useFileSystem();

// Write
await writeFile("notes.md", "# My Notes\n\nTask completed!");

// Read
const content = await readFile("notes.md");

// List
const files = await listFiles();
```

`AgentProvider` is a context provider for sharing agent configuration.
```tsx
<AgentProvider config={{ apiKey: 'YOUR_KEY' }}>
  <App />
</AgentProvider>
```

`AgentButton` is a quick action button component.
```tsx
<AgentButton
  task="Open Settings"
  onComplete={(result) => console.log(result)}
/>
```

`DebugOverlay` is a visual debug overlay (it only renders when `debugMode: true`).
```tsx
<DebugOverlay
  step={5}
  maxSteps={100}
  lastAction="tap_element"
  elementCount={42}
  reasoning="Tapping on the search button..."
/>
```

The SDK supports 19+ action types:
| Action | Description | Example |
|---|---|---|
| `tap_element` | Tap on UI element | `{ type: 'tap_element', elementId: 5 }` |
| `long_press_element` | Long press element | `{ type: 'long_press_element', elementId: 5 }` |
| `type` | Type text | `{ type: 'type', text: 'Hello' }` |
| `swipe_down` | Swipe down | `{ type: 'swipe_down', amount: 500 }` |
| `swipe_up` | Swipe up | `{ type: 'swipe_up', amount: 500 }` |
| `back` | Press back button | `{ type: 'back' }` |
| `home` | Press home button | `{ type: 'home' }` |
| `switch_app` | Open app switcher | `{ type: 'switch_app' }` |
| `wait` | Wait 1 second | `{ type: 'wait' }` |
| `open_app` | Open app by name | `{ type: 'open_app', appName: 'Instagram' }` |
| `search_google` | Search on Google | `{ type: 'search_google', query: 'React Native' }` |
| `speak` | Text-to-speech | `{ type: 'speak', message: 'Hello' }` |
| `ask` | Ask user via voice | `{ type: 'ask', question: 'What is your name?' }` |
| `write_file` | Write file | `{ type: 'write_file', fileName: 'notes.md', content: '...' }` |
| `append_file` | Append to file | `{ type: 'append_file', fileName: 'notes.md', content: '...' }` |
| `read_file` | Read file | `{ type: 'read_file', fileName: 'notes.md' }` |
| `done` | Complete task | `{ type: 'done', success: true, text: 'Task completed!' }` |
See API Documentation for complete action reference.
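For reference, the action payloads in the table can be summarized as a TypeScript union. This is a partial sketch inferred from the examples above, not the SDK's authoritative type definition:

```ts
// Partial sketch of action payloads, inferred from the table above
// (illustrative only; not the SDK's official Action type).
type Action =
  | { type: 'tap_element'; elementId: number }
  | { type: 'long_press_element'; elementId: number }
  | { type: 'type'; text: string }
  | { type: 'swipe_down'; amount: number }
  | { type: 'swipe_up'; amount: number }
  | { type: 'back' }
  | { type: 'home' }
  | { type: 'switch_app' }
  | { type: 'wait' }
  | { type: 'open_app'; appName: string }
  | { type: 'search_google'; query: string }
  | { type: 'speak'; message: string }
  | { type: 'ask'; question: string }
  | { type: 'write_file'; fileName: string; content: string }
  | { type: 'append_file'; fileName: string; content: string }
  | { type: 'read_file'; fileName: string }
  | { type: 'done'; success: boolean; text: string };
```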
```ts
interface AgentConfig {
  // Required
  apiKey: string;        // Gemini API key

  // Optional
  maxSteps?: number;     // Max steps per task (default: 100)
  maxFailures?: number;  // Max consecutive failures (default: 3)
  debugMode?: boolean;   // Enable debug mode (default: false)
  model?: string;        // Gemini model (default: 'gemini-2.0-flash-exp')

  // Callbacks
  onStep?: (step: AgentStep) => void;
  onComplete?: (result: AgentResult) => void;
  onError?: (error: Error) => void;
}
```

Enable visual feedback during development:
```tsx
const { execute } = useAgent({
  apiKey: 'YOUR_KEY',
  debugMode: true, // Enable debug mode
});
```

Debug mode provides:
- 🟢 Green boxes around clickable elements
- 🟡 Yellow boxes around text elements
- 🔴 Red circles at tap locations
- ⚡ White border flash on screen changes
- 📊 Debug overlay with step info
See Debug Mode Guide for details.
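Alongside the visual overlay, the `onStep`, `onComplete`, and `onError` callbacks from `AgentConfig` are handy for console-level debugging. A minimal sketch (field access follows the examples shown elsewhere in this README):

```tsx
const { execute } = useAgent({
  apiKey: 'YOUR_KEY',
  debugMode: true,
  // Log every SENSE → THINK → ACT iteration.
  onStep: (step) => console.log(`Step ${step.stepNumber}:`, step.action),
  // Log the final outcome (result.message is also used in the basic usage example above).
  onComplete: (result) => console.log('Task finished:', result.message),
  // Surface errors during development.
  onError: (error) => console.warn('Agent error:', error.message),
});
```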
Basic task execution:

```tsx
import React from 'react';
import { Button } from 'react-native';
import { useAgent } from 'que-mobile-sdk';

function SimpleTask() {
  const { execute, isRunning } = useAgent({
    apiKey: process.env.GEMINI_API_KEY,
  });

  return (
    <Button
      title="Open Settings"
      onPress={() => execute("Open Settings app")}
      disabled={isRunning}
    />
  );
}
```

Voice-controlled automation:

```tsx
import React from 'react';
import { Button } from 'react-native';
import { useAgent, useVoice } from 'que-mobile-sdk';

function VoiceCommand() {
  const { execute } = useAgent({ apiKey: 'YOUR_KEY' });
  const { startListening, speak } = useVoice();

  const handleVoiceCommand = async () => {
    await speak("What would you like me to do?");
    const command = await startListening();
    await execute(command);
  };

  return (
    <Button title="Voice Command" onPress={handleVoiceCommand} />
  );
}
```

File system integration:

```tsx
import React from 'react';
import { Button } from 'react-native';
import { useAgent } from 'que-mobile-sdk';

function FileTask() {
  const { execute } = useAgent({ apiKey: 'YOUR_KEY' });

  const task = `
    1. Create a file called shopping.md
    2. Write "Milk, Eggs, Bread" to it
    3. Read the file back to confirm
  `;

  return (
    <Button title="File Task" onPress={() => execute(task)} />
  );
}
```

Complex multi-step tasks:

```tsx
const task = `
  1. Open Instagram
  2. Search for "reactnative"
  3. Like the first 3 posts
  4. Return to home screen
`;

await execute(task);
```
Trigger-based scheduled execution:

```tsx
import React from 'react';
import { Button } from 'react-native';
import { useTriggers, TriggerManager } from 'que-mobile-sdk';

function ScheduledTask() {
  const { createTrigger } = useTriggers();

  const handleCreateTrigger = async () => {
    await createTrigger({
      id: TriggerManager.generateTriggerId(),
      type: 'schedule',
      enabled: true,
      priority: 10,
      task: 'Open Calendar and speak today\'s events',
      agentConfig: { apiKey: 'YOUR_KEY', maxSteps: 50 },
      schedule: {
        time: '08:00',
        daysOfWeek: [1, 2, 3, 4, 5], // Weekdays
      },
    });
  };

  return (
    <Button title="Schedule Morning Briefing" onPress={handleCreateTrigger} />
  );
}
```

See the Usage Guides and the Trigger System Guide for more examples.
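The feature list above also mentions notification-based triggers. The sketch below shows one plausible shape for such a trigger; the `type: 'notification'` value and the `notification` filter object are assumptions made for illustration, so check the Trigger System Guide for the actual schema:

```tsx
// Hypothetical notification trigger: the field names below are illustrative,
// not confirmed by the SDK documentation.
await createTrigger({
  id: TriggerManager.generateTriggerId(),
  type: 'notification',          // assumed trigger type
  enabled: true,
  priority: 5,
  task: 'Read the new message aloud and summarize it',
  agentConfig: { apiKey: 'YOUR_KEY', maxSteps: 30 },
  notification: {                // assumed filter object
    packageName: 'com.whatsapp', // only react to notifications from this app (illustrative)
  },
});
```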
Problem: Agent fails with "Accessibility Service not enabled"
Solution:
- Go to Settings → Accessibility
- Find QUE Accessibility Service
- Enable it
- Restart your app
Problem: LLM errors with "Invalid API key"
Solution:
- Get a Gemini API key from Google AI Studio
- Ensure the key is correctly set in `AgentConfig`
- Don't commit API keys to version control (see the sketch after this list)
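One common pattern, shown here as an illustration rather than an SDK requirement, is to keep the key in an environment variable. With Expo, variables prefixed with `EXPO_PUBLIC_` are available in app code; the variable name below is hypothetical:

```tsx
// Keep the key out of source control (e.g. in a local .env file that is gitignored).
// EXPO_PUBLIC_GEMINI_API_KEY is an example variable name, not one defined by the SDK.
const { execute } = useAgent({
  apiKey: process.env.EXPO_PUBLIC_GEMINI_API_KEY ?? '',
});
```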
Problem: Agent sees elements but actions fail
Solution:
- Enable debug mode to visualize what the agent sees
- Check that Accessibility Service has all required permissions
- Ensure UI elements are actually clickable (not disabled)
- Try increasing wait times between actions
Problem: Task stops with "Max steps exceeded"
Solution:
- Increase `maxSteps` in the config
- Break complex tasks into smaller subtasks (see the sketch after this list)
- Check if the task is actually achievable
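As a sketch of both suggestions, using only the hooks documented above, you can raise the step budget and split one large instruction into sequential `execute` calls:

```tsx
const { execute } = useAgent({
  apiKey: 'YOUR_KEY',
  maxSteps: 150, // raised from the default of 100
});

// Instead of one sprawling instruction, run focused subtasks in order.
await execute("Open Instagram and search for 'reactnative'");
await execute("Like the first 3 posts in the results");
await execute("Return to the home screen");
```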
Problem: Agent reports "Element not found"
Solution:
- Enable debug mode to see element IDs
- Ensure the screen has loaded completely
- Add wait actions before interacting with elements
- Check if the element is scrolled off-screen
Problem: Agent is slow or times out
Solution:
- Reduce `maxSteps` for simpler tasks
- Use a faster Gemini model if available (see the sketch after this list)
- Ensure good network connection
- Close other apps to free memory
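Both tweaks go through `AgentConfig`; the model string below is only an example, so check Google's documentation for the Gemini models currently available:

```tsx
const { execute } = useAgent({
  apiKey: 'YOUR_KEY',
  maxSteps: 30,                  // a smaller budget for a simple task
  model: 'gemini-2.0-flash-exp', // the SDK default; swap in another Gemini model if it suits your latency needs
});
```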
Problem: Debug overlay not showing
Solution:
- Ensure `debugMode: true` is set in the config
- Check that the `<DebugOverlay>` component is rendered
- Verify the Accessibility Service is enabled
- Restart the app
Check out the example app for complete working examples:
- Simple Task: Basic task execution
- Voice Command: Voice-controlled automation
- File Operations: File system integration
- Debug Mode: Visual debugging
- Multi-Step: Complex multi-step tasks
Run the example:
```bash
cd example
npm install
npx expo prebuild
npx expo run:android
```

The SDK is organized in four layers:

```
┌──────────────────────────────────────────────────┐
│          React Native Application Layer          │
│    (useAgent, useVoice, useFileSystem hooks)     │
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│              TypeScript Core Layer               │
│   Agent, MemoryManager, Perception, LLM Client   │
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│           Native Bridge Layer (Kotlin)           │
│        AccessibilityModule (NativeModule)        │
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│          Android Accessibility Service           │
│     (UI Hierarchy, Touch, Gestures, System)      │
└──────────────────────────────────────────────────┘
```

On top of this stack, the agent runs a SENSE → THINK → ACT loop:
- SENSE: Capture screen state via Accessibility Service
- THINK: Process with Gemini LLM to decide next action
- ACT: Execute action and observe result
- Repeat until the task is complete or the maximum step count is reached (see the conceptual sketch below)
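Conceptually, the loop can be sketched as follows. This is an illustration of the control flow described above, not the SDK's actual internals; the `ScreenState` shape and the three helper functions are hypothetical:

```ts
// Conceptual sketch of the SENSE → THINK → ACT loop (not the SDK's real implementation).
type ScreenState = { elements: unknown[] };
type AgentAction = { type: string };

declare function senseScreen(): Promise<ScreenState>;                                     // SENSE: capture UI state
declare function thinkNextAction(task: string, state: ScreenState): Promise<AgentAction>; // THINK: ask the Gemini LLM
declare function performAction(action: AgentAction): Promise<void>;                       // ACT: execute on the device

async function runTask(task: string, maxSteps: number): Promise<void> {
  for (let step = 0; step < maxSteps; step++) {
    const screen = await senseScreen();
    const action = await thinkNextAction(task, screen);
    if (action.type === 'done') return;   // the LLM signals completion with a 'done' action
    await performAction(action);          // otherwise act and loop again
  }
  throw new Error('Max steps exceeded');  // mirrors the troubleshooting entry above
}
```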
Requirements:
- Platform: Android only (iOS support planned)
- React Native: >= 0.72.0
- Expo: >= 50.0.0
- Android: >= API 24 (Android 7.0)
- Node: >= 18.0.0
Contributions are welcome! Please read our Contributing Guide for details.
MIT © Sarath Babu
- GitHub Repository
- npm Package
- Issue Tracker
- API Documentation
- Usage Guides
- Trigger System Guide
- Expo Plugin Guide
- Debug Mode Guide
- 📧 Email: sarathyadav112@gmail.com
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
Made with ❤️ by the QUE team