On-Device LLM + Model Context Protocol for Android
An open-source Android application that combines on-device LLM inference with MCP (Model Context Protocol) support for extensible AI tools.
Build a private, offline-capable AI assistant for Android that:
- Runs LLMs entirely on-device using llama.cpp
- Supports MCP protocol for extensible tools
- Includes built-in productivity tools (notes, calendar, etc.)
- Allows third-party MCP servers via Android IPC
The Pitch: "The only AI that knows your deepest thoughts, but tells no one."
- The Problem: People want AI insight into their mental health or personal life but are terrified of training the next GPT model with their private journals.
- The Use Case: You document anxiety, relationship struggles, or unfiltered opinions.
- The Query: "Have I been feeling more anxious lately?"
- The Magic: The app uses vector search to scan months of entries and synthesizes an answer: "You tend to express higher anxiety on Sunday nights, specifically regarding work deadlines, a pattern visible since October."
- Why it wins: Zero data exposure risk. It's a "Safe Space" in your pocket.
The Pitch: "Stop organizing. Just dump it here."
- The Problem: Notes apps are where ideas go to die because organizing them is friction.
- The Use Case: You dump raw, unstructured input (voice memos, screenshots, half-baked ideas, random URLs) into the app. No folders, no tags.
- The Query: "What was that idea I had about a coffee shop app?"
- The Magic: Vector search finds the semantic match in a voice note from 3 months ago, a screenshot from last week, and a text note from today, synthesizing them into a coherent project brief.
- Why it wins: The AI is the organization layer.
The Pitch: "Bring your own AI to workโwithout getting fired by IT."
- The Problem: Employees want AI help but are banned from pasting internal docs or proprietary code into ChatGPT.
- The Use Case: You load confidential internal PDFs, strategy docs, and proprietary code snippets into the app.
- The Query: "Summarize the Q3 risks from these 5 confidential reports."
- Why it wins: It bridges the gap: AI power, zero data leak. It's "Shadow IT" that is actually secure.
See the specification documents for detailed architecture:
| Document | Description |
|---|---|
| 00_PROJECT_OVERVIEW.md | Project overview and architecture |
| 01_UI_LAYER_SPEC.md | Jetpack Compose UI layer |
| 02_AGENT_ORCHESTRATION_SPEC.md | Agent/orchestration layer |
| 03_MCP_HOST_CLIENT_SPEC.md | MCP protocol implementation |
| 04_ON_DEVICE_LLM_SPEC.md | llama.cpp integration |
| 05_BUILTIN_MCP_SERVERS_SPEC.md | Notes and other built-in tools |
```
┌─────────────────────────────────────────────────┐
│               UI Layer (Compose)                │
├─────────────────────────────────────────────────┤
│            Agent/Orchestration Layer            │
├────────────────────┬────────────────────────────┤
│  MCP Host/Client   │ On-Device LLM (llama.cpp)  │
├────────────────────┴────────────────────────────┤
│       Built-in MCP Servers (Notes, etc.)        │
└─────────────────────────────────────────────────┘
```
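Conceptually, the agent/orchestration layer mediates between the Compose UI and the two boxes beneath it. A minimal Kotlin sketch of how the layers might compose, assuming hypothetical interface and class names (the real contracts live in the spec documents below):

```kotlin
// Illustrative layer contracts; actual definitions are in the spec docs.
interface LlmEngine {                        // wraps llama.cpp behind JNI
    suspend fun complete(prompt: String): String
}

data class ToolDescriptor(val name: String, val description: String)

interface McpHost {                          // fronts built-in and external MCP servers
    suspend fun listTools(): List<ToolDescriptor>
    suspend fun callTool(name: String, args: Map<String, Any?>): String
}

// The orchestration layer: asks the LLM what to do, routes tool calls
// through the MCP host, and returns the final answer to the UI.
class AgentOrchestrator(private val llm: LlmEngine, private val mcp: McpHost) {
    suspend fun answer(userQuery: String): String {
        val toolList = mcp.listTools()
            .joinToString("\n") { "${it.name}: ${it.description}" }
        // A real agent loop would parse tool calls from the response and
        // iterate; this sketch stops after a single completion.
        return llm.complete("Available tools:\n$toolList\n\nUser: $userQuery")
    }
}
```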
Different MCP servers use different transport mechanisms based on trust level and crash isolation requirements:
```
┌─────────────────────────────────────────────────────────────┐
│                       AndroidMCP App                        │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │                    MCP Host / Client                    │ │
│ │ ┌─────────────────┐ ┌─────────────────────────────────┐ │ │
│ │ │   In-Process    │ │         stdio Transport         │ │ │
│ │ │   Transport     │ │ ┌───────────────────────────┐   │ │ │
│ │ │ ┌─────────────┐ │ │ │ Browser MCP Server        │   │ │ │
│ │ │ │ • pkb_*     │ │ │ │ (WebView wrapper)         │   │ │ │
│ │ │ │ • calendar  │ │ │ │ → separate process        │   │ │ │
│ │ │ │ • contacts  │ │ │ └───────────────────────────┘   │ │ │
│ │ │ │ (built-in,  │ │ │ ┌───────────────────────────┐   │ │ │
│ │ │ │  trusted)   │ │ │ │ 3rd-party servers         │   │ │ │
│ │ │ └─────────────┘ │ │ │ → separate process        │   │ │ │
│ │ │                 │ │ └───────────────────────────┘   │ │ │
│ │ └─────────────────┘ └─────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
| Server Type | Transport | Rationale |
|---|---|---|
| Built-in (PKB, Calendar) | In-Process | Trusted code, performance critical |
| Browser Automation | stdio | Crash isolation, memory isolation, WebView in separate process |
| 3rd-party MCP Servers | stdio / Unix Sockets | Untrusted code, MUST be isolated |
Why stdio for Browser/External servers? (A transport-selection sketch follows this list.)
- 🛡️ Crash Isolation - Browser/tool crashes don't kill the app
- 🧠 Memory Isolation - Heavy operations get a separate memory budget
- 🔌 Standard Protocol - Compatible with existing MCP servers
- 📱 Android Constraint - WebView already runs in a separate process
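In code, the mapping from the trust table might look like the following Kotlin sketch. The type and function names here are hypothetical; the actual transport implementations come from the MCP Kotlin SDK:

```kotlin
// Hypothetical transport selection mirroring the trust table above.
sealed interface McpTransport {
    /** Direct calls inside the app process: trusted built-ins, zero IPC cost. */
    data object InProcess : McpTransport

    /** Child process speaking MCP over stdin/stdout: crash and memory isolated. */
    data class Stdio(val command: List<String>) : McpTransport
}

enum class ServerTrust { BUILT_IN, ISOLATED }

fun transportFor(trust: ServerTrust, command: List<String> = emptyList()): McpTransport =
    when (trust) {
        // pkb_*, calendar, contacts: trusted code, performance critical.
        ServerTrust.BUILT_IN -> McpTransport.InProcess
        // Browser automation and 3rd-party servers MUST be isolated.
        ServerTrust.ISOLATED -> McpTransport.Stdio(command)
    }
```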
- Android Studio Hedgehog (2023.1.1) or later
- JDK 17 or later (bundled with Android Studio)
- Android SDK with API level 34
- Android NDK 26.1.10909125 (will be downloaded automatically)
- CMake 3.22.1 (will be downloaded automatically)
For the best development experience on macOS (especially Apple Silicon M1/M2/M3), run builds directly using Android Studio's bundled JDK:
```bash
# Set JAVA_HOME to Android Studio's bundled JDK (add to ~/.zshrc for persistence)
export JAVA_HOME="/Applications/Android Studio.app/Contents/jbr/Contents/Home"

# Verify SDK is found
echo $ANDROID_HOME  # Should show ~/Library/Android/sdk

# Build debug APK
./gradlew assembleDebug

# Run unit tests (~2-3 minutes on Apple Silicon)
./gradlew testDebugUnitTest

# Build release APK
./gradlew assembleRelease
```

Why local over Docker? On Apple Silicon (M1/M2/M3), Docker runs x86_64 images via QEMU emulation, making builds 5-10x slower. Native builds are significantly faster.
1. Open Project:

   ```bash
   # Open via terminal
   open -a "Android Studio" /path/to/android_llm_mcp

   # Or: File → Open → select the project folder
   ```

2. Wait for Gradle Sync:
   - Android Studio will automatically sync Gradle dependencies (2-3 min the first time)
   - NDK and CMake will be downloaded automatically if missing

3. Build the App:
   - Menu: Build → Make Project (or `Cmd+F9`)
   - The first build takes ~7 minutes (it includes the native llama.cpp compilation)
   - Subsequent builds are much faster (~30 seconds)

4. Run on Device/Emulator:
   - Select a device from the toolbar dropdown
   - Click the green ▶️ Run button (or `Ctrl+R`)
   - For LLM testing, use a physical device (recommended) or an ARM64 emulator

5. Run Unit Tests:
   - Right-click `app/src/test` → Run 'Tests in app'
   - Or: Menu Run → Run 'All Tests'
   - Expected: 740 tests passing
| Issue | Solution |
|---|---|
| "SDK not found" | File โ Project Structure โ SDK Location โ Set to ~/Library/Android/sdk |
| "NDK not found" | Wait for auto-download, or: SDK Manager โ SDK Tools โ NDK |
| "CMake not found" | Wait for auto-download, or: SDK Manager โ SDK Tools โ CMake |
| Submodule errors | Run git submodule update --init --recursive in terminal |
| Gradle sync failed | File โ Invalidate Caches โ Restart |
If you prefer Docker or need a consistent CI environment:
```bash
# Build the Docker image
docker build -t android-mcp-builder .

# Run tests via Docker (slower on Apple Silicon due to emulation)
docker run --rm -v "$PWD/app:/app/app" android-mcp-builder ./gradlew testDebugUnitTest --no-daemon

# Build APK via Docker
docker run --rm -v "$PWD/app:/app/app" android-mcp-builder ./gradlew assembleDebug --no-daemon
```

This project uses llama.cpp (pinned to release b4380) as a Git submodule for on-device LLM inference.
```bash
# Clone with submodules
git clone --recursive https://github.com/sureshsankaran/android_llm_mcp.git

# Or, if already cloned, initialize submodules
git submodule update --init --recursive

# Verify llama.cpp is at the correct version
cd app/src/main/cpp/llama.cpp
git describe --tags  # Should show b4380
```

Note: The build will fail with a clear error message if the submodule is not initialized.
```bash
# Build debug APK
./gradlew assembleDebug

# Build release APK
./gradlew assembleRelease

# Run unit tests
./gradlew test

# Run instrumented tests (requires a device/emulator)
./gradlew connectedAndroidTest
```

The project builds llama.cpp natively for:
- `arm64-v8a` (ARM64, primary target)
- `armeabi-v7a` (ARM32, fallback)
Key build features (see the Gradle sketch after this list):
- NEON SIMD optimizations enabled for ARM
- CPU-only inference (GPU backends disabled for broader compatibility)
- `c++_shared` STL for better compatibility
- Memory-mapped models for efficient loading
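In Gradle Kotlin DSL terms, the native build configuration is roughly the following. This is a sketch only: the exact CMake flag names depend on the pinned llama.cpp revision, and NEON is enabled by default for ARM64 targets.

```kotlin
// app/build.gradle.kts (sketch; flag names are indicative, not verbatim)
android {
    defaultConfig {
        ndk {
            // ARM64 primary target plus ARM32 fallback, as listed above.
            abiFilters += listOf("arm64-v8a", "armeabi-v7a")
        }
        externalNativeBuild {
            cmake {
                arguments += listOf(
                    "-DANDROID_STL=c++_shared", // shared STL for better compatibility
                    "-DGGML_VULKAN=OFF"         // CPU-only inference (assumed flag name)
                )
            }
        }
    }
    externalNativeBuild {
        cmake {
            path = file("src/main/cpp/CMakeLists.txt")
            version = "3.22.1"
        }
    }
}
```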
For running integration tests, place a GGUF model file in one of these locations:
# Internal storage
adb push your_model.gguf /data/data/com.androidmcp.debug/files/models/test_model.gguf
# External storage (if accessible)
adb push your_model.gguf /storage/emulated/0/Android/data/com.androidmcp.debug/files/test_model.ggufRecommended test models (Q4_K_M quantization):
- TinyLlama 1.1B (~700MB)
- Qwen2.5-0.5B (~350MB)
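An instrumented test can then probe both push locations, mirroring the adb commands above. A minimal sketch, with an illustrative function name:

```kotlin
import android.content.Context
import java.io.File

// Returns the first test model found in either push location, or null.
fun findTestModel(context: Context): File? =
    listOfNotNull(
        File(context.filesDir, "models/test_model.gguf"),
        context.getExternalFilesDir(null)?.let { File(it, "test_model.gguf") },
    ).firstOrNull { it.exists() }
```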
- UI: Jetpack Compose + Material 3
- LLM: llama.cpp via JNI
- MCP: MCP Kotlin SDK + AIDL for cross-app
- Storage: Room Database
- DI: Hilt
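For the "llama.cpp via JNI" piece, the Kotlin side of the bridge plausibly looks like the sketch below. The class, method, and library names are assumptions; the real binding is specified in 04_ON_DEVICE_LLM_SPEC.md.

```kotlin
// Hypothetical JNI surface over llama.cpp; the native implementations
// are compiled by the CMake build described above.
object LlamaBridge {
    init {
        System.loadLibrary("llama-android") // assumed native library name
    }

    /** Loads a GGUF model (memory-mapped) and returns an opaque native handle. */
    external fun loadModel(path: String, contextSize: Int): Long

    /** Runs generation against a loaded model handle. */
    external fun generate(handle: Long, prompt: String, maxTokens: Int): String

    /** Releases the native model and its memory mapping. */
    external fun freeModel(handle: Long)
}
```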
| Model | Size | RAM Needed |
|---|---|---|
| Llama 3.2 1B | ~1GB | ~2GB |
| Llama 3.2 3B | ~2GB | ~4GB |
| Phi-3 Mini | ~2.5GB | ~4GB |
| Qwen2.5 3B | ~2GB | ~4GB |
Phase: Specification/Design
- Architecture design
- Layer specifications
- Project scaffolding
- Core implementation
- MVP release
| Area | Investigation | Rationale |
|---|---|---|
| Vector DB | Keep `androidx.sqlite` + `sqlite-vec` | Current approach is solid for on-device vector search |
| LLM Runtime | Prototype MediaPipe LLM Inference | May improve NPU utilization and battery life vs. llama.cpp |
| AICore / Gemini Nano | Add a startup check for AICore availability | If Gemini Nano is available, use it; this instantly solves the "Model Delivery" and "Battery" risks (no PAD download, optimized for the device) |
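The AICore row could start as a simple package probe at startup. A minimal sketch; the AICore package name below is an assumption to verify against current Android documentation:

```kotlin
import android.content.Context
import android.content.pm.PackageManager

// Assumed package name for the on-device AICore service.
private const val AICORE_PACKAGE = "com.google.android.aicore"

/** Returns true if AICore is installed; otherwise fall back to llama.cpp. */
fun isAiCoreInstalled(context: Context): Boolean =
    try {
        context.packageManager.getPackageInfo(AICORE_PACKAGE, 0)
        true
    } catch (e: PackageManager.NameNotFoundException) {
        false
    }
```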
MIT License
- RikkaHub - Android MCP chat client
- android-mcp-sdk - Android MCP SDK
- llama.cpp - LLM inference
- MCP Kotlin SDK - Official MCP SDK