A simple Android chat application demonstrating the RunAnywhere SDK for on-device AI inference.
This is a minimal example showing how to:
- Initialize the RunAnywhere SDK
- Download AI models (LLMs)
- Load models into memory
- Run text generation with streaming responses
Features:

- Model Management: Download and load AI models directly in the app
- Real-time Streaming: See AI responses generate word-by-word
- Simple UI: Clean Jetpack Compose interface
- On-Device AI: All inference runs locally on your Android device
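End to end, the flow above might look like the sketch below. All `RunAnywhere` identifiers (`initialize`, `downloadModel`, `loadModel`, `generateStream`) are illustrative placeholders, not the SDK's verified API — see the actual source files for the real entry points.

```kotlin
// Hypothetical sketch of the app's lifecycle; the RunAnywhere calls
// below are placeholders, not the SDK's actual API surface.
class MyApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        // 1. Initialize the SDK once, at process start
        RunAnywhere.initialize(this)
    }
}

// Later, inside a coroutine scope (e.g. viewModelScope):
suspend fun chat(prompt: String) {
    RunAnywhere.downloadModel("smollm2-360m-q8_0")  // 2. fetch the weights
    RunAnywhere.loadModel("smollm2-360m-q8_0")      // 3. load them into memory
    RunAnywhere.generateStream(prompt)              // 4. stream tokens as a Flow
        .collect { token -> print(token) }
}
```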
```bash
./gradlew assembleDebug
# Or open in Android Studio and click Run
```

- Launch the app
- Tap "Models" in the top bar
- Choose a model (we recommend starting with "SmolLM2 360M Q8_0" - only 119 MB)
- Tap "Download" and wait for it to complete
- Once downloaded, tap "Load" on the model
- Wait for "Model loaded! Ready to chat." message
- Type a message in the text field
- Tap "Send"
- Watch the AI response generate in real-time
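One way the word-by-word streaming in the steps above can be wired into UI state is to append each emitted token to a `StateFlow` that the UI observes. This sketch uses standard kotlinx.coroutines APIs; the `generateStream` parameter stands in for whatever streaming call the SDK actually exposes and is an assumption, not a verified API.

```kotlin
import kotlinx.coroutines.flow.*

// Sketch: accumulating streamed tokens into observable chat state.
class ChatViewModel /* : ViewModel() */ {
    private val _response = MutableStateFlow("")
    val response: StateFlow<String> = _response.asStateFlow()

    suspend fun send(prompt: String, generateStream: (String) -> Flow<String>) {
        _response.value = ""
        generateStream(prompt).collect { token ->
            _response.value += token  // each emission triggers a UI update
        }
    }
}
```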
The app comes pre-configured with two models:
| Model | Size | Quality | Best For |
|---|---|---|---|
| SmolLM2 360M Q8_0 | 119 MB | Basic | Testing, quick responses |
| Qwen 2.5 0.5B Instruct Q6_K | 374 MB | Better | General conversations |
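The two table entries might be registered along these lines; the `registerModels`, `register`, and `ModelInfo` names, the field set, and the placeholder URLs are all assumptions for illustration — the real registration code lives in `MyApplication.kt`.

```kotlin
// Hypothetical shape of model registration in MyApplication.kt;
// names and fields are illustrative, not the SDK's actual types.
fun registerModels() {
    register(
        ModelInfo(
            id = "smollm2-360m-q8_0",
            name = "SmolLM2 360M Q8_0",
            sizeBytes = 119L * 1024 * 1024,
            url = "<model download URL>"
        )
    )
    register(
        ModelInfo(
            id = "qwen2.5-0.5b-instruct-q6_k",
            name = "Qwen 2.5 0.5B Instruct Q6_K",
            sizeBytes = 374L * 1024 * 1024,
            url = "<model download URL>"
        )
    )
}
```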
Built with:

- RunAnywhere Core SDK: Component architecture and model management
- LlamaCpp Module: Optimized llama.cpp inference engine with 7 ARM64 variants
- Kotlin Coroutines: For async operations and streaming
```
MyApplication (initialization)
        ↓
ChatViewModel (state management)
        ↓
ChatScreen (UI layer)
```
Key files:

- `MyApplication.kt` - SDK initialization and model registration
- `ChatViewModel.kt` - Business logic and state management
- `MainActivity.kt` - UI components and composables
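The UI layer can observe the ViewModel with standard Jetpack Compose state APIs. This sketch assumes the ViewModel exposes a `messages: StateFlow<List<String>>`, which may differ from the app's actual state shape.

```kotlin
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.collectAsState
import androidx.compose.runtime.getValue

@Composable
fun ChatScreen(viewModel: ChatViewModel) {
    // collectAsState re-composes the list whenever the flow emits
    val messages by viewModel.messages.collectAsState()
    LazyColumn {
        items(messages) { message -> Text(message) }
    }
}
```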
Requirements:

- Android 7.0 (API 24) or higher
- ~200 MB free storage (for smallest model)
- Internet connection (for downloading models)
**Models don't appear:**

- Wait a few seconds for SDK initialization to complete
- Tap "Refresh" in the Models section
- Check logcat for initialization errors
**Model download fails:**

- Check your internet connection
- Ensure sufficient storage space
- Verify the INTERNET permission is declared in AndroidManifest.xml
**App crashes or runs out of memory when loading a model:**

- Try the smaller model (SmolLM2 360M)
- Close other apps to free memory
- Check that `android:largeHeap="true"` is set in AndroidManifest.xml
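The two manifest settings mentioned in this section look like the fragment below; the attribute names are standard Android, while your actual package and activity entries will differ.

```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <!-- Needed for model downloads -->
    <uses-permission android:name="android.permission.INTERNET" />
    <!-- Gives the process a larger heap for model weights -->
    <application android:largeHeap="true">
        <!-- activities, etc. -->
    </application>
</manifest>
```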
**Generation is slow:**

- This is normal for on-device inference
- Smaller models run faster
- Performance depends on your device's CPU
Want to customize this app? Try:
- Add more models - Edit `MyApplication.kt` → `registerModels()`
- Customize UI - Edit `MainActivity.kt` compose functions
- Add system prompts - Modify the message format in `ChatViewModel.kt`
- Persist chat history - Add a Room database or DataStore
- Add model parameters - Explore temperature, top-k, top-p settings
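Sampling parameters like those usually map onto a small options object. The `GenerationOptions` name and fields below are assumptions, with llama.cpp-style defaults shown purely for orientation.

```kotlin
// Hypothetical options holder; field names mirror common llama.cpp
// sampling parameters, not a verified RunAnywhere API.
data class GenerationOptions(
    val temperature: Float = 0.8f, // higher = more random sampling
    val topK: Int = 40,            // keep only the K most likely tokens
    val topP: Float = 0.95f,       // nucleus sampling probability mass
    val maxTokens: Int = 256       // cap on generated tokens
)
```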
This example app follows the license of the RunAnywhere SDK.