Privacy-first, on-device AI SDKs that bring powerful language models directly to your iOS and Android applications. RunAnywhere enables intelligent AI execution with automatic optimization for performance, privacy, and user experience.
The iOS SDK provides high-performance on-device text generation, a complete voice AI pipeline with VAD/STT/LLM/TTS, structured outputs with type-safe JSON generation, and thinking model support for privacy-first AI applications. View iOS SDK →
The Android Kotlin Multiplatform SDK provides high-performance on-device text generation with streaming support, comprehensive model management, structured outputs with JSON generation, and thinking model support for privacy-first AI applications. View Android SDK →
- iOS SDK - Swift Package with comprehensive on-device AI capabilities
- iOS Demo App - Full-featured sample app showcasing all SDK features
- Android SDK - Kotlin Multiplatform SDK with JVM and Android targets
- Android Demo App - Full-featured sample app showcasing text generation
- 💬 Text Generation - High-performance on-device text generation with streaming support
- 🎙️ Voice AI Pipeline - Complete voice workflow with VAD, STT, LLM, and TTS components
- 📋 Structured Outputs - Type-safe JSON generation with schema validation using the `Generatable` protocol
- 🧠 Thinking Models - Support for models with thinking tags (`<think>...</think>`); see the sketch after this list
- 🗂️ Model Management - Automatic model discovery, downloading, and lifecycle management
- 📊 Performance Analytics - Real-time metrics with a comprehensive event system
- 🎯 Intelligent Routing - Automatic on-device vs. cloud decision making
- 🔒 Privacy-First - All processing happens on-device by default, with intelligent cloud routing
- 🔄 Multi-Framework - GGUF (llama.cpp), Apple Foundation Models, WhisperKit, Core ML, MLX, TensorFlow Lite
- ⚡ Native Performance - Optimized for Apple Silicon with Metal and Neural Engine acceleration
- 🧠 Smart Memory - Automatic memory optimization, cleanup, and pressure handling
- 📱 Cross-Platform - iOS 16.0+, macOS 12.0+, tvOS 14.0+, watchOS 7.0+
- 🏗️ Component Architecture - Modular components for flexible AI pipeline construction
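Thinking models emit their intermediate reasoning inside `<think>...</think>` tags before the final answer, and the SDK reports thinking and response tokens separately (see the analytics example further down). Purely to illustrate the format, here is a minimal, hypothetical sketch of splitting such output by hand; `splitThinking` is not an SDK API:

```swift
import Foundation

// Hypothetical helper (not part of the SDK): separates the <think>...</think>
// segment of a raw completion from the user-visible response.
func splitThinking(_ raw: String) -> (thinking: String?, response: String) {
    guard let open = raw.range(of: "<think>"),
          let close = raw.range(of: "</think>"),
          open.upperBound <= close.lowerBound else {
        return (nil, raw) // no well-formed thinking tags present
    }
    let thinking = String(raw[open.upperBound..<close.lowerBound])
    let response = String(raw[close.upperBound...])
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return (thinking, response)
}

let (thinking, answer) = splitThinking("<think>2 + 2 = 4</think>The answer is 4.")
// thinking == "2 + 2 = 4", answer == "The answer is 4."
```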
- 💬 Text Generation - High-performance on-device text generation with streaming support via Kotlin Flow
- 📋 Structured Outputs - Type-safe JSON generation with schema validation
- 🧠 Thinking Models - Support for models with thinking tags (`<think>...</think>`)
- 🗂️ Model Management - Automatic model discovery, downloading with progress tracking, and lifecycle management
- 📊 Performance Analytics - Real-time metrics with a comprehensive event system
- 📲 Device Registration - Lazy device registration with automatic retry logic
- 🔒 Privacy-First - All processing happens on-device by default
- 📦 GGUF Support - llama.cpp integration for quantized models (GGUF/GGML)
- ⚡ Native Performance - JNI-based native integration for optimal performance
- 🌊 Kotlin Flow - Modern reactive streams for streaming generation
- 📱 Cross-Platform - Android 7.0+ (API 24+), JVM desktop applications
- 🏗️ Component Architecture - Modular LLM components with a provider pattern
- ✅ SHA-256 Verification - Automatic model integrity checking on download (sketched after this list)
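The download pipeline runs this integrity check for you. For readers curious what the mechanism involves, here is a minimal standalone sketch (written in Swift with CryptoKit to match the other sketches in this README; it is not the Kotlin SDK's implementation, and the path and expected digest are placeholders):

```swift
import CryptoKit
import Foundation

// Illustrative only -- the SDK verifies downloads automatically.
// `expectedHex` would come from a model registry; any value here is fake.
func sha256Matches(fileAt url: URL, expectedHex: String) throws -> Bool {
    let data = try Data(contentsOf: url) // for multi-GB models, hash in chunks instead
    let digest = SHA256.hash(data: data)
    let actualHex = digest.map { String(format: "%02x", $0) }.joined()
    return actualHex == expectedHex.lowercased()
}
```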
- Android SDK - Full parity with iOS features
- Hybrid Routing - Intelligent on-device + cloud execution
- Advanced Analytics - Usage insights and performance dashboards
- Remote Configuration - Dynamic model and routing updates
- Enterprise Features - Team management and usage controls
- Extended Model Support - ONNX, TensorFlow Lite, Core ML optimizations
- Multi-Modal Support - Image and audio understanding
iOS quick start:

```swift
import RunAnywhere
import LLMSwift
import WhisperKitTranscription

// 1. Initialize the SDK
try await RunAnywhere.initialize(
    apiKey: "dev",        // Any string works in dev mode
    baseURL: "localhost", // Not used in dev mode
    environment: .development
)

// 2. Register framework adapters
await LLMSwiftServiceProvider.register()

let options = AdapterRegistrationOptions(
    validateModels: false,
    autoDownloadInDev: false,
    showProgress: true
)

try await RunAnywhere.registerFrameworkAdapter(
    LLMSwiftAdapter(),
    models: [
        try! ModelRegistration(
            url: "https://huggingface.co/prithivMLmods/SmolLM2-360M-GGUF/resolve/main/SmolLM2-360M.Q8_0.gguf",
            framework: .llamaCpp,
            id: "smollm2-360m",
            name: "SmolLM2 360M",
            memoryRequirement: 500_000_000
        )
    ],
    options: options
)

// 3. Download and load the model
try await RunAnywhere.downloadModel("smollm2-360m")
try await RunAnywhere.loadModel("smollm2-360m")

// 4. Generate text with analytics
let result = try await RunAnywhere.generate(
    "Explain quantum computing in simple terms",
    options: RunAnywhereGenerationOptions(
        maxTokens: 100,
        temperature: 0.7
    )
)

print("Generated: \(result.text)")
print("Speed: \(result.performanceMetrics.tokensPerSecond) tok/s")
print("Tokens: \(result.tokensUsed)")
```

View full iOS documentation →
Android quick start:

```kotlin
import com.runanywhere.sdk.public.RunAnywhere
import com.runanywhere.sdk.llm.llamacpp.LlamaCppModule
import com.runanywhere.sdk.models.RunAnywhereGenerationOptions
import com.runanywhere.sdk.data.models.SDKEnvironment

// 1. Initialize the SDK
suspend fun initializeSDK() {
    // Register the LlamaCpp module for GGUF model support
    LlamaCppModule.register()

    // Initialize the SDK
    RunAnywhere.initialize(
        apiKey = "dev", // Any string works in dev mode
        baseURL = "https://api.runanywhere.ai",
        environment = SDKEnvironment.DEVELOPMENT
    )
}

// 2. Download and load a model
suspend fun setupModel() {
    // Download the model with progress tracking
    RunAnywhere.downloadModel("smollm2-360m").collect { progress ->
        println("Download progress: ${(progress * 100).toInt()}%")
    }

    // Load the model
    val success = RunAnywhere.loadModel("smollm2-360m")
    if (success) {
        println("Model loaded successfully")
    }
}

// 3. Generate text (non-streaming)
suspend fun generateText() {
    val result = RunAnywhere.generate(
        prompt = "Explain quantum computing in simple terms",
        options = RunAnywhereGenerationOptions(
            maxTokens = 100,
            temperature = 0.7f
        )
    )
    println("Generated: $result")
}

// 4. Generate text with streaming
suspend fun streamText() {
    RunAnywhere.generateStream(
        prompt = "Explain quantum computing in simple terms",
        options = RunAnywhereGenerationOptions(
            maxTokens = 100,
            temperature = 0.7f
        )
    ).collect { token ->
        print(token) // Print each token as it arrives
    }
}

// 5. Inspect the currently loaded model
val currentModel = RunAnywhere.currentModel
println("Current model: ${currentModel?.name}")

// 6. Unload the model when done
suspend fun cleanup() {
    RunAnywhere.unloadModel()
}
```

View full Android documentation →
- Platforms: iOS 16.0+ / macOS 12.0+ / tvOS 14.0+ / watchOS 7.0+
- Development: Xcode 15.0+, Swift 5.9+
- Recommended: iOS 17.0+ for full feature support
- Minimum SDK: 24 (Android 7.0)
- Target SDK: 36
- Kotlin: 2.1.21+
- Gradle: 8.11.1+
- Java: 17
Add RunAnywhere to your project:
- In Xcode, select File > Add Package Dependencies
- Enter the repository URL: `https://github.com/RunanywhereAI/runanywhere-sdks`
- Select a version rule:
  - Latest Release (Recommended): Choose Up to Next Major from `0.15.2`
  - Specific Version: Choose Exact and enter `0.15.2`
  - Development Branch: Choose Branch and enter `main`
- Select products based on your needs:
  - `RunAnywhere` - Core SDK (required)
  - `LLMSwift` - GGUF/GGML models via llama.cpp (optional, iOS 16+)
  - `WhisperKitTranscription` - Speech-to-text (optional, iOS 16+)
  - `FluidAudioDiarization` - Speaker diarization (optional, iOS 17+)
- Click Add Package
Or add it directly in Package.swift:

```swift
dependencies: [
    .package(url: "https://github.com/RunanywhereAI/runanywhere-sdks", from: "0.15.7")
],
targets: [
    .target(
        name: "YourApp",
        dependencies: [
            .product(name: "RunAnywhere", package: "runanywhere-sdks"),
            .product(name: "LLMSwift", package: "runanywhere-sdks"),
            .product(name: "WhisperKitTranscription", package: "runanywhere-sdks")
        ]
    )
]
```

For Android and JVM targets, add the Gradle dependencies.

Latest Release (Recommended):
```kotlin
dependencies {
    implementation("com.runanywhere.sdk:RunAnywhereKotlinSDK-android:0.1.0")

    // LlamaCpp module for GGUF model support
    implementation("com.runanywhere.sdk:runanywhere-llm-llamacpp-android:0.1.0")
}
```

JVM Target (for IntelliJ plugins, desktop apps):
```kotlin
dependencies {
    implementation("com.runanywhere.sdk:RunAnywhereKotlinSDK-jvm:0.1.0")

    // LlamaCpp module for GGUF model support
    implementation("com.runanywhere.sdk:runanywhere-llm-llamacpp-jvm:0.1.0")
}
```

Or with the Groovy DSL (build.gradle):

```groovy
dependencies {
    implementation 'com.runanywhere.sdk:RunAnywhereKotlinSDK-android:0.1.0'
    implementation 'com.runanywhere.sdk:runanywhere-llm-llamacpp-android:0.1.0'
}
```

Maven (JVM):

```xml
<dependencies>
    <dependency>
        <groupId>com.runanywhere.sdk</groupId>
        <artifactId>RunAnywhereKotlinSDK-jvm</artifactId>
        <version>0.1.0</version>
    </dependency>
    <dependency>
        <groupId>com.runanywhere.sdk</groupId>
        <artifactId>runanywhere-llm-llamacpp-jvm</artifactId>
        <version>0.1.0</version>
    </dependency>
</dependencies>
```

Local development build:

```bash
# Build and publish to the local Maven repository
cd sdk/runanywhere-kotlin
./scripts/sdk.sh publish
```

Then, in your app's build.gradle.kts:

```kotlin
repositories {
    mavenLocal()
}
```

Basic text generation:

```swift
// All processing stays on-device, with analytics
let result = try await RunAnywhere.generate(
    userMessage,
    options: RunAnywhereGenerationOptions(maxTokens: 150)
)

print("Response: \(result.text)")
print("Speed: \(result.performanceMetrics.tokensPerSecond) tok/s")
```

Voice AI pipeline:

```swift
// Voice pipeline with VAD, STT, LLM, TTS
let config = ModularPipelineConfig(
    components: [.vad, .stt, .llm, .tts],
    stt: VoiceSTTConfig(modelId: "whisper-base"),
    llm: VoiceLLMConfig(modelId: "default", maxTokens: 100)
)

let pipeline = try await RunAnywhere.createVoicePipeline(config: config)

for try await event in pipeline.process(audioStream: audioStream) {
    // Handle voice events
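    // What arrives here depends on the configured components. The cases
    // below are illustrative assumptions, not the SDK's confirmed event
    // names -- check the pipeline event type in the iOS docs:
    //
    //   case .sttFinalTranscript(let text): ... // user's speech as text
    //   case .llmFinalResponse(let text):   ... // the model's reply
    //   case .ttsAudioChunk(let audio):     ... // synthesized speech to play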
}
```

Structured outputs:

```swift
// Type-safe JSON generation with the Generatable protocol
struct Quiz: Codable, Generatable {
    let title: String
    let questions: [Question]

    static var jsonSchema: String {
        return """
        {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "questions": {"type": "array"}
          }
        }
        """
    }
}

let quiz = try await RunAnywhere.generateStructured(
    Quiz.self,
    prompt: "Create a quiz about Swift programming",
    options: options
)
```

All generation methods return comprehensive analytics:

```swift
let result = try await RunAnywhere.generate(prompt, options: options)

// Access performance metrics
print("Speed: \(result.performanceMetrics.tokensPerSecond) tok/s")
print("First token: \(result.performanceMetrics.timeToFirstTokenMs ?? 0)ms")
print("Total time: \(result.latencyMs)ms")
print("Memory: \(result.memoryUsed / 1024 / 1024)MB")

// For thinking models (models that emit <think> tags)
if let thinkingTokens = result.thinkingTokens {
    print("Thinking tokens: \(thinkingTokens)")
    print("Response tokens: \(result.responseTokens)")
}
```

Streaming returns both real-time tokens and final analytics:

```swift
let streamResult = try await RunAnywhere.generateStream(prompt, options: options)

// Display tokens in real time
for try await token in streamResult.stream {
    print(token, terminator: "")
}

// Get complete analytics after streaming finishes
let metrics = try await streamResult.result.value
print("\nSpeed: \(metrics.performanceMetrics.tokensPerSecond) tok/s")
print("Total tokens: \(metrics.tokensUsed)")
```

Model management:

```swift
// Download with progress tracking
let progressStream = try await RunAnywhere.downloadModelWithProgress("model-id")
for try await progress in progressStream {
    print("Progress: \(Int(progress.percentage * 100))%")
}

// Load and unload models
try await RunAnywhere.loadModel("model-id")
try await RunAnywhere.unloadModel()

// List available models
let models = try await RunAnywhere.listAvailableModels()

// Check the current model
if let current = RunAnywhere.currentModel {
    print("Currently loaded: \(current.name)")
}
```

Token estimation:

```swift
let count = RunAnywhere.estimateTokenCount("Your prompt here")
print("Estimated: \(count) tokens")
// Check if prompt fits in context window
if count + maxTokens > 4096 {
print("Warning: May exceed context limit")
}- iOS SDK Documentation - Complete API reference and guides
- iOS Sample App - Full-featured demo application
- Architecture Overview - Technical deep dive
- Android SDK Documentation - Complete API reference and guides
- Android Sample App - Full-featured demo application
- Kotlin SDK Architecture - Technical deep dive
We welcome contributions from the community! Here's how you can help:
- 🐛 Report bugs - Help us identify and fix issues
- 💡 Suggest features - Share your ideas for improvements
- 📝 Improve documentation - Help make our docs clearer
- 🔧 Submit pull requests - Contribute code directly
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
See our Contributing Guidelines for detailed instructions.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This project includes code from third-party open source projects. See THIRD_PARTY_LICENSES.md for the complete list of third-party licenses and acknowledgments, including:
- llama.cpp (MIT License) - GGUF model support
- MLC-LLM (Apache License 2.0) - Universal LLM deployment engine
- Discord: Join our community
- GitHub Issues: Report bugs or request features
- Email: founders@runanywhere.ai
- Twitter: @RunanywhereAI
Built with ❤️ by the RunAnywhere team. Special thanks to:
- The open-source community for inspiring this project
- Our early adopters and beta testers
- Contributors who help make this SDK better
Ready to build privacy-first AI apps? Get started with our iOS SDK →