# Llama Compose

*Re-inventing the wheel with style*

## Overview

Llama Compose is a Kotlin Multiplatform (KMP) application showcasing private, on-device LLM inference with llama.cpp, an agent with tool calling, and a modern Compose Multiplatform UI. It targets Android, iOS, and Desktop via shared code with platform-specific integrations.

It is the topic of my talk at Colombia AI Week 2025.
- Overview
- Highlights
- Download
- Quick Start
- Project Structure (single repo)
- Platform Notes
- Vulkan backend patch (optional)
- Attributions & Licenses

## Highlights

- Local LLM inference using llama.cpp
- Agent mode with tool calling (notes, calendar, tasks, contacts, places, time) powered by JetBrains' Koog.ai
- Simple chat mode for direct model interaction
- Compose Multiplatform UI, Koin DI, Ktor client

## Download

Llama Compose is available for Apple, Android & Windows. A Windows Store release is coming soon.

## Quick Start

- JDK 17+
- Git (with submodules)
- Gradle (wrapper is provided)
- Platform SDKs as needed:
- Android Studio (SDK/NDK)
- Xcode (for iOS/macOS)
- Desktop: Java/Compose runtime (handled by Gradle)
- Recommended tools by platform:
  - macOS: Homebrew, ninja, libomp (for OpenMP), Metal-capable device for GPU
  - Windows: MSYS2/MinGW toolchain for native builds (see Windows notes)
  - Linux: CMake toolchain; Vulkan SDK for GPU (optional)
```shell
git clone https://github.com/DmyMi/llama-compose
cd llama-compose
git submodule update --init --recursive
```

Use GGUF models compatible with llama.cpp:
- Option 1: Use in-app model download/selection UI.
- Option 2: Modify default model object with your models.
The default models are defined in `composeApp/src/commonMain/kotlin/cloud/dmytrominochkin/ai/llamacompose/download/Models.kt`.
Available models:

- **Llama 3 Groq 8B Tool Use** (4.9 GiB) - Optimized for tool calling and function execution
  - Provider: bartowski - Hugging Face: `bartowski/Llama-3-Groq-8B-Tool-Use-GGUF`
- **Gemma 3n E4B IT** (4.5 GiB) - Google's 4B parameter instruction-tuned model
  - Provider: unsloth - Hugging Face: `unsloth/gemma-3n-E4B-it-GGUF`
- **Gemma 3n E2B IT** (3.3 GiB) - Lightweight 2B parameter model for mobile/edge
  - Provider: unsloth - Hugging Face: `unsloth/gemma-3n-E2B-it-GGUF`
- **Llama 3.2 3B Instruct** (2.3 GiB) - Meta's compact instruction-tuned model
  - Provider: unsloth - Hugging Face: `unsloth/Llama-3.2-3B-Instruct-GGUF`
- **Llama 3.2 1B Instruct** (0.9 GiB) - Ultra-lightweight 1B parameter model
  - Provider: unsloth - Hugging Face: `unsloth/Llama-3.2-1B-Instruct-GGUF`
- **Gemma 3 270m IT** (0.5 GiB) - Minimal 270M parameter model for testing
  - Provider: unsloth - Hugging Face: `unsloth/gemma-3-270m-it-GGUF`

All models are quantized GGUF files from Hugging Face, optimized for use cases ranging from desktop to mobile deployment.
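If you go with Option 2 and modify the default model objects, a descriptor along these lines is one way to picture it. This is a hypothetical sketch only: the type name `ModelInfo`, its fields, and the example GGUF filename are assumptions for illustration, not the repo's actual `Models.kt` API; the Hugging Face `resolve/main` URL pattern is the standard one for direct file downloads.

```kotlin
// Hypothetical sketch: field names and the example filename are assumptions,
// not the actual contents of Models.kt.
data class ModelInfo(
    val displayName: String,   // label shown in the model-selection UI
    val sizeGiB: Double,       // approximate download size
    val repoId: String,        // Hugging Face repository id
    val fileName: String       // GGUF file inside that repository
) {
    // Standard Hugging Face pattern for direct file downloads
    val downloadUrl: String
        get() = "https://huggingface.co/$repoId/resolve/main/$fileName"
}

val defaultModels = listOf(
    ModelInfo(
        displayName = "Llama 3.2 1B Instruct",
        sizeGiB = 0.9,
        repoId = "unsloth/Llama-3.2-1B-Instruct-GGUF",
        fileName = "Llama-3.2-1B-Instruct-Q4_K_M.gguf" // illustrative quant filename
    )
)
```

Swapping in your own model is then just adding another entry with your repository id and file name.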
### Android

- Open in Android Studio and run the `composeApp` Android target.
- Or via CLI:

```shell
./gradlew :composeApp:installDebug
adb shell am start -n cloud.dmytrominochkin.ai.llamacompose/.MainActivity
```

Everything will be handled automatically.
### iOS

- Open `iosApp/` in Xcode and run on a simulator or device.

Everything will be handled automatically.
### Desktop

```shell
./gradlew :composeApp:run
```

- Optional: if native llama.cpp and the `llama` subproject need to be built for desktop, run the provided Gradle tasks first. To build and copy the native libraries for desktop platforms, run:

```shell
./gradlew :composeApp:copyNativeLibrariesToDesktop
```

This task automatically:

- Builds llama.cpp with platform-specific optimizations (Metal for macOS, Vulkan for Linux/Windows)
- Builds the Kotlin/Native fatllama wrapper
- Copies the fatllama shared library and its dependencies to `desktopResources/`
- Handles MinGW runtime DLLs on Windows
- Supports both debug and release builds

The task handles all native build dependencies internally, so you only need to run this single command.
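On a fresh checkout where the native libraries have not been built yet, a first desktop run therefore combines the two commands from this section:

```shell
# Build llama.cpp + the fatllama wrapper and copy them into desktopResources/
./gradlew :composeApp:copyNativeLibrariesToDesktop

# Then launch the desktop app
./gradlew :composeApp:run
```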
> [!IMPORTANT]
> Building for the different desktop platforms requires additional setup; check Platform Notes for details.

> [!CAUTION]
> Web is added as a preliminary, experimental target; it is not runnable yet.
## Project Structure (single repo)

```
/composeApp/            # Main KMP app (UI, features, navigation)
  └─ src/
     ├─ commonMain/     # Shared Kotlin code + resources
     ├─ androidMain/    # Android-specific code/resources
     ├─ desktopMain/    # Desktop-specific code
     └─ iosMain/        # iOS-specific code
/llama/                 # KMP integration with llama.cpp
  ├─ src/               # KMP source sets (commonMain, native, android, ios, jvm, …)
  └─ native/            # Third-party native submodules (llama.cpp, OpenCL, Vulkan)
/iosApp/                # iOS app wrapper (Swift/SwiftUI)
/buildSrc/              # Gradle build configuration and plugins
/gradle/                # Gradle wrapper and version catalogs
build.gradle.kts, settings.gradle.kts, gradle.properties, local.properties
```
Source-set hierarchy of the `composeApp` module:

```mermaid
flowchart TB
    common[commonMain]
    androidMain[androidMain] --> common
    desktopMain["desktopMain (JVM)"] --> common
    iosMain[iosMain] --> common
    iosTargets["iosX64Main | iosArm64Main | iosSimulatorArm64Main"] --> iosMain
```
Source-set hierarchy of the `llama` module:

```mermaid
flowchart TB
    common[commonMain]
    commonJvm[commonJvmMain] --> common
    androidMain[androidMain] --> commonJvm
    androidNativeArm64Main[androidNativeArm64Main] --> nativeMain
    desktopMain[desktopMain] --> commonJvm
    iosMain[iosMain] --> nativeMain
    iosTargets["iosX64Main | iosArm64Main | iosSimulatorArm64Main"] --> iosMain
    nativeMain[nativeMain] --> common
    macosMain[macosMain] --> nativeMain
    linuxX64Main[linuxX64Main] --> nativeMain
    mingwX64Main[mingwX64Main] --> nativeMain
    nativeInterop[nativeInterop]
```
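A custom hierarchy like the `llama` module's is typically wired up with `dependsOn` between source sets in the module's build script. The following is a hedged sketch of that pattern under Kotlin's multiplatform Gradle DSL, not the repo's actual `build.gradle.kts` (target names and the exact set of targets are assumptions):

```kotlin
// build.gradle.kts (illustrative sketch, not the repo's actual build file)
kotlin {
    jvm("desktop")
    macosArm64(); linuxX64(); mingwX64()
    iosArm64(); iosSimulatorArm64(); iosX64()

    sourceSets {
        val commonMain by getting

        // Intermediate source sets matching the diagram above
        val commonJvmMain by creating { dependsOn(commonMain) } // shared JVM code
        val nativeMain by creating { dependsOn(commonMain) }    // shared Kotlin/Native code

        val desktopMain by getting { dependsOn(commonJvmMain) }

        val iosMain by creating { dependsOn(nativeMain) }
        val iosArm64Main by getting { dependsOn(iosMain) }
        val iosSimulatorArm64Main by getting { dependsOn(iosMain) }
        val iosX64Main by getting { dependsOn(iosMain) }

        val macosArm64Main by getting { dependsOn(nativeMain) }
        val linuxX64Main by getting { dependsOn(nativeMain) }
        val mingwX64Main by getting { dependsOn(nativeMain) }
    }
}
```

The design point is that platform-agnostic llama.cpp bindings live once in `nativeMain`, while JVM-based targets (Android, desktop) share JNA-style access through `commonJvmMain`.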
## Platform Notes

### macOS

- Install dependencies (if you want to enable OpenMP):

```shell
brew install ninja libomp
```

Metal GPU acceleration is enabled by default on macOS. To configure Metal settings:

```kotlin
// In your build.gradle.kts
llamaCpp {
    desktop {
        enableMetal.set(true)       // Enable Metal GPU acceleration (default: true on macOS)
        enableBlas.set(true)        // Enable BLAS for CPU optimization
        macOsSharedLibs.set(false)  // Build static libraries (default: false)
    }
}
```

Runtime Metal selection:

- Metal is automatically detected and used when available
- Falls back to CPU/OpenMP if Metal is not available
- The Metal library is embedded in the build (`GGML_METAL_EMBED_LIBRARY=ON`)
- Supports both Intel and Apple Silicon architectures

CMake flags applied:

- `GGML_METAL=ON` - enables the Metal backend
- `GGML_METAL_EMBED_LIBRARY=ON` - embeds the Metal library
- `CMAKE_OSX_ARCHITECTURES=arm64;x86_64` - universal binary support
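For reference, the same flags can be passed when configuring llama.cpp standalone outside of Gradle. This is an illustrative invocation using the flags listed above (the build directory name is an example):

```shell
# Configure llama.cpp with the Metal flags listed above
cmake -B build \
  -DGGML_METAL=ON \
  -DGGML_METAL_EMBED_LIBRARY=ON \
  -DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"

# Build the libraries
cmake --build build --config Release
```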
### Windows

- Install MSYS2 from msys2.org
- Launch the MSYS2 MINGW64 terminal (blue icon)
- Install build tools:

```shell
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-cmake
```

- Set environment variables (replace the paths with your actual paths):

```shell
export JAVA_HOME=$(cygpath -u "C:\path\to\jdk\jdk-17")
export GRADLE_USER_HOME=$(cygpath -u "C:\path\to\.gradle")
export KONAN_DATA_DIR=$(cygpath -u "C:\path\to\.konan")
export MINGW64_BIN=$(cygpath -u "C:\msys64\mingw64\bin")
export PATH="$JAVA_HOME/bin:$PATH"
```

Optional GPU acceleration:

- Vulkan: install the Vulkan SDK and set environment variables (replace the paths with your actual paths):

```shell
export VULKAN_SDK=$(cygpath -u "C:\path\to\VulkanSDK\1.4.321.1")
export PATH="$VULKAN_SDK/bin:$PATH"
```

- Install the MinGW64 Vulkan headers for the linker search path:

```shell
pacman -S mingw-w64-x86_64-vulkan-devel
```

- OpenBLAS (optional, needs to be enabled in `build.gradle.kts`):

```shell
pacman -S mingw-w64-x86_64-openblas
```
### Linux

Basic setup:

```shell
# Ubuntu/Debian
sudo apt-get install -y build-essential cmake

# Optional: Vulkan SDK for GPU acceleration
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list https://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list
sudo apt-get update -y
sudo apt-get install -y vulkan-sdk mesa-vulkan-drivers
```

> [!NOTE]
> Vulkan is optional but recommended for GPU acceleration. CPU-only builds work without it.
### Web

> [!CAUTION]
> Web is added as a preliminary, experimental target; it is not runnable yet.

```shell
brew install emscripten
```

## Vulkan backend patch (optional)

As my PR was accepted into llama.cpp, no patching is needed at the moment.
## Attributions & Licenses

- This repository's license: Apache 2.0
- llama.cpp – Copyright respective authors (MIT)
- Compose Multiplatform – JetBrains (Apache 2.0)
- Koog.ai Agents framework – JetBrains (Apache 2.0)
- Koin DI – InsertKoin (Apache 2.0)
- Ktor Client – JetBrains (Apache 2.0)
- Okio – Square (Apache 2.0)
- Kotlinx Serialization – JetBrains (Apache 2.0)
- Markdown Renderer – Mike Penz (Apache 2.0)
- Kottie – Ismai117 (Apache 2.0)
- JNA – Java Native Access (Apache 2.0)
- DataStore – AndroidX (Apache 2.0)
- Wire – Square (Apache 2.0)
- Navigation Compose – AndroidX (Apache 2.0)
- Logback – QOS.ch (EPL 1.0)
- SLF4J – QOS.ch (MIT)
