on-device-inference
Here are 62 public repositories matching this topic...
TinyML & Edge AI: On-device inference, model quantization, embedded ML, ultra-low-power AI for microcontrollers and IoT devices.
Updated Nov 10, 2025 - Python
AI for Apple silicon devices.
Updated Jun 22, 2026 - Rust
Auditable offline edge intelligence for low-cost edge devices, with benchmark evidence and public board proof on ESP32-C3.
Updated Mar 23, 2026 - Python
The Private Agent OS — search files, run AI agents, connect to 10,000+ tools via the complete protocol stack (MCP, AG-UI, A2UI, A2A). Zero cloud. Zero telemetry. On-device inference.
Updated Jun 21, 2026 - Rust
Flutter starter example app to get started with NobodyWho, a library designed to run LLMs locally and efficiently on any device.
Updated May 19, 2026 - Dart
React Native starter example app to get started with NobodyWho, a library designed to run LLMs locally and efficiently on any device.
Updated May 19, 2026 - TypeScript
Custom llama.cpp fork with character intelligence engine: control vectors, attention bias, head rescaling, attention temperature, fast weight memory
Updated May 16, 2026 - C++
iOS + Android app that runs local LLMs on-device + routstr cloud LLMs for anonymous inference
Updated Sep 18, 2025 - TypeScript
Pythonic binding to the Apple Neural Engine
Updated Jun 25, 2026 - Python
Mobile AI: iOS CoreML, Android TFLite, on-device inference, ONNX, TensorRT, and ML deployment for smartphones.
Updated Nov 10, 2025 - Python
⚡️ The fastest way to run local LLMs on Apple Silicon — sub-second model loads, beats Ollama on throughput, tail latency, and full-response time. OpenAI/Ollama-compatible. No cloud, no API keys.
Updated Jun 24, 2026 - Python
Curated resource for mobile teams shipping on-device LLMs — runtime benchmarks, model picks, GDPR-friendly architecture, and real production use cases.
Updated Jun 22, 2026
High-performance Android SDK for on-device LLM inference (GGUF). Privacy-focused, offline-first, and powered by llama.cpp with a clean Kotlin Coroutines API.
Updated Mar 27, 2026 - Kotlin
Demos and sample code for building AI-powered apps with local on-device models on Windows — using Windows AI APIs, Foundry Local, Windows ML, and WebNN. From Microsoft Build 2026.
Updated Jun 23, 2026 - C#
On-device AI inference for Swift apps. Run LLMs locally on iOS, macOS, visionOS and watchOS.
Updated Jun 15, 2026 - Swift
Updated Jun 4, 2026 - JavaScript
Production Android AI with ExecuTorch 1.0 - Deploy PyTorch models to mobile with NPU acceleration and 50KB footprint
Updated Nov 14, 2025 - Python
On-device LLM client for Android. Fork of Google AI Edge Gallery
Updated May 17, 2026 - Kotlin
Unofficial Swift SDK for Google's LiteRT-LM — run Gemma 4 on-device with text, vision, audio, and tool calling. CPU + GPU (Metal). iOS 17+ / macOS 14+.
Updated May 2, 2026 - Swift