Documentation status: Verified for OpenIntelligence v4.4 on June 28, 2026. Scope: Describes shipped behavior for on-device Apple Intelligence RAG architecture.
Local-first document intelligence for macOS and iOS, featuring an entirely on-device Retrieval-Augmented Generation (RAG) pipeline and native Apple Foundation Models integration.
OpenIntelligence is an exploratory, privacy-obsessed document query assistant built natively for Apple platforms. It demonstrates that production-grade document ingestion, vector indexing, lexical retrieval, and generative AI can run entirely on device without sacrificing privacy or relying on third-party cloud wrappers.
OpenIntelligence ships with engineering documentation that details how reliable, hallucination-resistant on-device RAG is achieved within the 4K-token context window of Apple's local models:
- System Architecture: The high-level view of the decoupled import-time and query-time pipelines.
- Retrieval Pipeline (`RETRIEVAL_PIPELINE.md`): Deep dive into the hybrid search engine (BM25 + Core ML Vector) and Reciprocal Rank Fusion implementation.
- Ingestion Pipeline (`INGESTION_PIPELINE.md`): Details of the semantic chunker, local Vision OCR fallbacks, and NLP metadata extraction.
- Privacy & Routing (`PRIVACY_AND_ROUTING.md`): Strict local-first data guarantees, local cache layouts, and routing protocols.
- Apple Foundation Models Specs: Optimization guide for macOS/iOS 26.x/27, managing 4K token budgets, guided generation via `@Generable`, and `SystemLanguageModel` sessions.
- Apple Document Intelligence: Practical integration with Vision OCR, SFSpeechRecognizer, PDFKit, and CoreText for semantic document parsing.
- Private Cloud Compute (PCC): Analysis of Apple's PCC enclave constraints, secure remote processing, and native execution routing layers.
- Hard Limits: A centralized reference for token boundaries, model caps, memory limitations, and platform bottlenecks.
- Current State & Gaps: Analysis of local inference latency, context packing, and model capability gaps.
- Evaluation Framework: Detailed verification procedures using `scripts/run_rag_benchmarks.py` to assert extraction accuracy and similarity scores.
The runtime operates in two decoupled phases:
flowchart TD
subgraph INGEST["Import-Time Pipeline"]
A1["Import Files"]
A2["Extract & Normalize (Vision OCR)"]
A3["Semantic Chunking"]
A4["Build FTS5 & BNNS Vector Indexes"]
A1 --> A2 --> A3 --> A4
end
subgraph QUERY["Query-Time Pipeline"]
B1["User Query"]
B2["Analyze Intent & HyDE Expansion"]
B3["Hybrid Retrieval & RRF Merge"]
B4["Cross-Encoder Reranking"]
B5["Verification Gates"]
B6["Generative LLM Response"]
B1 --> B2 --> B3 --> B4 --> B5 --> B6
end
A4 --> B3
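The "Hybrid Retrieval & RRF Merge" step combines a lexical ranking with a vector ranking. As a minimal sketch (written in Python like the repository's benchmark script, and not taken from `HybridSearchService.swift`), Reciprocal Rank Fusion scores each chunk by summing `1 / (k + rank)` over every list it appears in. The constant `k = 60` and the chunk IDs are illustrative assumptions, not values from the project.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default; assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

# Hypothetical results from the lexical (BM25) and vector searches.
bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c5", "c3"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Chunks that rank well in both lists (`c1`, `c3`) rise above chunks found by only one retriever, which is the property that makes RRF useful when BM25 and vector scores are not directly comparable.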
The entire RAG architecture operates on a strict 29-step pipeline (6 ingestion steps + 23 query-loop steps). To handle complex queries, the query loop routes dynamically across three agentic modes and three foundation models:
- Standard: Executes the 23-step query loop sequentially for maximum speed and battery life.
- Deep Think: Loops the retrieval agent through 4-10 concurrent reasoning sessions until it reaches 98% confidence (the session count scales dynamically with device thermal state).
- Maximum: Lifts the Deep Think session ceiling and lets the orchestrator search recursively for answers, for up to 50 loops.
- 3B Core: Offline Apple Silicon model (`SystemLanguageModel.default`) executing standard query tasks.
- 20B Advanced: Offline Apple Silicon model leveraging unified memory and NAND Flash Paging for enhanced reasoning.
- Private Cloud Compute (PT-MoE): Escalates over encrypted channels to Apple's 32K context secure server enclaves. Integrates native `FoundationModels.PrivateCloudComputeLanguageModel` execution when running on iOS 27+ / macOS 27+, falling back cleanly to local `SystemLanguageModel` simulation on older OS versions.
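The agentic modes differ mainly in how many retrieval sessions they allow before answering. The sketch below shows one way a confidence-gated loop with a session budget could look. It is an assumption-laden illustration, not the project's orchestrator: `run_until_confident` and `fake_session` are hypothetical names, the sessions run sequentially for simplicity, and only the 98% threshold and the session caps come from the mode descriptions above.

```python
from typing import Callable, Tuple

def run_until_confident(
    run_session: Callable[[int], float],
    max_sessions: int,
    threshold: float = 0.98,
) -> Tuple[int, float]:
    """Repeat retrieval sessions until confidence reaches the threshold
    or the session budget is spent.

    Returns (sessions_used, best_confidence).
    """
    best = 0.0
    used = 0
    for session in range(1, max_sessions + 1):
        used = session
        best = max(best, run_session(session))
        if best >= threshold:
            break
    return used, best

# Hypothetical stand-in for a retrieval session: confidence grows each pass.
def fake_session(n: int) -> float:
    return min(1.0, 0.80 + 0.05 * n)

deep_think = run_until_confident(fake_session, max_sessions=10)
capped = run_until_confident(fake_session, max_sessions=2)
```

With a budget of 10 the loop stops as soon as the threshold is met; with a budget of 2 it returns the best confidence reached, so the caller has to decide whether to answer, escalate to another model, or decline.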
| Module | Core Files | Responsibility |
|---|---|---|
| Ingestion | DocumentProcessor.swift, LayoutAwareExtractor.swift | Document content extraction, Vision OCR fallback, semantic structure recovery. |
| Chunking | SemanticChunker.swift, ContentTaggingService.swift | Context-aware document chunking, entity resolution, NLP metadata enrichment. |
| Indexing | SQLiteFullTextService.swift, BNNSVectorDatabase.swift | SQLite FTS5 lexical storage and local BNNS-accelerated vector indexing. |
| Retrieval | HybridSearchService.swift, ContextPackingService.swift | BM25 + Vector hybrid merging, parent-chunk reconstruction, exact token packing. |
| Orchestration | LLMService.swift, RAGService.swift | Execution coordination with the local SystemLanguageModel and evaluation loops. |
| Evidence Threads | EvidenceThread.swift, EvidenceThreadStore.swift | Thread-safe local persistence of conversational research queries and verification results. |
| Diagnostics | EvidenceThreadDebugService.swift, EvidenceThreadDebugView.swift | Developer-only view and helper service to test local persistent store integrity. |
| Shortcuts | RAGAppIntents.swift, ScreenAwarenessIntents.swift, VisualIntelligenceIntents.swift | Siri voice integration and entity-native App Intents (16 active actions) resolving in-process via presented activeInstance binding. |
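The Retrieval row mentions exact token packing, which matters because retrieved chunks must fit inside a 4K-token window alongside the prompt and the answer. The following is a minimal greedy sketch of that idea, not the logic in `ContextPackingService.swift`. The whitespace token counter is a crude stand-in for a real tokenizer, and the 4,096-token window and the 1,536-token reserve are assumed values chosen for illustration.

```python
def pack_context(chunks, budget_tokens, count_tokens=lambda text: len(text.split())):
    """Greedily select ranked chunks that fit inside a token budget.

    chunks: texts ordered best-first.
    Returns (selected_chunks, tokens_used).
    """
    selected, used = [], 0
    for text in chunks:
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue  # Skip chunks that do not fit; a smaller one later may.
        selected.append(text)
        used += cost
    return selected, used

# Assumed split of the window; these numbers are illustrative only.
CONTEXT_WINDOW = 4096
RESERVED_FOR_PROMPT_AND_ANSWER = 1536
budget = CONTEXT_WINDOW - RESERVED_FOR_PROMPT_AND_ANSWER

# Three fake chunks of 1500, 1200, and 900 whitespace tokens.
ranked = ["alpha " * 1500, "beta " * 1200, "gamma " * 900]
selected, used = pack_context(ranked, budget)
```

Here the second chunk is skipped because it would overflow the budget, while the smaller third chunk still fits. A stricter variant could stop at the first overflow to preserve rank order.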
To maintain codebase transparency, please note:
- Core AI Integration: Fully integrated and registered via `CoreAISentenceEmbeddingProvider.swift`. Runs zero-copy Silicon-native sentence embeddings on iOS 27+ / macOS 27+ compatible devices, automatically falling back to the standard `CoreMLSentenceEmbeddingProvider` on older targets.
- Private Cloud Compute (PCC): Routed locally using a fallback system language model wrapper in `EngineSDKCompatibility.swift` to ensure compilability on current public SDKs.
- iCloud Sync: Sync utilizes iCloud Drive ubiquity containers (`NSFileCoordinator` and `NSMetadataQuery`). The app does not utilize CloudKit databases.
- Pro Tier Document Limit: Document uploads are restricted to a hard quota of 1,000 documents under the Pro tier. Unlimited uploads are restricted to the Lifetime tier.
- Evidence Thread Synchronization: Thread history JSON arrays are stored under `Application Support/EvidenceThreads/<containerId>/` and are synchronized bidirectionally across devices via `WorkspaceSyncService` in iCloud Drive, gated by tier-specific limits (5 Free / 20 Pro / Unlimited Lifetime).
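The tier limits in the last note can be expressed as a small lookup. This sketch assumes the 5 / 20 / Unlimited figures count evidence threads per account; the function name `can_sync_thread` and the tier keys are hypothetical and do not come from `WorkspaceSyncService`.

```python
from typing import Optional

# Thread limits as stated in the note above; None means unlimited.
THREAD_LIMITS: dict[str, Optional[int]] = {"free": 5, "pro": 20, "lifetime": None}

def can_sync_thread(tier: str, existing_threads: int) -> bool:
    """Return True if one more thread fits under the tier's limit."""
    limit = THREAD_LIMITS[tier]
    return limit is None or existing_threads < limit
```

For example, `can_sync_thread("free", 5)` is `False` because the Free tier is already at its limit, while the Lifetime tier always returns `True`.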
- macOS Tahoe (26.x) with Xcode 26+
- iOS 26.0+ SDK target support
- Apple Silicon (M1+ / A17 Pro+) for adequate Neural Engine throughput
- Clear macOS extended attributes to prevent codesign failure: `/usr/bin/xattr -cr /Users/gunnarhostetler/Documents/GitHub/OpenIntelligence`
- Compile the simulator smoke target: `./scripts/build_simulator_smoke.sh`
- Execute the local RAG pipeline validation harness: `python3 scripts/run_rag_benchmarks.py`
OpenIntelligence is open-source software. See LICENSE for details.