Gunnarguy/OpenIntelligence

OpenIntelligence

Documentation status: Verified for OpenIntelligence v4.4 on June 28, 2026. Scope: Describes shipped behavior for on-device Apple Intelligence RAG architecture.

Local-first document intelligence for macOS and iOS, featuring an entirely on-device Retrieval-Augmented Generation (RAG) pipeline and native Apple Foundation Models integration.

  • Download OpenIntelligence on the App Store
  • Read the OpenIntelligence demo guide
  • Read the OpenIntelligence architecture guide
  • View the OpenIntelligence public roadmap

OpenIntelligence is an exploratory, privacy-obsessed document query assistant built natively for Apple platforms. It demonstrates that production-grade document ingestion, vector indexing, lexical retrieval, and generative AI can run entirely on device without sacrificing privacy or relying on third-party cloud wrappers.


📚 Rigorous Engineering Documentation

OpenIntelligence ships with engineering documentation that details how it achieves reliable, hallucination-resistant on-device RAG within Apple's 4K-token local context windows.

Core Architecture & Systems

Apple Intelligence Engineering Specs

  • Apple Foundation Models Specs: Optimization guide for macOS/iOS 26.x/27, managing 4K token budgets, guided generation via @Generable, and SystemLanguageModel sessions.
  • Apple Document Intelligence: Practical integration with Vision OCR, SFSpeechRecognizer, PDFKit, and CoreText for semantic document parsing.
  • Private Cloud Compute (PCC): Analysis of Apple's PCC enclave constraints, secure remote processing, and native execution routing layers.

Audits & Constraints

  • Hard Limits: A centralized reference for token boundaries, model caps, memory limitations, and platform bottlenecks.
  • Current State & Gaps: Analysis of local inference latency, context packing, and model capability gaps.
  • Evaluation Framework: Detailed verification procedures using scripts/run_rag_benchmarks.py to assert extraction accuracy and similarity scores.

โš™๏ธ Technical Architecture Overview

The runtime operates in two decoupled phases:

flowchart TD
  subgraph INGEST["Import-Time Pipeline"]
    A1["Import Files"]
    A2["Extract & Normalize (Vision OCR)"]
    A3["Semantic Chunking"]
    A4["Build FTS5 & BNNS Vector Indexes"]
    A1 --> A2 --> A3 --> A4
  end

  subgraph QUERY["Query-Time Pipeline"]
    B1["User Query"]
    B2["Analyze Intent & HyDE Expansion"]
    B3["Hybrid Retrieval & RRF Merge"]
    B4["Cross-Encoder Reranking"]
    B5["Verification Gates"]
    B6["Generative LLM Response"]
    B1 --> B2 --> B3 --> B4 --> B5 --> B6
  end

  A4 --> B3
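The "Hybrid Retrieval & RRF Merge" step in the diagram can be illustrated with a minimal Reciprocal Rank Fusion sketch. This is not the project's Swift implementation: the function name `rrf_merge`, the chunk IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains,
    so IDs ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical result orders from the lexical (FTS5) and vector indexes.
lexical = ["c3", "c1", "c7"]
vector = ["c1", "c9", "c3"]
merged = rrf_merge([lexical, vector])
print(merged)
```

Because RRF uses only ranks, the lexical and vector scores never need to be normalized against each other, which is one reason it is a common choice for hybrid search.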

🧠 Quality Modes & Inference Routing

The RAG architecture runs a strict 29-step pipeline (6 ingestion steps + 23 query-loop steps). To handle complex queries, the query loop routes dynamically across three agentic quality modes and three foundation model routes:

3 Agentic Quality Modes

  • Standard: Executes the 23-step query loop sequentially for maximum speed and battery life.
  • Deep Think: Runs the retrieval agent through 4-10 concurrent reasoning sessions, repeating until it reaches 98% confidence. The session count scales dynamically with the device's thermal state.
  • Maximum: Removes the 8-session ceiling, raising the orchestrator's budget to as many as 50 recursive loops.
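The budgeted, confidence-gated behavior of the modes above can be sketched as a simple loop. This is a simplified, sequential illustration only: `run_query_loop`, `fake_step`, and the way confidence is produced are hypothetical, and thermal-state scaling and concurrency are left out.

```python
def run_query_loop(retrieve_step, max_sessions, target_confidence=0.98):
    """Repeat a retrieval step until the confidence target or budget is hit.

    retrieve_step(session_number) must return (answer, confidence).
    Returns the best answer seen, its confidence, and the sessions used.
    """
    best_answer, best_confidence = None, 0.0
    sessions_used = 0
    for _ in range(max_sessions):
        sessions_used += 1
        answer, confidence = retrieve_step(sessions_used)
        if confidence > best_confidence:
            best_answer, best_confidence = answer, confidence
        if best_confidence >= target_confidence:
            break  # confident enough; stop spending budget
    return best_answer, best_confidence, sessions_used


def fake_step(session_number):
    # Stand-in for a real retrieval session: confidence rises each pass.
    return f"draft-{session_number}", min(1.0, 0.90 + 0.03 * session_number)


# A Deep Think style budget of 10 sessions versus a Maximum style budget of 50.
answer, confidence, used = run_query_loop(fake_step, max_sessions=10)
print(answer, used)
```

In this sketch the only difference between modes is the `max_sessions` value passed in, so a larger budget changes nothing when the confidence target is reached early.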

3 Foundation Model Routes

  • 3B Core: Offline Apple Silicon model (SystemLanguageModel.default) executing standard query tasks.
  • 20B Advanced: Offline Apple Silicon model leveraging unified memory and NAND Flash Paging for enhanced reasoning.
  • Private Cloud Compute (PT-MoE): Escalates over encrypted channels to Apple's 32K context secure server enclaves. Integrates native FoundationModels.PrivateCloudComputeLanguageModel execution when running on iOS 27 / macOS 27+, falling back cleanly to local SystemLanguageModel simulation on older OS versions.
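The fallback described for the Private Cloud Compute route can be pictured as a small decision function. The function name, its parameters, and the returned route labels are hypothetical, and the criteria for choosing between the 3B and 20B local models are not stated here, so the sketch groups both under one local route.

```python
def choose_route(os_major, needs_large_context, allow_cloud=True):
    """Return a route label for a query (illustrative logic only).

    Escalates to Private Cloud Compute only when the query needs the
    larger context, cloud use is allowed, and the OS is version 27 or
    later; otherwise stays on the local system model.
    """
    if needs_large_context and allow_cloud and os_major >= 27:
        return "private_cloud_compute"
    return "local_system_model"


print(choose_route(27, needs_large_context=True))   # escalates
print(choose_route(26, needs_large_context=True))   # falls back locally
```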

๐Ÿ—บ๏ธ Codebase Map

| Module | Core Files | Responsibility |
| --- | --- | --- |
| Ingestion | DocumentProcessor.swift, LayoutAwareExtractor.swift | Document content extraction, Vision OCR fallback, semantic structure recovery. |
| Chunking | SemanticChunker.swift, ContentTaggingService.swift | Context-aware document chunking, entity resolution, NLP metadata enrichment. |
| Indexing | SQLiteFullTextService.swift, BNNSVectorDatabase.swift | SQLite FTS5 lexical storage and local BNNS-accelerated vector indexing. |
| Retrieval | HybridSearchService.swift, ContextPackingService.swift | BM25 + Vector hybrid merging, parent-chunk reconstruction, exact token packing. |
| Orchestration | LLMService.swift, RAGService.swift | Execution coordination with the local SystemLanguageModel and evaluation loops. |
| Evidence Threads | EvidenceThread.swift, EvidenceThreadStore.swift | Thread-safe local persistence of conversational research queries and verification results. |
| Diagnostics | EvidenceThreadDebugService.swift, EvidenceThreadDebugView.swift | Developer-only view and helper service to test local persistent store integrity. |
| Shortcuts | RAGAppIntents.swift, ScreenAwarenessIntents.swift, VisualIntelligenceIntents.swift | Siri voice integration and entity-native App Intents (16 active actions) resolving in-process via presented activeInstance binding. |
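The "exact token packing" responsibility listed for the Retrieval module can be illustrated with a greedy packer that fills a fixed token budget. This is a sketch under assumptions, not the contents of ContextPackingService.swift: `pack_context`, the whitespace tokenizer, and the tiny budget are hypothetical stand-ins.

```python
def pack_context(chunks, budget_tokens, count_tokens):
    """Greedily pack ranked chunks into a token budget.

    chunks is assumed to be ordered best-first (for example, after
    reranking). Chunks that would overflow the budget are skipped so
    that smaller, lower-ranked chunks can still use the remaining room.
    """
    packed, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            continue
        packed.append(chunk)
        used += cost
    return packed, used


def whitespace_tokens(text):
    # Crude stand-in for a real tokenizer: one token per word.
    return len(text.split())


chunks = ["alpha beta gamma", "one two three four five six", "delta epsilon"]
packed, used = pack_context(chunks, budget_tokens=5, count_tokens=whitespace_tokens)
print(packed, used)
```

A real packer would count tokens with the model's own tokenizer and reserve part of the budget for the prompt and the generated answer.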

๐Ÿ› ๏ธ Placeholders & Scaffolding Warnings

To maintain codebase transparency, please note:

  • Core AI Integration: Fully integrated and registered via CoreAISentenceEmbeddingProvider.swift. Runs zero-copy Silicon-native sentence embeddings on iOS 27+ / macOS 27+ compatible devices, automatically falling back to the standard CoreMLSentenceEmbeddingProvider on older targets.
  • Private Cloud Compute (PCC): Routed locally using a fallback system language model wrapper in EngineSDKCompatibility.swift to ensure compilability on current public SDKs.
  • iCloud Sync: Sync utilizes iCloud Drive ubiquity containers (NSFileCoordinator and NSMetadataQuery). The app does not utilize CloudKit databases.
  • Pro Tier Document Limit: The Pro tier enforces a hard quota of 1,000 uploaded documents. Unlimited uploads are available only on the Lifetime tier.
  • Evidence Thread Synchronization: Thread history JSON arrays are stored under Application Support/EvidenceThreads/<containerId>/ and are synchronized bidirectionally across devices via WorkspaceSyncService in iCloud Drive, gated by tier-specific limits (5 Free / 20 Pro / Unlimited Lifetime).
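The tier-specific Evidence Thread limits above (5 Free / 20 Pro / Unlimited Lifetime) amount to a simple lookup. The dictionary name, tier keys, and `can_create_thread` function below are hypothetical and only show how such a gate might be expressed.

```python
import math

# Thread limits as stated above; math.inf models "Unlimited".
EVIDENCE_THREAD_LIMITS = {"free": 5, "pro": 20, "lifetime": math.inf}


def can_create_thread(tier, existing_threads):
    """Return True if a user on this tier may create another thread."""
    return existing_threads < EVIDENCE_THREAD_LIMITS[tier]


print(can_create_thread("free", 5))         # at the Free limit
print(can_create_thread("pro", 19))         # one slot left on Pro
print(can_create_thread("lifetime", 5000))  # no cap on Lifetime
```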

🚀 Build & Verification

Requirements

  • macOS Tahoe (26.x) with Xcode 26+
  • iOS 26.0+ SDK target support
  • Apple Silicon (M1+ / A17 Pro+) for adequate Neural Engine throughput

Instructions

  1. Clear macOS extended attributes to prevent codesign failure (substitute the path to your own checkout):
    /usr/bin/xattr -cr /Users/gunnarhostetler/Documents/GitHub/OpenIntelligence
  2. Compile the simulator smoke target:
    ./scripts/build_simulator_smoke.sh
  3. Execute the local RAG pipeline validation harness:
    python3 scripts/run_rag_benchmarks.py

License

OpenIntelligence is open-source software. See LICENSE for details.

About

Apple-native iOS/macOS app for document intelligence, OCR, cited answers, and source-backed retrieval over PDFs, scans, and user-controlled files.
