OpenIntelligence

Documentation status: Verified for OpenIntelligence v4.4 on June 28, 2026. Scope: Describes shipped behavior for on-device Apple Intelligence RAG architecture.

Local-first document intelligence for macOS and iOS, featuring an entirely on-device Retrieval-Augmented Generation (RAG) pipeline and native Apple Foundation Models integration.

OpenIntelligence is an exploratory, privacy-obsessed document query assistant built natively for Apple platforms. It demonstrates that production-grade document ingestion, vector indexing, lexical retrieval, and generative AI can run entirely on device without sacrificing privacy or relying on third-party cloud wrappers.

📚 Rigorous Engineering Documentation

The engineering documentation listed below details how OpenIntelligence keeps on-device RAG reliable and hallucination-resistant within the 4K-token context window of Apple's local model.

Core Architecture & Systems

System Architecture: The high-level view of the decoupled import-time and query-time pipelines.
Retrieval Pipeline (RETRIEVAL_PIPELINE.md): Deep dive into the hybrid search engine (BM25 + Core ML Vector) and Reciprocal Rank Fusion implementation.
Ingestion Pipeline (INGESTION_PIPELINE.md): Details of the semantic chunker, local Vision OCR fallbacks, and NLP metadata extraction.
Privacy & Routing (PRIVACY_AND_ROUTING.md): Strict local-first data guarantees, local cache layouts, and routing protocols.

Apple Intelligence Engineering Specs

Apple Foundation Models Specs: Optimization guide for macOS/iOS 26.x/27, managing 4K token budgets, guided generation via @Generable, and SystemLanguageModel sessions.
Apple Document Intelligence: Practical integration with Vision OCR, SFSpeechRecognizer, PDFKit, and CoreText for semantic document parsing.
Private Cloud Compute (PCC): Analysis of Apple's PCC enclave constraints, secure remote processing, and native execution routing layers.

Audits & Constraints

Hard Limits: A centralized reference for token boundaries, model caps, memory limitations, and platform bottlenecks.
Current State & Gaps: Analysis of local inference latency, context packing, and model capability gaps.
Evaluation Framework: Detailed verification procedures using scripts/run_rag_benchmarks.py to assert extraction accuracy and similarity scores.
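
As an illustration of the kind of similarity check an evaluation harness can assert on, here is a minimal cosine-similarity sketch. It is not taken from scripts/run_rag_benchmarks.py; the function name, the sample vectors, and the 0.8 threshold are assumptions chosen for the example.

```swift
/// Cosine similarity between two embedding vectors.
/// Returns nil when the vectors differ in length, are empty, or have zero magnitude.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    var dot = 0.0, normA = 0.0, normB = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    guard normA > 0, normB > 0 else { return nil }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

// Hypothetical embeddings for an expected answer and a generated answer.
let expectedEmbedding = [0.2, 0.7, 0.1]
let generatedEmbedding = [0.25, 0.65, 0.05]
let passThreshold = 0.8  // assumed value, not a documented project setting

if let score = cosineSimilarity(expectedEmbedding, generatedEmbedding) {
    print(score >= passThreshold ? "pass" : "fail")
}
```

Returning an optional keeps malformed inputs (mismatched dimensions, zero vectors) from silently producing a misleading score.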

⚙️ Technical Architecture Overview

The runtime operates in two decoupled phases:

flowchart TD
  subgraph INGEST["Import-Time Pipeline"]
    A1["Import Files"]
    A2["Extract & Normalize (Vision OCR)"]
    A3["Semantic Chunking"]
    A4["Build FTS5 & BNNS Vector Indexes"]
    A1 --> A2 --> A3 --> A4
  end

  subgraph QUERY["Query-Time Pipeline"]
    B1["User Query"]
    B2["Analyze Intent & HyDE Expansion"]
    B3["Hybrid Retrieval & RRF Merge"]
    B4["Cross-Encoder Reranking"]
    B5["Verification Gates"]
    B6["Generative LLM Response"]
    B1 --> B2 --> B3 --> B4 --> B5 --> B6
  end

  A4 --> B3
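
The "Hybrid Retrieval & RRF Merge" step can be pictured with a small Reciprocal Rank Fusion sketch. This is a generic formulation, not the code in HybridSearchService.swift: the function name, the chunk IDs, and the smoothing constant k = 60 (a common default) are assumptions.

```swift
/// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every ID it contains,
/// and the summed score decides the fused order.
func reciprocalRankFusion(_ rankings: [[String]], k: Double = 60) -> [(id: String, score: Double)] {
    var scores: [String: Double] = [:]
    for ranking in rankings {
        for (index, id) in ranking.enumerated() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            scores[id, default: 0] += 1.0 / (k + Double(index + 1))
        }
    }
    return scores
        .map { (id: $0.key, score: $0.value) }
        // Highest score first; ties are broken by ID so the order is deterministic.
        .sorted { $0.score != $1.score ? $0.score > $1.score : $0.id < $1.id }
}

let lexicalHits = ["c2", "c7", "c1"]  // hypothetical BM25 order
let vectorHits = ["c7", "c3", "c2"]   // hypothetical vector-search order
let fused = reciprocalRankFusion([lexicalHits, vectorHits])
print(fused.map { $0.id })
```

Because RRF only looks at ranks, the lexical and vector scores never need to be normalized onto a common scale before merging.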

🧠 Quality Modes & Inference Routing

The RAG architecture runs as a fixed 29-step pipeline: 6 ingestion steps plus 23 query-loop steps. To handle complex queries, the query loop routes dynamically across three agentic quality modes and three foundation model routes:

3 Agentic Quality Modes

Standard: Executes the 23-step query loop sequentially for maximum speed and battery life.
Deep Think: Runs the retrieval agent through 4 to 10 concurrent reasoning sessions, stopping once it reaches 98% confidence. The session count scales with the device's thermal state.
Maximum: Lifts the Deep Think session ceiling and lets the orchestrator keep searching recursively for up to 50 loops.
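
The stop conditions described for these modes (a confidence target plus a cap on passes) can be sketched as a bounded loop. This is a simplified, sequential illustration rather than the orchestrator's actual code: `runReasoningLoop`, `LoopOutcome`, and the simulated confidence values are hypothetical, and thermal scaling and concurrency are left out.

```swift
struct LoopOutcome {
    let passesUsed: Int
    let confidence: Double
    let reachedTarget: Bool
}

/// Runs `pass` until the best confidence seen reaches `targetConfidence`
/// or `maxPasses` passes have been spent, whichever comes first.
func runReasoningLoop(
    maxPasses: Int,
    targetConfidence: Double = 0.98,
    pass: (Int) -> Double
) -> LoopOutcome {
    var best = 0.0
    var used = 0
    while used < maxPasses {
        used += 1
        best = max(best, pass(used))
        if best >= targetConfidence { break }
    }
    return LoopOutcome(passesUsed: used, confidence: best, reachedTarget: best >= targetConfidence)
}

// Hypothetical confidence values returned by successive passes.
let simulated = [0.62, 0.81, 0.93, 0.985]
let outcome = runReasoningLoop(maxPasses: 10) { simulated[min($0, simulated.count) - 1] }
print(outcome.passesUsed, outcome.reachedTarget)
```

Under this sketch, a mode is just a different `maxPasses` value; the loop exits early as soon as the target confidence is met.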

3 Foundation Model Routes

3B Core: Offline Apple Silicon model (SystemLanguageModel.default) executing standard query tasks.
20B Advanced: Offline Apple Silicon model leveraging unified memory and NAND Flash Paging for enhanced reasoning.
Private Cloud Compute (PT-MoE): Escalates over encrypted channels to Apple's 32K context secure server enclaves. Integrates native FoundationModels.PrivateCloudComputeLanguageModel execution when running on iOS 27 / macOS 27+, falling back cleanly to local SystemLanguageModel simulation on older OS versions.
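
The fallback behavior described above can be sketched as a small route resolver. The `ModelRoute` enum and `resolveRoute` function are hypothetical names, and the sketch deliberately avoids FoundationModels types. Falling back from the 20B route to the 3B route is an assumption; only the Private Cloud Compute fallback to the local system model is stated above.

```swift
enum ModelRoute: String {
    case core3B
    case advanced20B
    case privateCloudCompute
}

/// Returns the requested route when it is available on this device and OS,
/// otherwise falls back to the always-local 3B route.
func resolveRoute(requested: ModelRoute, available: Set<ModelRoute>) -> ModelRoute {
    available.contains(requested) ? requested : .core3B
}

// Hypothetical availability on an OS version without native Private Cloud Compute support.
let olderOS: Set<ModelRoute> = [.core3B, .advanced20B]
print(resolveRoute(requested: .privateCloudCompute, available: olderOS).rawValue)
```

Keeping availability as plain data makes the routing decision testable without a device that supports every route.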

🗺️ Codebase Map

| Module | Core Files | Responsibility |
| --- | --- | --- |
| Ingestion | `DocumentProcessor.swift`, `LayoutAwareExtractor.swift` | Document content extraction, Vision OCR fallback, semantic structure recovery. |
| Chunking | `SemanticChunker.swift`, `ContentTaggingService.swift` | Context-aware document chunking, entity resolution, NLP metadata enrichment. |
| Indexing | `SQLiteFullTextService.swift`, `BNNSVectorDatabase.swift` | SQLite FTS5 lexical storage and local BNNS-accelerated vector indexing. |
| Retrieval | `HybridSearchService.swift`, `ContextPackingService.swift` | BM25 + Vector hybrid merging, parent-chunk reconstruction, exact token packing. |
| Orchestration | `LLMService.swift`, `RAGService.swift` | Execution coordination with the local `SystemLanguageModel` and evaluation loops. |
| Evidence Threads | `EvidenceThread.swift`, `EvidenceThreadStore.swift` | Thread-safe local persistence of conversational research queries and verification results. |
| Diagnostics | `EvidenceThreadDebugService.swift`, `EvidenceThreadDebugView.swift` | Developer-only view and helper service to test local persistent store integrity. |
| Shortcuts | `RAGAppIntents.swift`, `ScreenAwarenessIntents.swift`, `VisualIntelligenceIntents.swift` | Siri voice integration and entity-native App Intents (16 active actions) resolving in-process via presented activeInstance binding. |
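
To make the "exact token packing" responsibility concrete, here is a minimal greedy packing sketch. It is not the logic in ContextPackingService.swift: `RankedChunk`, `packContext`, the chunk sizes, and the 1,500 tokens reserved for the prompt and answer are assumptions. Token counts are taken as given inputs rather than computed.

```swift
struct RankedChunk {
    let id: String
    let tokenCount: Int
}

/// Walks chunks in rank order and keeps each one that still fits in the remaining budget.
func packContext(_ chunks: [RankedChunk], budget: Int) -> [RankedChunk] {
    var remaining = budget
    var packed: [RankedChunk] = []
    for chunk in chunks where chunk.tokenCount <= remaining {
        packed.append(chunk)
        remaining -= chunk.tokenCount
    }
    return packed
}

// A 4,096-token window minus an assumed 1,500 tokens reserved for instructions and the answer.
let evidenceBudget = 4096 - 1500
let ranked = [
    RankedChunk(id: "a", tokenCount: 1200),
    RankedChunk(id: "b", tokenCount: 1000),
    RankedChunk(id: "c", tokenCount: 900),
    RankedChunk(id: "d", tokenCount: 300),
]
print(packContext(ranked, budget: evidenceBudget).map { $0.id })
```

The greedy pass skips a chunk that is too large but keeps scanning, so a smaller lower-ranked chunk can still use the leftover budget.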

🛠️ Placeholders & Scaffolding Warnings

To maintain codebase transparency, please note:

Core AI Integration: Fully integrated and registered via CoreAISentenceEmbeddingProvider.swift. Runs zero-copy Silicon-native sentence embeddings on iOS 27+ / macOS 27+ compatible devices, automatically falling back to the standard CoreMLSentenceEmbeddingProvider on older targets.
Private Cloud Compute (PCC): Routed locally using a fallback system language model wrapper in EngineSDKCompatibility.swift to ensure compilability on current public SDKs.
iCloud Sync: Sync utilizes iCloud Drive ubiquity containers (NSFileCoordinator and NSMetadataQuery). The app does not utilize CloudKit databases.
Pro Tier Document Limit: The Pro tier enforces a hard quota of 1,000 uploaded documents. Unlimited uploads are available only on the Lifetime tier.
Evidence Thread Synchronization: Thread history JSON arrays are stored under Application Support/EvidenceThreads/<containerId>/ and are synchronized bidirectionally across devices via WorkspaceSyncService in iCloud Drive, gated by tier-specific limits (5 Free / 20 Pro / Unlimited Lifetime).
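
The tier-specific thread limits above (5 Free / 20 Pro / Unlimited Lifetime) can be modeled as a small gate. The `Tier` enum and `canCreateThread` function are hypothetical names for illustration and are not taken from WorkspaceSyncService.

```swift
enum Tier {
    case free, pro, lifetime

    /// Maximum number of evidence threads; nil means unlimited.
    var evidenceThreadLimit: Int? {
        switch self {
        case .free: return 5
        case .pro: return 20
        case .lifetime: return nil
        }
    }
}

/// True when one more evidence thread fits under the tier's limit.
func canCreateThread(tier: Tier, existingThreads: Int) -> Bool {
    guard let limit = tier.evidenceThreadLimit else { return true }
    return existingThreads < limit
}

print(canCreateThread(tier: .free, existingThreads: 5))
print(canCreateThread(tier: .lifetime, existingThreads: 500))
```

Modeling "unlimited" as nil avoids a sentinel value such as Int.max leaking into comparisons or UI text.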

🚀 Build & Verification

Requirements

macOS Tahoe (26.x) with Xcode 26+
iOS 26.0+ SDK target support
Apple Silicon (M1+ / A17 Pro+) for adequate Neural Engine throughput

Instructions

Clear macOS extended attributes to prevent codesign failure (replace the path with the location of your own checkout):

/usr/bin/xattr -cr /Users/gunnarhostetler/Documents/GitHub/OpenIntelligence

Compile the simulator smoke target:
```
./scripts/build_simulator_smoke.sh
```
Execute the local RAG pipeline validation harness:
```
python3 scripts/run_rag_benchmarks.py
```

License

OpenIntelligence is open-source software. See LICENSE for details.
