A macOS menu bar app that captures screenshots and converts them to searchable, formatted text using 100% local ML inference. No cloud APIs, no subscriptions, no data leaving your machine.
Built for academics, developers, researchers, and knowledge workers who deal with technical content — code, math, structured documents — and value privacy.
Capture → Process → Store → Search
- Capture a region, window, or full screen via ScreenCaptureKit (see the sketch after this list)
- Process with local ML — Apple Vision OCR, IBM Docling, or Google Gemma
- Store automatically in a local SQLite database with full-text search
- Search across all your captures instantly
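
For a sense of the capture step, here is a minimal one-shot grab using ScreenCaptureKit's `SCScreenshotManager` (macOS 14+). The function name and error type are illustrative, not the app's actual code:

```swift
import ScreenCaptureKit

struct NoDisplayError: Error {}

// Illustrative one-shot capture of the main display (requires the
// Screen Recording permission). Not SnapScribe's actual capture code.
func captureMainDisplay() async throws -> CGImage {
    // Enumerate the displays/windows the user has allowed us to see.
    let content = try await SCShareableContent.excludingDesktopWindows(
        false, onScreenWindowsOnly: true)
    guard let display = content.displays.first else { throw NoDisplayError() }

    // Capture the whole display; a region capture would crop via sourceRect.
    let filter = SCContentFilter(display: display, excludingWindows: [])
    let config = SCStreamConfiguration()
    config.width = display.width
    config.height = display.height
    return try await SCScreenshotManager.captureImage(
        contentFilter: filter, configuration: config)
}
```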
Hold modifier keys during capture to select a pipeline:
| Mode | Modifier | Pipeline | Best For |
|---|---|---|---|
| Vision | (none) | Apple Vision OCR | Fast narrative text |
| Docling | ⌥ Option | granite-docling-258M | Tables, forms, structured docs |
| Vision + Gemma | ⇧ Shift | Vision OCR → Gemma LLM | Enhanced formatting, LaTeX |
| Docling + Gemma | ⌥⇧ Both | Docling → Gemma LLM | Academic papers, complex LaTeX |
Gemma is available in two sizes (4B and 12B) — selectable in Settings.
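
As a sketch of how that modifier-to-pipeline selection can be expressed, the `Pipeline` enum and function below are illustrative, not SnapScribe's actual types:

```swift
import AppKit

// Hypothetical mapping from the modifier flags held at capture time
// to a processing pipeline, mirroring the table above.
enum Pipeline {
    case vision        // (none): Apple Vision OCR
    case docling       // ⌥: granite-docling-258M
    case visionGemma   // ⇧: Vision OCR → Gemma
    case doclingGemma  // ⌥⇧: Docling → Gemma
}

func pipeline(for flags: NSEvent.ModifierFlags) -> Pipeline {
    switch (flags.contains(.option), flags.contains(.shift)) {
    case (false, false): return .vision
    case (true,  false): return .docling
    case (false, true):  return .visionGemma
    case (true,  true):  return .doclingGemma
    }
}
```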
Enable in Settings to run all 6 pipelines in parallel on a single capture (Vision, Docling, each with no LLM / Gemma 4B / Gemma 12B) and compare results side-by-side.
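
Conceptually this maps onto a Swift task group: all pipelines start at once and results are collected as they finish. A minimal sketch with assumed names; `runPipeline(named:image:)` is a hypothetical stand-in for the real processors:

```swift
import CoreGraphics

// Hypothetical stand-in for invoking one of the six pipelines.
func runPipeline(named name: String, image: CGImage) async -> String {
    // ... dispatch to Vision / Docling / Gemma here ...
    return ""
}

// Run every comparison pipeline concurrently and collect Markdown per pipeline.
func runComparison(on image: CGImage) async -> [String: String] {
    let pipelines = ["vision", "vision+gemma-4b", "vision+gemma-12b",
                     "docling", "docling+gemma-4b", "docling+gemma-12b"]
    return await withTaskGroup(of: (String, String).self) { group in
        for name in pipelines {
            group.addTask { (name, await runPipeline(named: name, image: image)) }
        }
        var results: [String: String] = [:]
        for await (name, markdown) in group { results[name] = markdown }
        return results
    }
}
```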
- macOS 14.0+ (Sonoma)
- Apple Silicon (M1 or later) — required for MLX inference
- Xcode 15+ with Swift 5.9
- Python 3.12+ (via pyenv)
- ~8 GB disk for ML models
- ~6.5 GB RAM if using comparison mode (both Gemma models loaded)
Set up the Python environment:

```bash
pyenv install 3.12.10
pyenv shell 3.12.10
pip install mlx-lm mlx-vlm pillow docling-core huggingface-hub
```

Or use the setup script:

```bash
./scripts/dev_setup.sh
```

Download the model weights into `models/`:

```bash
cd models/

# Document understanding (~500MB)
huggingface-cli download ibm-granite/granite-docling-258M-mlx \
  --local-dir granite-docling-258M-mlx

# Gemma 3 4B — lighter, faster (~2GB)
huggingface-cli download mlx-community/gemma-3-4b-it-4bit \
  --local-dir gemma-3-4b-it-4bit

# Gemma 3 12B — higher quality (~4GB)
huggingface-cli download mlx-community/gemma-3-12b-it-4bit \
  --local-dir gemma-3-12b-it-4bit
```

Or use the download script:

```bash
./scripts/download_model.sh
```

Build from the command line:

```bash
xcodebuild -scheme SnapScribe -configuration Debug build
```

Then open the built app:

```bash
open ~/Library/Developer/Xcode/DerivedData/SnapScribe-*/Build/Products/Debug/SnapScribe.app
```

Or open `SnapScribe.xcodeproj` in Xcode and hit ⌘R.
Note: Debug builds use hardcoded paths to `~/.pyenv/versions/3.12.10/bin/python3.12` and the local `models/` directory. See `DoclingProcessor.swift`, `GemmaProcessor.swift`, and `GemmaModel.swift`.
```
┌─────────────────────────────────────────────────────────┐
│                  SwiftUI Menu Bar App                   │
│                                                         │
│  MenuBarContentView ── AppState ── LibraryWindow        │
│  (Capture/History)       │         (3-column browser)   │
│                          │                              │
│                  ProcessorManager                       │
│                     (Singleton)                         │
└──────────────────────────┬──────────────────────────────┘
                           │
              ┌────────────┼────────────────┐
              │            │                │
        ┌─────▼─────┐  ┌───▼──────────┐  ┌──▼──────────┐
        │ VisionOCR │  │   Docling    │  │    Gemma    │
        │ (native)  │  │ (Python IPC) │  │ (Python IPC)│
        └─────┬─────┘  └──────────────┘  └─────────────┘
              │
        ┌─────▼────────┐
        │ CaptureStore │  ← SQLite + FTS5
        └──────────────┘
```
| Directory | Contents |
|---|---|
| `SnapScribe/App/` | UI views — menu bar, history, library, comparison |
| `SnapScribe/Capture/` | ScreenCaptureKit integration, region selection |
| `SnapScribe/Inference/` | ML processors — Vision, Docling, Gemma, embeddings |
| `SnapScribe/Storage/` | SQLite database, data models, FTS5 search |
| `SnapScribe/Settings/` | Settings window |
| `SnapScribe/Resources/` | Python inference servers |
| `models/` | ML model weights (git-ignored, ~8GB) |
| `scripts/` | Dev setup, model download, bundling, signing |
Swift communicates with Python inference servers over stdin/stdout JSON:
Request: {"id": "uuid", "action": "convert|enhance|ping", "image": "base64", ...}
Response: {"id": "uuid", "status": "success|error", "markdown": "...", ...}
Servers emit `{"status": "ready"}` once the model is loaded into memory. Models stay resident via `ProcessorManager` — the first call takes 5-10s; subsequent calls are fast.
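
For illustration, a single round-trip from the Swift side could look like the sketch below, which assumes newline-delimited JSON framing. The framing detail, type names, and helper are assumptions; only the field shapes come from the messages above:

```swift
import Foundation

struct InferenceRequest: Codable {
    let id: String
    let action: String   // "convert", "enhance", or "ping"
    let image: String?   // base64-encoded capture; nil for "ping"
}

struct InferenceResponse: Codable {
    let id: String
    let status: String   // "success" or "error"
    let markdown: String?
}

// One blocking request/response round-trip against a running server process.
func roundTrip(_ request: InferenceRequest,
               stdin: FileHandle, stdout: FileHandle) throws -> InferenceResponse {
    var payload = try JSONEncoder().encode(request)
    payload.append(0x0A)                 // '\n' terminates each JSON message
    try stdin.write(contentsOf: payload)
    // Naive single read; a real client buffers until a full line arrives.
    let line = stdout.availableData.prefix { $0 != 0x0A }
    return try JSONDecoder().decode(InferenceResponse.self, from: Data(line))
}
```

A production client would accumulate stdout data until a complete line arrives and match responses to requests by `id`.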
Stored at `~/Library/Application Support/SnapScribe/captures.db` (SQLite, auto-migrating schema).
- `captures` — screenshot metadata, OCR text, enhanced text, user edits, notes, tags, thumbnails
- `folders` — hierarchical organization
- `comparison_results` — parallel pipeline comparison data
- `captures_fts` — FTS5 virtual table for full-text search across all text fields
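
As a sketch of what a search against that schema looks like, assuming the FTS table is queried by `rowid` and ranked with FTS5's built-in `rank` (the exact column layout is an assumption):

```swift
import SQLite3

// SQLITE_TRANSIENT tells SQLite to copy the bound string immediately.
private let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

// Return rowids of captures whose text matches an FTS5 query, best matches first.
func searchCaptures(db: OpaquePointer, query: String) -> [Int64] {
    let sql = "SELECT rowid FROM captures_fts WHERE captures_fts MATCH ? ORDER BY rank;"
    var stmt: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else { return [] }
    defer { sqlite3_finalize(stmt) }
    sqlite3_bind_text(stmt, 1, query, -1, SQLITE_TRANSIENT)
    var ids: [Int64] = []
    while sqlite3_step(stmt) == SQLITE_ROW {
        ids.append(sqlite3_column_int64(stmt, 0))
    }
    return ids
}
```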
| Script | Purpose |
|---|---|
| `scripts/dev_setup.sh` | Full dev environment setup (validates Apple Silicon, macOS, Python) |
| `scripts/download_model.sh` | Download ML models from HuggingFace |
| `scripts/bundle_python.sh` | Bundle Python runtime for distribution |
| `scripts/create_dmg.sh` | Create distributable DMG |
| `scripts/sign_app.sh` | Code signing |
- Debug paths are hardcoded — Python and model paths point to the dev machine. Release builds will need a bundled runtime.
- App Sandbox disabled in Debug to allow Python subprocess access.
- Not App Store distributable — requires screen recording permission (direct distribution only).
- No global keyboard shortcuts yet — the KeyboardShortcuts package is integrated but not wired up; a typical wiring pattern is sketched below.
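
For reference, wiring that package usually looks like this. The shortcut name and registration point are placeholders, not code that exists in the app yet:

```swift
import KeyboardShortcuts

extension KeyboardShortcuts.Name {
    // Placeholder shortcut name; the real app would pick its own.
    static let captureRegion = Self("captureRegion")
}

// Somewhere in app startup: invoke capture when the shortcut fires.
func registerShortcuts() {
    KeyboardShortcuts.onKeyUp(for: .captureRegion) {
        // call into the capture flow here
    }
}
```

A `KeyboardShortcuts.Recorder` view in the Settings window would then let users customize the binding.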
All rights reserved.