
SnapScribe

A macOS menu bar app that captures screenshots and converts them to searchable, formatted text using 100% local ML inference. No cloud APIs, no subscriptions, no data leaving your machine.

Built for academics, developers, researchers, and knowledge workers who deal with technical content — code, math, structured documents — and value privacy.

How It Works

Capture → Process → Store → Search
  1. Capture a region, window, or full screen via ScreenCaptureKit
  2. Process with local ML — Apple Vision OCR, IBM Docling, or Google Gemma
  3. Store automatically in a local SQLite database with full-text search
  4. Search across all your captures instantly

Processing Modes

Hold modifier keys during capture to select a pipeline:

Mode             Modifier   Pipeline                Best For
Vision           (none)     Apple Vision OCR        Fast narrative text
Docling          ⌥ Option   granite-docling-258M    Tables, forms, structured docs
Vision + Gemma   ⇧ Shift    Vision OCR → Gemma LLM  Enhanced formatting, LaTeX
Docling + Gemma  ⌥⇧ Both    Docling → Gemma LLM     Academic papers, complex LaTeX

Gemma is available in two sizes (4B and 12B) — selectable in Settings.
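
A minimal sketch of this dispatch in Swift, assuming a pipeline enum along these lines (the names are illustrative, not the app's actual API):

import AppKit

// Hypothetical pipeline type; the app's real enum may differ.
enum ProcessingMode {
    case vision          // Apple Vision OCR only
    case docling         // granite-docling-258M
    case visionGemma     // Vision OCR → Gemma LLM
    case doclingGemma    // Docling → Gemma LLM
}

// Map the modifier keys held at capture time to a pipeline.
func processingMode(for flags: NSEvent.ModifierFlags) -> ProcessingMode {
    switch (flags.contains(.option), flags.contains(.shift)) {
    case (false, false): return .vision
    case (true,  false): return .docling
    case (false, true):  return .visionGemma
    case (true,  true):  return .doclingGemma
    }
}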

Comparison Mode

Enable in Settings to run all six pipelines in parallel on a single capture (Vision and Docling, each paired with no LLM, Gemma 4B, or Gemma 12B) and compare the results side by side.
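
Conceptually this is a fan-out of one capture to every pipeline; a rough sketch using Swift structured concurrency (types and names are illustrative):

import CoreGraphics

// Illustrative result type; not the app's real data model.
struct PipelineResult {
    let pipeline: String
    let markdown: String
}

// Run every pipeline on the same capture concurrently and collect the results.
func runComparison(
    on image: CGImage,
    pipelines: [(name: String, run: (CGImage) async throws -> String)]
) async throws -> [PipelineResult] {
    try await withThrowingTaskGroup(of: PipelineResult.self) { group in
        for pipeline in pipelines {
            group.addTask {
                PipelineResult(pipeline: pipeline.name,
                               markdown: try await pipeline.run(image))
            }
        }
        var results: [PipelineResult] = []
        for try await result in group { results.append(result) }
        return results
    }
}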

Requirements

  • macOS 14.0+ (Sonoma)
  • Apple Silicon (M1 or later) — required for MLX inference
  • Xcode 15+ with Swift 5.9
  • Python 3.12+ (via pyenv)
  • ~8 GB disk for ML models
  • ~6.5 GB RAM if using comparison mode (both Gemma models loaded)

Setup

1. Python Environment

pyenv install 3.12.10
pyenv shell 3.12.10
pip install mlx-lm mlx-vlm pillow docling-core huggingface-hub

Or use the setup script:

./scripts/dev_setup.sh

2. Download Models

cd models/

# Document understanding (~500MB)
huggingface-cli download ibm-granite/granite-docling-258M-mlx \
  --local-dir granite-docling-258M-mlx

# Gemma 3 4B — lighter, faster (~2GB)
huggingface-cli download mlx-community/gemma-3-4b-it-4bit \
  --local-dir gemma-3-4b-it-4bit

# Gemma 3 12B — higher quality (~4GB)
huggingface-cli download mlx-community/gemma-3-12b-it-4bit \
  --local-dir gemma-3-12b-it-4bit

Or use the download script:

./scripts/download_model.sh

3. Build & Run

xcodebuild -scheme SnapScribe -configuration Debug build

Then open the built app:

open ~/Library/Developer/Xcode/DerivedData/SnapScribe-*/Build/Products/Debug/SnapScribe.app

Or open SnapScribe.xcodeproj in Xcode and hit ⌘R.

Note: Debug builds use hardcoded paths to ~/.pyenv/versions/3.12.10/bin/python3.12 and the local models/ directory. See DoclingProcessor.swift, GemmaProcessor.swift, and GemmaModel.swift.

Architecture

┌─────────────────────────────────────────────────────────┐
│                  SwiftUI Menu Bar App                   │
│                                                         │
│   MenuBarContentView ── AppState ── LibraryWindow       │
│        (Capture/History)     │      (3-column browser)  │
│                              │                          │
│                     ProcessorManager                    │
│                      (Singleton)                        │
└──────────────────────────┬──────────────────────────────┘
                           │
              ┌────────────┼────────────────┐
              │            │                │
        ┌─────▼─────┐  ┌───▼──────────┐  ┌──▼──────────┐
        │ VisionOCR │  │ Docling      │  │ Gemma       │
        │ (native)  │  │ (Python IPC) │  │ (Python IPC)│
        └─────┬─────┘  └──────────────┘  └─────────────┘
              │
        ┌─────▼────────┐
        │ CaptureStore │ ← SQLite + FTS5
        └──────────────┘

Key Directories

Directory              Contents
SnapScribe/App/        UI views — menu bar, history, library, comparison
SnapScribe/Capture/    ScreenCaptureKit integration, region selection
SnapScribe/Inference/  ML processors — Vision, Docling, Gemma, embeddings
SnapScribe/Storage/    SQLite database, data models, FTS5 search
SnapScribe/Settings/   Settings window
SnapScribe/Resources/  Python inference servers
models/                ML model weights (git-ignored, ~8 GB)
scripts/               Dev setup, model download, bundling, signing

Python IPC

Swift communicates with Python inference servers over stdin/stdout JSON:

Request:  {"id": "uuid", "action": "convert|enhance|ping", "image": "base64", ...}
Response: {"id": "uuid", "status": "success|error", "markdown": "...", ...}

Servers emit {"status": "ready"} once the model is loaded into memory. Models stay resident via ProcessorManager — first call takes 5-10s, subsequent calls are fast.
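
From Swift, one round trip reduces to writing a JSON line and reading JSON lines back. A minimal sketch, assuming a hypothetical docling_server.py and a capture saved as capture.png (the real processors keep the server process alive and buffer stdout line by line):

import Foundation

// Launch a Python inference server (script name is hypothetical).
let server = Process()
server.executableURL = URL(fileURLWithPath: "/usr/bin/env")
server.arguments = ["python3", "docling_server.py"]
let input = Pipe(), output = Pipe()
server.standardInput = input
server.standardOutput = output
try server.run()

// First line from the server is the readiness handshake: {"status": "ready"}.
_ = output.fileHandleForReading.availableData

// Send one request as a single newline-terminated JSON line.
let imageData = try Data(contentsOf: URL(fileURLWithPath: "capture.png"))
let request: [String: Any] = [
    "id": UUID().uuidString,
    "action": "convert",
    "image": imageData.base64EncodedString(),
]
var line = try JSONSerialization.data(withJSONObject: request)
line.append(0x0A)
input.fileHandleForWriting.write(line)

// Read the response ({"id": ..., "status": ..., "markdown": ...}).
// A real client splits the stream on newlines rather than taking one read.
let reply = output.fileHandleForReading.availableData
if let json = try JSONSerialization.jsonObject(with: reply) as? [String: Any] {
    print(json["markdown"] as? String ?? "")
}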

Database

Stored at ~/Library/Application Support/SnapScribe/captures.db (SQLite, auto-migrating schema).

  • captures — screenshot metadata, OCR text, enhanced text, user edits, notes, tags, thumbnails
  • folders — hierarchical organization
  • comparison_results — parallel pipeline comparison data
  • captures_fts — FTS5 virtual table for full-text search across all text fields (see the query sketch below)
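
Because the store is plain SQLite, captures can also be searched outside the app. A sketch against the system SQLite, assuming only the table names above (the snippet() formatting arguments and the search term are illustrative):

import Foundation
import SQLite3

let path = ("~/Library/Application Support/SnapScribe/captures.db" as NSString).expandingTildeInPath
var db: OpaquePointer?
guard sqlite3_open_v2(path, &db, SQLITE_OPEN_READONLY, nil) == SQLITE_OK else {
    fatalError("cannot open \(path)")
}
defer { sqlite3_close(db) }

// MATCH runs the FTS5 query across every indexed text column;
// snippet() returns the best-matching excerpt with hits bracketed.
let sql = """
SELECT rowid, snippet(captures_fts, -1, '[', ']', '…', 8)
FROM captures_fts WHERE captures_fts MATCH ?
"""
var stmt: OpaquePointer?
guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else {
    fatalError("prepare failed")
}
defer { sqlite3_finalize(stmt) }

let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)
sqlite3_bind_text(stmt, 1, "gradient descent", -1, SQLITE_TRANSIENT)

while sqlite3_step(stmt) == SQLITE_ROW {
    print(sqlite3_column_int64(stmt, 0),
          String(cString: sqlite3_column_text(stmt, 1)))
}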

Scripts

Script                     Purpose
scripts/dev_setup.sh       Full dev environment setup (validates Apple Silicon, macOS, Python)
scripts/download_model.sh  Download ML models from HuggingFace
scripts/bundle_python.sh   Bundle Python runtime for distribution
scripts/create_dmg.sh      Create distributable DMG
scripts/sign_app.sh        Code signing

Known Limitations

  • Debug paths are hardcoded — Python and model paths point to the dev machine. Release builds will need a bundled runtime.
  • App Sandbox disabled in Debug to allow Python subprocess access.
  • Not App Store distributable — requires screen recording permission (direct distribution only).
  • No global keyboard shortcuts yet — KeyboardShortcuts package is integrated but not wired up (see the sketch below).
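
For reference, wiring it up would look roughly like this (the shortcut name and the capture call are placeholders):

import KeyboardShortcuts

extension KeyboardShortcuts.Name {
    // Placeholder name for the global capture shortcut.
    static let captureRegion = Self("captureRegion")
}

// At app startup: run the capture flow when the shortcut fires.
KeyboardShortcuts.onKeyUp(for: .captureRegion) {
    // startRegionCapture()  // placeholder for the app's capture entry point
}

// In the Settings window, a recorder lets the user pick the key combo:
// KeyboardShortcuts.Recorder("Capture region:", name: .captureRegion)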

License

All rights reserved.
