Improve OCR quality: dark themes, resolution, extraction logic #3

@Korkyzer

Improve OCR text extraction quality (dark UIs, resolution, extraction logic)

Problem

The current OCR pipeline misses most text on dark-themed applications (WhatsApp, Slack, Discord, etc.). On a WhatsApp conversation with dozens of visible messages, agent-watch only captured ~80 characters (menu bar text: "File Edit Chat Call View Window Help").

Root causes identified

  1. NativeTextExtractor short-circuits OCR: The accessibility extractor runs first. If it returns ≥ minimumAccessibilityChars (even just menu bar items), OCR is skipped entirely. For WhatsApp, accessibility returns ~100 chars of sidebar/menu text, satisfying the minimum — so the Vision framework OCR never runs on the actual message content.

  2. Frame buffer resolution too low: FrameBufferStore downscales captures to maxDimension = 1280, which halves Retina resolution (2560 → 1280). Text becomes too small for reliable OCR, especially in dense UIs.

  3. Apple Vision framework struggles with dark themes: VNRecognizeTextRequest performs poorly on light-on-dark text. The Vision framework was designed primarily for document scanning (dark text on light backgrounds).

Changes

1. NativeTextExtractor.swift — Always run both extractors, keep the best

Before: Accessibility runs first; if it returns enough chars, OCR is skipped.
After: Both accessibility AND OCR always run. The result with more text wins.

// Before
if let accessibilityText = accessibilityExtractor.extractText(),
   accessibilityText.count >= minimumAccessibilityChars {
    return ExtractedText(text: accessibilityText, source: .accessibility, metadata: metadata)
}
// OCR only runs as fallback

// After
let accessibilityText = accessibilityExtractor.extractText()
var ocrText: String? = nil
if ocrEnabled {
    ocrText = try ocrExtractor.extractText()
}
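// Return whichever extracted more text. A missing result counts as 0 chars,
// and ties go to accessibility. `ocr` / `accessibility` stand for the
// corresponding ExtractedText values.
let accLen = accessibilityText?.count ?? 0
let ocrLen = ocrText?.count ?? 0
if ocrLen > accLen { return ocr } else { return accessibility }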
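
The "keep the best" rule above can be isolated into a small pure function, which makes it easy to unit-test without touching accessibility or Vision. This is only a sketch: `TextSource`, the trimmed-down `ExtractedText`, and `pickBest` are hypothetical stand-ins, not the project's actual types, and the tie-breaking rule (accessibility wins on equal length) is an assumption.

```swift
// Hypothetical stand-ins for the project's types; names are assumptions.
enum TextSource: String {
    case accessibility
    case ocr
}

struct ExtractedText: Equatable {
    let text: String
    let source: TextSource
}

/// Returns the candidate with more characters.
/// Ties go to accessibility; nil is returned only when neither
/// extractor produced any text.
func pickBest(accessibility: String?, ocr: String?) -> ExtractedText? {
    let accLen = accessibility?.count ?? 0
    let ocrLen = ocr?.count ?? 0
    if let ocr = ocr, ocrLen > accLen {
        return ExtractedText(text: ocr, source: .ocr)
    }
    if let accessibility = accessibility, accLen > 0 {
        return ExtractedText(text: accessibility, source: .accessibility)
    }
    return nil
}
```

With this shape, the WhatsApp case from the Problem section (a short menu bar string from accessibility, a long transcript from OCR) resolves to the OCR result, while apps with rich accessibility trees still keep their accessibility text.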

2. FrameBufferStore.swift — Increase resolution to full Retina

// Before
maxDimension: Int = 1280

// After
maxDimension: Int = 2560

Disk impact: frames go from ~250KB to ~400-800KB. With the existing retention/pruning policy this remains well under control.
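
To make the resolution change concrete, here is a minimal sketch of an aspect-preserving downscale driven by `maxDimension`. The function `scaledSize` is hypothetical and is not taken from FrameBufferStore; it only illustrates the arithmetic, assuming a 2560×1662 capture as in the Results table.

```swift
/// Scales (width, height) so the longer side is at most `maxDimension`,
/// preserving aspect ratio. Images that already fit are returned unchanged.
func scaledSize(width: Int, height: Int, maxDimension: Int) -> (width: Int, height: Int) {
    let longest = max(width, height)
    guard longest > 0, longest > maxDimension else {
        return (width, height)
    }
    let scale = Double(maxDimension) / Double(longest)
    let newWidth = Int((Double(width) * scale).rounded())
    let newHeight = Int((Double(height) * scale).rounded())
    return (newWidth, newHeight)
}

// Old setting: a 2560×1662 capture is halved on each axis.
let before = scaledSize(width: 2560, height: 1662, maxDimension: 1280)
// New setting: the same capture is stored at full size.
let after = scaledSize(width: 2560, height: 1662, maxDimension: 2560)
```

Halving both axes leaves one quarter of the pixels, so the new setting gives OCR four times as many pixels per glyph.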

3. OCRTextExtractor.swift — Color inversion for dark themes

Runs OCR twice: once on the original image, once on a color-inverted version (using CoreImage CIColorInvert). Keeps whichever result contains more text. Also lowered minimumTextHeight from 0.005 to 0.002 to catch smaller text.
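
A rough sketch of the two-pass approach described above, assuming a `CGImage` input. The function names (`recognizeText`, `invertedImage`, `recognizeTextWithInversion`) are illustrative and not the actual contents of OCRTextExtractor.swift. The Vision and Core Image calls used here (`VNRecognizeTextRequest`, `VNImageRequestHandler`, the `CIColorInvert` filter) are standard, but the surrounding structure is a guess.

```swift
import CoreImage
import Vision

/// Runs Vision text recognition on one image and joins the top candidates.
func recognizeText(in image: CGImage, minimumTextHeight: Float = 0.002) throws -> String {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.minimumTextHeight = minimumTextHeight  // fraction of image height
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    let lines = (request.results ?? []).compactMap { observation in
        observation.topCandidates(1).first?.string
    }
    return lines.joined(separator: "\n")
}

/// Returns a color-inverted copy of the image, or nil if Core Image fails.
func invertedImage(_ image: CGImage, context: CIContext = CIContext()) -> CGImage? {
    guard let filter = CIFilter(name: "CIColorInvert") else { return nil }
    filter.setValue(CIImage(cgImage: image), forKey: kCIInputImageKey)
    guard let output = filter.outputImage else { return nil }
    return context.createCGImage(output, from: output.extent)
}

/// OCRs the original and the inverted image, keeping the longer result.
func recognizeTextWithInversion(in image: CGImage) throws -> String {
    let original = try recognizeText(in: image)
    guard let flipped = invertedImage(image) else { return original }
    let inverted = try recognizeText(in: flipped)
    return inverted.count > original.count ? inverted : original
}
```

The second pass roughly doubles OCR time per frame. One possible refinement, not part of this change, would be to run it only when the first pass returns little text.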

Results

| Metric | Before | After |
|---|---|---|
| WhatsApp text captured | ~80 chars (menu bar only) | 1567 chars (all messages, contacts, timestamps, links) |
| Frame resolution | 1280×831 | 2560×1662 |
| text_source for WhatsApp | accessibility (short-circuited) | ocr (full Vision + inversion) |

Environment

  • macOS 15 (Tahoe)
  • MacBook Pro M-series (Retina display)
  • WhatsApp desktop, Slack, Discord (dark theme)
