
QVAC-19797 test[api]: ocr-ggml Metal GPU perf coverage + easyocr gallocr memory fix #2483

Open: olyasir wants to merge 3 commits into main from ocr-ggml-gpu-test-coverage

olyasir commented Jun 8, 2026

Summary

Adds Metal GPU performance coverage to the ocr-ggml CI perf table plus an EasyOCR backend-memory fix. Rebased onto current main (after #2457 landed the Adreno guard + Android-Vulkan, and alongside in-flight #2458 for the CPU-vs-Vulkan benchmark).

Commits

  • fix[api]: reuse ggml_gallocr across input sizes (easyocr detection + recognition steps) — keep the gallocr/backing buffer alive and resize in place across region widths instead of free+recreate per size. Cuts the backend-heap churn that fragments the Metal heap and triggers OOM command-buffer failures on memory-constrained devices.
  • test[api]: select Metal on Apple desktop - getBackendDevice() now returns metal on darwin (it previously resolved only vulkan or cpu), so the macOS leg records Metal [GPU] rows. Merged with main's Android-Vulkan selection; the OCR_GGML_BACKEND override now also accepts metal.
  • test[api]: run DocTR on CPU under Metal — the DocTR recognizer's per-region ggml compute is non-deterministically unstable on the constrained macos-15-xlarge runner (aborts status -1 / silent detection collapse). DocTR forces CPU on Metal; EasyOCR (stable on Metal) keeps its Metal GPU pass. This is a deliberate scope, not a fix — see follow-up below.

Backend coverage in the combined perf table

| Backend | Coverage |
| --- | --- |
| CPU | all platforms |
| Vulkan [GPU] | Linux (ubuntu-24.04) + Windows (EasyOCR + DocTR); Android Mali via #2458 |
| Metal [GPU] | macOS: EasyOCR (DocTR CPU-only there by design) |

Follow-up (separate work)

  • DocTR-on-Metal stability — the real fix is an open GPU-memory-pressure investigation (couldn't reproduce on the 192 GB M3 Ultra; next step is repro on the constrained mac-mini-1 M4 with GPU-allocation tracking). Tracked separately so this PR isn't blocked on it.
  • Mobile GPU beyond Android-Vulkan and the Adreno guard already on main.

olyasir requested review from a team as code owners June 8, 2026 12:38
olyasir added 3 commits June 8, 2026 16:15
The EasyOCR detection (CRAFT) and recognition (CRNN) steps rebuild their
ggml graph whenever the input size changes — detection on each new image
size, recognition on each distinct text-region width. Previously each
rebuild freed the ggml_gallocr and allocated a brand-new, differently-sized
backend (Metal/Vulkan) buffer. That repeated alloc/free of varying-size GPU
buffers churns and fragments the device heap, which can surface as
out-of-memory command-buffer failures on memory-constrained devices (phones,
small CI runners) even though steady-state process footprint stays flat.

Keep a single gallocr per step and let ggml_gallocr_alloc_graph resize its
backing buffer in place across sizes (the canonical llama.cpp pattern). Only
the size-specific ggml_context is freed and rebuilt; the gallocr is freed
just once, in the destructor / on alloc failure. Output is unchanged —
verified identical region counts on Metal (M3 Ultra). DocTR is fixed-size
(graphs allocated once) so it is unaffected.
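
The difference between the two allocation patterns can be illustrated with a toy model. The sketch below is not the ggml API: `HeapStats`, `runRecreatePerSize`, and `ReusedAllocator` are hypothetical names, the sizes are arbitrary units, and it assumes the long-lived allocator only replaces its backing buffer when a graph needs more than the current capacity.

```typescript
// Toy model of a device heap: it only counts buffer allocations and frees.
// All names are hypothetical; none of this is the real ggml API.
class HeapStats {
  allocs = 0;
  frees = 0;
}

// Old pattern: every change of input size frees the buffer and
// allocates a new one sized exactly for the new graph.
function runRecreatePerSize(sizes: number[], heap: HeapStats): void {
  let current: number | null = null;
  for (const size of sizes) {
    if (current !== size) {
      if (current !== null) heap.frees++;
      heap.allocs++;
      current = size;
    }
  }
  if (current !== null) heap.frees++; // final teardown
}

// New pattern: one long-lived allocator. Assumption: the backing buffer
// is replaced only when the required size exceeds its capacity.
class ReusedAllocator {
  capacity = 0;
  heap: HeapStats;

  constructor(heap: HeapStats) {
    this.heap = heap;
  }

  allocGraph(requiredSize: number): void {
    if (requiredSize > this.capacity) {
      if (this.capacity > 0) this.heap.frees++;
      this.heap.allocs++;
      this.capacity = requiredSize;
    }
  }

  // Called once, mirroring "freed just once, in the destructor".
  release(): void {
    if (this.capacity > 0) {
      this.heap.frees++;
      this.capacity = 0;
    }
  }
}

// Hypothetical sequence of text-region widths seen by the recognizer.
const regionWidths = [64, 128, 96, 256, 128, 64];

const recreateHeap = new HeapStats();
runRecreatePerSize(regionWidths, recreateHeap);

const reuseHeap = new HeapStats();
const reusedAllocator = new ReusedAllocator(reuseHeap);
for (const width of regionWidths) reusedAllocator.allocGraph(width);
reusedAllocator.release();

console.log("recreate per size:", recreateHeap.allocs, "allocations");
console.log("reused allocator:", reuseHeap.allocs, "allocations");
```

Under these assumptions the reused allocator touches the heap only when a larger graph appears, which is the reduction in churn the commit is after.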
…gration suite

getBackendDevice() previously resolved only 'vulkan' or 'cpu', so the macOS
matrix leg always ran CPU-only and the cross-platform perf table never carried
Metal GPU numbers (only the Linux/Windows Vulkan runners recorded [GPU] rows).

ggml's Metal backend is compiled into the addon on darwin — there is no
loadable ggml-vulkan lib to probe — so resolve 'metal' directly on Apple
desktop. The addon's backend selection falls back to CPU when no Metal device
is present, and the suite's GPU detection is backend-agnostic
(backendDevice === 'GPU'/'IGPU', stats.backendIsGpu === 1), so Metal passes are
tagged [GPU] and compared against a forced-CPU pass exactly like Vulkan.

The OCR_GGML_BACKEND override now also accepts 'metal' so a leg can force it.
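
The selection order described in this commit could look roughly like the sketch below. It is an illustration only: `resolveBackend`, `BackendInputs`, and the `vulkanLibAvailable` flag are hypothetical names, and the real getBackendDevice() (including its Android handling and how it probes for the Vulkan library) may differ.

```typescript
type OcrBackend = "cpu" | "vulkan" | "metal";

// Hypothetical inputs; the real helper presumably reads these from the
// process environment and the filesystem rather than taking parameters.
interface BackendInputs {
  platform: string;            // e.g. "darwin", "linux", "win32"
  override?: string;           // value of OCR_GGML_BACKEND, if set
  vulkanLibAvailable: boolean; // result of probing for a loadable ggml-vulkan lib
}

function resolveBackend(inputs: BackendInputs): OcrBackend {
  // 1. An explicit override wins, and "metal" is now an accepted value.
  switch (inputs.override?.toLowerCase()) {
    case "cpu":
      return "cpu";
    case "vulkan":
      return "vulkan";
    case "metal":
      return "metal";
  }

  // 2. On darwin, Metal is compiled into the addon, so there is nothing
  //    to probe; the addon itself falls back to CPU if no device exists.
  if (inputs.platform === "darwin") return "metal";

  // 3. Elsewhere, use Vulkan only when its library can be loaded.
  return inputs.vulkanLibAvailable ? "vulkan" : "cpu";
}
```

Passing the platform and override in as parameters, rather than reading globals, keeps a sketch like this easy to exercise for each matrix leg.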
… CI)

The DocTR recognizer runs ggml_backend_graph_compute once per detected region,
and on the constrained macos-15-xlarge CI runner the Metal backend is
non-deterministically unstable under that sustained load: it either aborts the
whole suite ("[DoctrRecognitionGGML] ggml backend graph compute failed with
status -1", exit 134) or silently collapses detection to garbage. The failure
is not size-bound — the same image's BMP pass passes while its JPEG/PNG passes
fail, and clinical_chemistry (previously stable) failed too — so there is no
stable per-test subset to scope around.

Force CPU for ALL DocTR comparison passes when the auto-selected backend is
Metal (in runDoctrComparison), plus the doctr-models batch-equivalence test
that calls runDoctrOCR directly. Vulkan keeps its DocTR [GPU] pass, and EasyOCR
(runOcrComparison, a different recognizer path that is stable on Metal — the
dense canvasSize page passes) keeps its Metal [GPU] pass. macOS therefore
records Metal numbers for EasyOCR while DocTR stays CPU-only there, and the
suite no longer aborts.
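
The scoping rule above amounts to a small per-engine backend decision. The sketch below shows one way to express it; `backendForPass`, `PassBackend`, and `OcrEngine` are hypothetical names, not the helpers used in runDoctrComparison or runOcrComparison.

```typescript
type PassBackend = "cpu" | "vulkan" | "metal";
type OcrEngine = "easyocr" | "doctr";

// Returns the backend a comparison pass should actually run on, given
// the auto-selected backend. Only DocTR on Metal is redirected to CPU.
function backendForPass(engine: OcrEngine, autoSelected: PassBackend): PassBackend {
  if (engine === "doctr" && autoSelected === "metal") {
    return "cpu"; // DocTR is unstable on Metal on the constrained runner
  }
  return autoSelected; // EasyOCR on Metal and both engines on Vulkan are unchanged
}

console.log(backendForPass("doctr", "metal"));
console.log(backendForPass("easyocr", "metal"));
```

Keeping the rule in one function would make it straightforward to remove once the Metal stability investigation concludes.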

Real Metal-on-Apple DocTR stability remains a separate GPU-memory-pressure
investigation (repro on the constrained mac-mini-1 M4).
olyasir force-pushed the ocr-ggml-gpu-test-coverage branch from acd92d5 to 364067b June 8, 2026 13:18
olyasir changed the title from "QVAC-19797 test[api]: ocr-ggml GPU (Vulkan + Metal) perf coverage + Adreno/gallocr backend fixes" to "QVAC-19797 test[api]: ocr-ggml Metal GPU perf coverage + easyocr gallocr memory fix" Jun 8, 2026