QVAC-19797 test[api]: ocr-ggml Metal GPU perf coverage + easyocr gallocr memory fix #2483
Open
olyasir wants to merge 3 commits into
The EasyOCR detection (CRAFT) and recognition (CRNN) steps rebuild their ggml graph whenever the input size changes: detection on each new image size, recognition on each distinct text-region width. Previously, each rebuild freed the ggml_gallocr and allocated a brand-new, differently sized backend (Metal/Vulkan) buffer. Repeatedly allocating and freeing GPU buffers of varying size churns and fragments the device heap, which can surface as out-of-memory command-buffer failures on memory-constrained devices (phones, small CI runners) even though the steady-state process footprint stays flat.
Keep a single gallocr per step and let ggml_gallocr_alloc_graph resize its backing buffer in place across sizes (the canonical llama.cpp pattern). Only the size-specific ggml_context is freed and rebuilt; the gallocr is freed just once, in the destructor or on allocation failure. Output is unchanged: region counts were verified identical on Metal (M3 Ultra). DocTR is fixed-size (its graphs are allocated once), so it is unaffected.
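To make the difference between the two strategies concrete, here is a toy model in TypeScript. It does not call ggml: `DeviceHeap`, `freeAndRecreate`, and `reuseAllocator` are hypothetical names, sizes are unitless, and the reuse path assumes a grow-only policy (the buffer is replaced only when a graph needs more than the current capacity), which is a simplification of whatever `ggml_gallocr_alloc_graph` does internally.

```typescript
// Toy model only: nothing here calls ggml. DeviceHeap is a hypothetical
// stand-in that counts allocations and frees on the backend heap.
class DeviceHeap {
  allocs = 0;
  frees = 0;

  alloc(size: number): number {
    this.allocs++;
    return size; // the "buffer" is represented by its size
  }

  free(_size: number): void {
    this.frees++;
  }
}

// Previous behaviour: every graph rebuild (new input size) frees the
// current buffer and allocates a new one sized exactly for that graph.
function freeAndRecreate(heap: DeviceHeap, graphSizes: number[]): void {
  let buffer: number | null = null;
  for (const size of graphSizes) {
    if (buffer !== size) {
      if (buffer !== null) heap.free(buffer);
      buffer = heap.alloc(size);
    }
  }
  if (buffer !== null) heap.free(buffer); // final free in the destructor
}

// New behaviour (simplified): one long-lived allocator per step. Its buffer
// is replaced only when a graph needs more than the current capacity
// (assumed grow-only policy); smaller graphs reuse the existing buffer.
function reuseAllocator(heap: DeviceHeap, graphSizes: number[]): void {
  let capacity = 0;
  for (const size of graphSizes) {
    if (size > capacity) {
      if (capacity > 0) heap.free(capacity);
      capacity = heap.alloc(size);
    }
  }
  if (capacity > 0) heap.free(capacity); // final free in the destructor
}
```

With region widths `[120, 80, 200, 80, 160, 200]`, `freeAndRecreate` performs six allocations and six frees, while `reuseAllocator` performs two of each: every narrower region fits in the buffer already sized for the widest one seen so far, so the heap sees far fewer differently sized buffers.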
…gration suite
getBackendDevice() previously resolved only 'vulkan' or 'cpu', so the macOS matrix leg always ran CPU-only and the cross-platform perf table never carried Metal GPU numbers (only the Linux/Windows Vulkan runners recorded [GPU] rows). ggml's Metal backend is compiled into the addon on darwin, and there is no loadable ggml-vulkan library to probe there, so resolve 'metal' directly on Apple desktop. The addon's backend selection falls back to CPU when no Metal device is present, and the suite's GPU detection is backend-agnostic (backendDevice === 'GPU'/'IGPU', stats.backendIsGpu === 1), so Metal passes are tagged [GPU] and compared against a forced-CPU pass exactly like Vulkan. The OCR_GGML_BACKEND override now also accepts 'metal', so a leg can force it.
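The resolution order described above can be sketched as a pure function. `resolveBackend` and `BackendInputs` are hypothetical names, not the suite's actual `getBackendDevice()` signature; the platform, the `OCR_GGML_BACKEND` value, and the Vulkan-library probe result are passed in explicitly so the logic can be exercised on any host, and main's Android-Vulkan selection is left out.

```typescript
type Backend = 'cpu' | 'vulkan' | 'metal';

// Hypothetical input shape; the real suite gathers these values itself.
interface BackendInputs {
  platform: string;            // e.g. 'darwin', 'linux', 'win32'
  override?: string;           // value of OCR_GGML_BACKEND, if set
  vulkanLibAvailable: boolean; // result of probing for a loadable ggml-vulkan lib
}

function resolveBackend({ platform, override, vulkanLibAvailable }: BackendInputs): Backend {
  // An explicit override wins, and 'metal' is now an accepted value.
  // Trimming/lower-casing and letting unrecognized values fall through
  // to auto-selection are assumptions of this sketch.
  const forced = override?.trim().toLowerCase();
  if (forced === 'cpu' || forced === 'vulkan' || forced === 'metal') return forced;

  // Metal is compiled into the addon on darwin, so there is nothing to probe.
  // If no Metal device exists, the addon itself falls back to CPU.
  if (platform === 'darwin') return 'metal';

  // Elsewhere, use Vulkan only when the ggml-vulkan lib can be loaded.
  // (main's Android-Vulkan / Adreno handling is omitted here.)
  return vulkanLibAvailable ? 'vulkan' : 'cpu';
}
```

For example, `resolveBackend({ platform: 'darwin', vulkanLibAvailable: false })` yields `'metal'`, while the same call with `override: 'cpu'` yields `'cpu'`, which is how a leg can still force the CPU baseline.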
… CI)
The DocTR recognizer runs ggml_backend_graph_compute once per detected region,
and on the constrained macos-15-xlarge CI runner the Metal backend is
non-deterministically unstable under that sustained load: it either aborts the
whole suite ("[DoctrRecognitionGGML] ggml backend graph compute failed with
status -1", exit 134) or silently collapses detection to garbage. The failure
is not size-bound: the same image's BMP pass succeeds while its JPEG/PNG passes
fail, and clinical_chemistry (previously stable) failed too, so there is no
stable per-test subset to scope around.
Force CPU for ALL DocTR comparison passes when the auto-selected backend is
Metal (in runDoctrComparison), plus the doctr-models batch-equivalence test
that calls runDoctrOCR directly. Vulkan keeps its DocTR [GPU] pass, and EasyOCR
(runOcrComparison, a different recognizer path that is stable on Metal; the
dense canvasSize page passes) keeps its Metal [GPU] pass. macOS therefore
records Metal numbers for EasyOCR while DocTR stays CPU-only there, and the
suite no longer aborts.
Real Metal-on-Apple DocTR stability remains a separate GPU-memory-pressure
investigation (repro on the constrained mac-mini-1 M4).
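The resulting pass schedule per recognizer and auto-selected backend can be summarized in a few lines. `comparisonPasses` is a hypothetical helper written for illustration, not code from runDoctrComparison or runOcrComparison; it returns the backends a comparison would run, GPU pass first and forced-CPU baseline last.

```typescript
type Backend = 'cpu' | 'vulkan' | 'metal';
type Recognizer = 'doctr' | 'easyocr';

// Hypothetical helper: the backends a comparison run executes,
// auto-selected backend first and the forced-CPU baseline last.
function comparisonPasses(recognizer: Recognizer, autoBackend: Backend): Backend[] {
  // DocTR on Metal is unstable on the constrained runner: CPU pass only.
  if (recognizer === 'doctr' && autoBackend === 'metal') return ['cpu'];
  // No GPU selected: the auto pass already is the CPU pass.
  if (autoBackend === 'cpu') return ['cpu'];
  // Vulkan (DocTR and EasyOCR) and Metal (EasyOCR): GPU pass + CPU baseline.
  return [autoBackend, 'cpu'];
}
```

So `comparisonPasses('doctr', 'metal')` gives only `['cpu']`, whereas `comparisonPasses('easyocr', 'metal')` and `comparisonPasses('doctr', 'vulkan')` keep a GPU pass followed by the CPU baseline.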
Force-pushed from acd92d5 to 364067b.
Summary
Adds Metal GPU performance coverage to the ocr-ggml CI perf table, plus an EasyOCR backend-memory fix. Rebased onto current `main` (after #2457 landed the Adreno guard + Android-Vulkan, and alongside the in-flight #2458 for the CPU-vs-Vulkan benchmark).
Commits
- …`ggml_gallocr` across input sizes (EasyOCR detection + recognition steps): keep the gallocr and its backing buffer alive and resize in place across region widths instead of freeing and recreating them per size. Cuts the backend-heap churn that fragments the Metal heap and can trigger OOM command-buffer failures on memory-constrained devices.
- `getBackendDevice()` returns `metal` on darwin (previously `vulkan` or `cpu` only), so the macOS leg records Metal `[GPU]` rows. Merged with main's Android-Vulkan selection; the `OCR_GGML_BACKEND` override now also accepts `metal`.
- …`macos-15-xlarge` runner (aborts with `status -1` / silent detection collapse). DocTR forces CPU on Metal; EasyOCR (stable on Metal) keeps its Metal GPU pass. This is a deliberate scope, not a fix; see the follow-up below.
Backend coverage in the combined perf table
Follow-up (separate work)
- …`mac-mini-1` M4 with GPU-allocation tracking). Tracked separately so this PR isn't blocked on it.