Y7: GPU Performance Benchmarks (APR decode ≥200 tok/s) #141

@noahgift

Overview

Implement GPU performance benchmarks for the APR format, per Section Y.2 of the spec.

Requirement

  • Y7: APR decode speed must be ≥200 tok/s on GPU (RTX 4090 reference)
  • Must match or exceed GGUF decode speed on same hardware

Falsification Condition

Y7 is falsified if APR decodes at < 200 tok/s while GGUF decodes at ≥ 200 tok/s on the same GPU.
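The requirement and the falsification condition can be written down as two predicates over measured throughput. This is only a sketch: `y7_passes`, `y7_falsified`, and `Y7_MIN_TOK_PER_SEC` are hypothetical names for illustration, not existing realizar APIs, and the inputs are assumed to be tok/s figures measured on the same GPU.

```rust
/// Y7 throughput floor in tokens per second, taken from the requirement above.
pub const Y7_MIN_TOK_PER_SEC: f64 = 200.0;

/// Hypothetical helper: true when both Y7 requirements hold, i.e. APR reaches
/// the 200 tok/s floor and matches or exceeds GGUF on the same hardware.
pub fn y7_passes(apr_tok_s: f64, gguf_tok_s: f64) -> bool {
    apr_tok_s >= Y7_MIN_TOK_PER_SEC && apr_tok_s >= gguf_tok_s
}

/// Hypothetical helper: true when the falsification condition is met, i.e.
/// APR is below the floor while GGUF reaches it on the same GPU.
pub fn y7_falsified(apr_tok_s: f64, gguf_tok_s: f64) -> bool {
    apr_tok_s < Y7_MIN_TOK_PER_SEC && gguf_tok_s >= Y7_MIN_TOK_PER_SEC
}
```

As written, the two predicates are not exact complements: APR at 210 tok/s with GGUF at 220 tok/s is not falsified, but it also does not satisfy the "match or exceed GGUF" requirement.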

Implementation Tasks

  • Add CUDA feature flag to realizar
  • Implement GPU kernels for APR inference
  • Create benchmark harness for GPU performance
  • Verify parity with GGUF on RTX 4090 or equivalent
  • Add to CI with GPU runner (optional)
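For the benchmark harness task, a backend-agnostic timing loop might look like the sketch below. `DecodeStats` and `bench_decode` are hypothetical names, not part of realizar. The `decode_step` closure is assumed to wrap one APR or GGUF decode call, return the number of tokens it produced, and block until those tokens are available. On a GPU that means synchronizing with the device before returning, since kernel launches are asynchronous.

```rust
use std::time::{Duration, Instant};

/// Token count and wall-clock time for one timed benchmark run.
#[derive(Debug, Clone, Copy)]
pub struct DecodeStats {
    pub tokens: usize,
    pub elapsed: Duration,
}

impl DecodeStats {
    /// Throughput in tokens per second; returns 0.0 if no time elapsed.
    pub fn tok_per_sec(&self) -> f64 {
        let secs = self.elapsed.as_secs_f64();
        if secs == 0.0 {
            return 0.0;
        }
        self.tokens as f64 / secs
    }
}

/// Runs `warmup` untimed calls, then times `steps` calls of `decode_step`.
/// Each call is assumed to return the number of tokens it decoded.
pub fn bench_decode<F>(mut decode_step: F, warmup: usize, steps: usize) -> DecodeStats
where
    F: FnMut() -> usize,
{
    // Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in 0..warmup {
        decode_step();
    }

    let start = Instant::now();
    let mut tokens = 0usize;
    for _ in 0..steps {
        tokens += decode_step();
    }

    DecodeStats {
        tokens,
        elapsed: start.elapsed(),
    }
}
```

Running the same loop once with an APR-backed closure and once with a GGUF-backed closure on the same GPU would give the two `tok_per_sec()` figures needed for the parity check.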

Blocked By

  • Requires GPU hardware for development and testing

References

  • Spec: docs/specifications/apr-whisper-and-cookbook-support-eoy-2025.md Section Y.2
  • Related: Y6 (CPU benchmarks) - ✅ Verified at 206.4 tok/s

Priority

P2 - Deferred (no GPU hardware available currently)


Labels

  • enhancement
