
[RFC] Standardized Model Config for Benchmarks #1051

@Tcc0403

🚀 The feature, motivation and pitch

As we add support for more devices, benchmarking scripts increasingly rely on device-specific tensor shapes due to VRAM constraints. This leads to fragmented benchmark setups and scattered performance results that are difficult to compare or reproduce. To make benchmarking more scalable and consistent, I propose introducing a standardized benchmark model configuration.

Motivation

Today, different kernels (and sometimes different devices) use ad-hoc shapes, which:

  • Require device-specific dispatching in benchmark scripts
  • Produce results that are hard to compare across hardware
  • Increase maintenance burden for contributors adding new benchmarks

Proposal

Define one or more representative “mainstream” model profiles (e.g., LLaMA-/GPT-like) with canonical parameters such as hidden_size, vocab_size, num_q_heads, and num_kv_heads.

All benchmark scripts would derive their shapes from this shared config, optionally using scaled-down subsets when needed to fit memory constraints, instead of inventing per-device shapes.

This would:

  • Eliminate ad-hoc device-specific shapes
  • Improve comparability across kernels and hardware
  • Lower the barrier for contributors writing new benchmarks
  • Improve reproducibility of performance results
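One way the shared config could look is a minimal Python sketch like the following. `ModelConfig`, `PROFILES`, and `scaled_down` are hypothetical names, not existing code in the repository. The numbers follow the public LLaMA-3-8B configuration and serve only as a placeholder profile. The "scaled-down subset" rule shown here, which divides width and head counts by a common factor, is one possible choice, not a settled design.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical canonical model profile shared by benchmark scripts."""

    name: str
    hidden_size: int
    intermediate_size: int
    vocab_size: int
    num_q_heads: int
    num_kv_heads: int

    @property
    def head_dim(self) -> int:
        # Per-head dimension; assumes hidden_size divides evenly across query heads.
        return self.hidden_size // self.num_q_heads


# Placeholder profile; values follow the public LLaMA-3-8B config.
LLAMA_3_8B = ModelConfig(
    name="llama-3-8b",
    hidden_size=4096,
    intermediate_size=14336,
    vocab_size=128256,
    num_q_heads=32,
    num_kv_heads=8,
)

# Registry that benchmark scripts would look profiles up in (hypothetical).
PROFILES = {cfg.name: cfg for cfg in (LLAMA_3_8B,)}


def scaled_down(cfg: ModelConfig, factor: int) -> ModelConfig:
    """Shrink width and head counts by `factor`, keeping head_dim and the
    query/KV head ratio unchanged (hypothetical scaling rule)."""
    dims = (cfg.hidden_size, cfg.intermediate_size, cfg.num_q_heads, cfg.num_kv_heads)
    if factor < 1 or any(d % factor for d in dims):
        raise ValueError(f"factor={factor} does not evenly divide {dims}")
    return replace(
        cfg,
        name=f"{cfg.name}-div{factor}",
        hidden_size=cfg.hidden_size // factor,
        intermediate_size=cfg.intermediate_size // factor,
        num_q_heads=cfg.num_q_heads // factor,
        num_kv_heads=cfg.num_kv_heads // factor,
    )


# A kernel benchmark (e.g. RMSNorm) would build its input shape from the profile
# instead of hard-coding a per-device shape.
cfg = PROFILES["llama-3-8b"]
batch_size, seq_len = 4, 2048
x_shape = (batch_size * seq_len, cfg.hidden_size)
```

Keeping head_dim fixed in the scaled-down variant means attention kernels still see realistic per-head shapes. vocab_size is left untouched because it is a property of the tokenizer rather than model width; vocab-heavy kernels would shrink batch or sequence length instead.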

Target Devices

Before finalizing the standardized config, it would be helpful to clarify which devices we officially want to support for benchmarking.

Here’s the list we currently have:

  • NVIDIA H100 (80GB)
  • Intel XPU GPU Max 1100 (48GB)
  • NPU Atlas 900 A2 POD (64GB)

Please feel free to comment if I missed any devices, or if there are additional targets we should consider.
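Once the device list is fixed, a scaling rule could pick shapes per device from its memory capacity instead of hand-tuning them. The sketch below shows one possible rule for a vocab-heavy kernel such as cross-entropy. It keeps the largest power-of-two sequence length whose estimated logits footprint fits in a fraction of device memory. The device keys, the fp32 element size, the "logits plus gradient" copy count, and the 50% headroom are all assumptions chosen for illustration, and the advertised capacities are treated as GiB for simplicity.

```python
# Hypothetical device registry: capacities from the target-device list,
# treated as GiB for simplicity.
DEVICE_MEMORY_GIB = {
    "h100": 80,
    "xpu_max_1100": 48,
    "atlas_900_a2": 64,
}


def pick_seq_len(
    vocab_size: int,
    device: str,
    batch_size: int = 1,
    bytes_per_elem: int = 4,       # assumption: fp32 logits
    num_copies: int = 2,           # assumption: logits + their gradient
    budget_fraction: float = 0.5,  # assumption: leave half the memory as headroom
    min_seq_len: int = 1024,
    max_seq_len: int = 65536,
) -> int:
    """Return the largest power-of-two sequence length (starting from
    min_seq_len) whose estimated logits footprint fits the device budget."""
    budget_bytes = DEVICE_MEMORY_GIB[device] * 1024**3 * budget_fraction
    bytes_per_token = batch_size * vocab_size * bytes_per_elem * num_copies
    if min_seq_len * bytes_per_token > budget_bytes:
        raise ValueError(f"{device}: even seq_len={min_seq_len} exceeds the budget")
    seq_len = min_seq_len
    # Keep doubling while the next size stays within both the cap and the budget.
    while seq_len * 2 <= max_seq_len and seq_len * 2 * bytes_per_token <= budget_bytes:
        seq_len *= 2
    return seq_len


for device in DEVICE_MEMORY_GIB:
    print(device, pick_seq_len(vocab_size=128256, device=device))
```

With a 128256-token vocabulary, these defaults give 32768 for the H100 and the Atlas 900 A2 and 16384 for the GPU Max 1100. All devices would then share one model profile and differ only in a shape derived by a single documented rule.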

Next Steps

  • Confirm target benchmarking devices
  • Agree on one or more baseline model profiles (e.g., LLaMA, Qwen, Gemma)
  • Define canonical parameters (hidden_size, vocab_size, heads, etc.)
  • Specify scaling rules for memory-constrained devices
  • Refactor existing benchmarks to consume the standardized config
