
Benchmark: Fix high run-to-run variability in metrics #8603

Open

Description

@guangy10

🐛 Describe the bug

Issue

The metrics frequently exceed the 10% threshold from run to run, making it hard to interpret them and decide on actions.

It's happening to pretty much ALL models and ALL configs:

[Screenshots: dashboard charts showing the run-to-run metric variability]
link to the dashboard
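
The 10% threshold here is just the relative change of a metric between consecutive runs. A minimal sketch of that check (a hypothetical helper, not the dashboard's actual code):

```python
def exceeds_threshold(previous: float, current: float, threshold: float = 0.10) -> bool:
    """Flag a metric whose relative change vs. the previous run exceeds the threshold."""
    if previous == 0:
        # Avoid division by zero; any movement away from zero counts as a change.
        return current != 0
    return abs(current - previous) / previous > threshold
```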

What areas should we look into?

  • iOS benchmark app
  • Android benchmark app

Solution Space

I think we can start by parameterizing the number of iterations for each model and its benchmark config, then find the "right" value so that the run-to-run fluctuation of the metrics (load, inference, TPS) stays within a reasonable range (<10%). See the sketch below.
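
As a rough sketch of that direction (everything here is hypothetical: `run_benchmark`, the metric keys, and the candidate counts are stand-ins, not the actual harness API), one could sweep candidate iteration counts per (model, config) pair and pick the smallest one whose metrics stay stable across repeated runs:

```python
import statistics
from typing import Callable

VARIABILITY_THRESHOLD = 0.10  # target: <10% run-to-run spread


def coefficient_of_variation(samples: list[float]) -> float:
    """Relative spread (stddev / mean), a scale-free variability measure."""
    mean = statistics.mean(samples)
    return statistics.stdev(samples) / mean if mean else float("inf")


def find_stable_iterations(
    run_benchmark: Callable[[int], dict[str, float]],  # hypothetical hook into the runner
    candidates: tuple[int, ...] = (5, 10, 20, 50),
    repeats: int = 5,
) -> int | None:
    """Return the smallest iteration count whose metrics all vary by <10%
    across `repeats` independent runs, or None if none qualifies."""
    for n_iter in candidates:
        # Collect metric samples from several independent runs at this setting.
        runs = [run_benchmark(n_iter) for _ in range(repeats)]
        spreads = {
            metric: coefficient_of_variation([r[metric] for r in runs])
            for metric in ("load_ms", "inference_ms", "tps")
        }
        if all(cv < VARIABILITY_THRESHOLD for cv in spreads.values()):
            return n_iter
    return None  # nothing stable enough; the noise source is elsewhere
```

Using the coefficient of variation keeps the check scale-free, so the same 10% budget applies uniformly to load, inference latency, and TPS.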

Versions

trunk

cc @mergennachin @kimishpatel @iseeyuan @kirklandsign @cbilgin @huydhn @shoumikhin

Metadata

Labels

  • enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
  • high priority
  • module: android - Issues related to Android code, build, and execution
  • module: benchmark - Issues related to the benchmark infrastructure
  • triage review - Items require a triage review
  • triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


Projects

Status: In Progress
Status: Todo
