
Benchmark: Fix high run-to-run variability in metrics #8603

Open

Description

@guangy10

🐛 Describe the bug

Issue

The metrics frequently exceed the 10% threshold from run to run, making it hard to interpret them and decide on actions.

It's happening to pretty much ALL models and ALL configs:

[Screenshots: dashboard charts showing the run-to-run metric variability]
link to the dashboard
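
The 10% threshold here is just the relative change of a metric between consecutive runs. A minimal sketch of that check (a hypothetical helper, not the dashboard's actual code):

```python
def exceeds_threshold(previous: float, current: float, threshold: float = 0.10) -> bool:
    """Flag a metric whose relative change vs. the previous run exceeds the threshold."""
    if previous == 0:
        # Avoid division by zero; any movement away from zero counts as a change.
        return current != 0
    return abs(current - previous) / previous > threshold
```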

What areas should we look into?

  • iOS benchmark app
  • Android benchmark app

Solution Space

I think we can start by parameterizing the number of iterations for each model and its benchmark config, then find the "right" value so that the run-to-run fluctuation of the metrics (load, inference, TPS) stays within a reasonable range (<10%). See the sketch below.
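
As a rough sketch of that direction (everything here is hypothetical: `run_benchmark`, the metric keys, and the candidate counts are stand-ins, not the actual harness API), one could sweep candidate iteration counts per (model, config) pair and pick the smallest one whose metrics stay stable across repeated runs:

```python
import statistics
from typing import Callable

VARIABILITY_THRESHOLD = 0.10  # target: <10% run-to-run spread


def coefficient_of_variation(samples: list[float]) -> float:
    """Relative spread (stddev / mean), a scale-free variability measure."""
    mean = statistics.mean(samples)
    return statistics.stdev(samples) / mean if mean else float("inf")


def find_stable_iterations(
    run_benchmark: Callable[[int], dict[str, float]],  # hypothetical hook into the runner
    candidates: tuple[int, ...] = (5, 10, 20, 50),
    repeats: int = 5,
) -> int | None:
    """Return the smallest iteration count whose metrics all vary by <10%
    across `repeats` independent runs, or None if none qualifies."""
    for n_iter in candidates:
        # Collect metric samples from several independent runs at this setting.
        runs = [run_benchmark(n_iter) for _ in range(repeats)]
        spreads = {
            metric: coefficient_of_variation([r[metric] for r in runs])
            for metric in ("load_ms", "inference_ms", "tps")
        }
        if all(cv < VARIABILITY_THRESHOLD for cv in spreads.values()):
            return n_iter
    return None  # nothing stable enough; the noise source is elsewhere
```

Using the coefficient of variation keeps the check scale-free, so the same 10% budget applies uniformly to load, inference latency, and TPS.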

Versions

trunk

cc @mergennachin @kimishpatel @iseeyuan @kirklandsign @cbilgin @huydhn @shoumikhin

Metadata

Labels

  • enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
  • high priority
  • module: android - Issues related to Android code, build, and execution
  • module: benchmark - Issues related to the benchmark infrastructure
  • triage review - Items require a triage review
  • triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


Projects

Status: In Progress
Status: Todo
