🐛 Describe the bug
Issue
The metrics frequently exceed the 10% threshold from run to run, making it hard to interpret the metrics and decide on actions.
It's happening to pretty much ALL models and ALL configs:
link to the dashboard
What areas should we look into?
- iOS benchmark app
- Android benchmark app
Solution Space
I think we can start by parameterizing the number of iterations for each model and its benchmark config, then find the "right" value so that the run-to-run fluctuation of the metrics (load, inference, TPS) stays within a reasonable range (<10%). A rough sketch of that search is below.
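A minimal sketch of how the "right" iteration count could be found per (model, config) pair; `run_benchmark` is a hypothetical placeholder for whatever the iOS/Android benchmark apps actually invoke, not an existing API:

```python
import statistics

def relative_spread(samples: list[float]) -> float:
    """Run-to-run fluctuation of a metric, as (max - min) / mean."""
    return (max(samples) - min(samples)) / statistics.mean(samples)

def find_stable_iterations(
    run_benchmark,          # hypothetical: callable(iterations: int) -> float, one metric value per run
    runs: int = 5,          # independent runs to compare against each other
    max_iterations: int = 200,
    threshold: float = 0.10,
) -> int:
    """Double the iteration count until the metric stabilizes across runs."""
    iterations = 5
    while iterations <= max_iterations:
        samples = [run_benchmark(iterations) for _ in range(runs)]
        if relative_spread(samples) < threshold:
            return iterations
        iterations *= 2
    return max_iterations  # fall back to the cap if it never stabilizes
```

This would let us record a per-model, per-config iteration count in the benchmark configs rather than using one global value.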
Versions
trunk
cc @mergennachin @kimishpatel @iseeyuan @kirklandsign @cbilgin @huydhn @shoumikhin