
Resolve the discrepancy in latency reporting between LLMs and non-LLMs #8576

@guangy10

🐛 Describe the bug

[Image: benchmark dashboard screenshot]

As shown on the dashboard, the avg_inference_latency (ms) column is skipped for LLMs, and only generate_time (ms) is reported instead.
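
To make the symptom concrete, here is a minimal Python sketch of what the two kinds of dashboard rows look like from a user's point of view. The model names, field names, and numbers are made up for illustration and are not the real storage schema:

```python
# Hypothetical dashboard rows; field names mirror the dashboard column labels,
# not the actual schema, and the values are placeholders.
llm_row = {
    "model": "llama_3_2_1b_llama3_fb16",
    "generate_time (ms)": 1234.5,          # reported
    "avg_inference_latency (ms)": None,    # skipped for LLMs -- the discrepancy
}

non_llm_row = {
    "model": "some_non_llm_model",         # placeholder name
    "generate_time (ms)": None,            # not applicable to non-LLMs
    "avg_inference_latency (ms)": 42.0,    # reported
}

def has_forward_latency(row):
    """True if the row surfaces the forward-pass latency column."""
    return row["avg_inference_latency (ms)"] is not None

assert has_forward_latency(non_llm_row)
assert not has_forward_latency(llm_row)    # the gap this issue is about
```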

Taking the iOS run as an example, an LLM job runs three tests on-device, each reporting a different metric:

  1. test_load_llama_3_2_1b_llama3_fb16_pte_iOS_17_2_1_iPhone15_4
  2. test_forward_llama_3_2_1b_llama3_fb16_pte_iOS_17_2_1_iPhone15_4
  3. test_generate_llama_3_2_1b_llama3_fb16_pte_tokenizer_model_iOS_17_2_1_iPhone15_4
A non-LLM job, by contrast, only runs the first two tests (test_load_* and test_forward_*); the test-to-metric relationship is sketched below.
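
The sketch below (Python) lays out which metric each test is understood to report. The test prefixes come from the job names above; the metric names are assumptions to be confirmed, which is part of what item 1 further down asks:

```python
# Hypothetical mapping from on-device test flavor to the metric it reports.
# Test prefixes come from the iOS job above; metric names are assumed from the
# dashboard columns and need confirmation against the actual harness.
TEST_TO_METRIC = {
    "test_load_":     "model_load_time (ms)",        # assumed metric name
    "test_forward_":  "avg_inference_latency (ms)",
    "test_generate_": "generate_time (ms)",
}

LLM_TESTS     = ["test_load_", "test_forward_", "test_generate_"]
NON_LLM_TESTS = ["test_load_", "test_forward_"]

def reported_metrics(tests):
    return [TEST_TO_METRIC[t] for t in tests]

# Both job kinds run test_forward_, so avg_inference_latency (ms) should be
# available for LLMs as well -- yet the dashboard only shows it for non-LLMs.
assert "avg_inference_latency (ms)" in reported_metrics(LLM_TESTS)
assert "avg_inference_latency (ms)" in reported_metrics(NON_LLM_TESTS)
```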

See detailed jobs here:

Three things to clarify in this task:

  1. Since test_forward_* is run and reported for both LLM and non-LLM jobs, why isn't it reported to the dashboard for LLMs?
  2. Annotate each metric in the DB so users know exactly what each one measures (a possible shape for these annotations is sketched after this list).
  3. Confirm whether Android is measuring and reporting the exact same metrics: Report avg_inference_latency from Android LLM benchmark app #8578
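
For item 2, one possible shape for the annotations is a simple metric-to-description table that could live next to the metrics in the DB or back a dashboard tooltip. This is only a sketch; the description wording is assumed and needs to be confirmed against what the iOS/Android benchmark apps actually measure:

```python
# Hypothetical metric annotations (descriptions are assumed wording, to be
# confirmed against what the benchmark apps actually measure).
METRIC_DESCRIPTIONS = {
    "model_load_time (ms)":
        "Wall-clock time to load the .pte program on device (test_load_*).",
    "avg_inference_latency (ms)":
        "Average latency of a single forward() call (test_forward_*); "
        "reported for both LLM and non-LLM models.",
    "generate_time (ms)":
        "End-to-end time of the token-generation loop for LLMs "
        "(test_generate_*).",
}

def describe(metric: str) -> str:
    """Return the human-readable description shown next to a metric."""
    return METRIC_DESCRIPTIONS.get(metric, "No description recorded yet.")

print(describe("avg_inference_latency (ms)"))
```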

Versions

trunk

cc @huydhn @kirklandsign @shoumikhin @mergennachin @byjlw

Labels

  - enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
  - module: benchmark - Issues related to the benchmark infrastructure
  - module: user experience - Issues related to reducing friction for users
  - triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
