Add workflow for on-demand benchmarking #4441

guangy10 · 2024-07-27T00:32:19Z

Ability to schedule an on-demand benchmark job from the GitHub Actions UI with parameters, e.g. models, delegates, devices, etc.
Ability to schedule from a PR via tagging (I doubt this will work with non-default args)

pytorch-bot · 2024-07-27T00:32:22Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4441

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 0f670eb with merge base 227b49d:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / test-llama-runner-mac (fp32, cmake, portable) / macos-job (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangy10 · 2024-07-27T00:49:48Z

@huydhn How can I test this workflow?

The GitHub Actions UI doesn't let me dispatch the workflow: https://github.com/pytorch/executorch/actions/workflows/android-perf.yml
I've temporarily enabled the run on PR; however, the "export-models" job doesn't seem to pick up the default model, "stories110M".

.github/pytorch-probot.yml

.github/workflows/android-perf.yml

guangy10 · 2024-07-29T19:16:47Z

@huydhn Do I need to get the PR merged in order to test it from the GitHub Actions UI? In the WIP PR, I can find this workflow, but there seems to be no way to trigger a run for it.

huydhn · 2024-07-29T20:02:39Z

> @huydhn Do I need to get the PR merged in order to test it from the GitHub Actions UI? In the WIP PR, I can find this workflow, but there seems to be no way to trigger a run for it.

For testing, I would just add pull_request to the workflow's list of triggers, then remove it right before committing. Once the PR lands, subsequent PRs will be able to use the ciflow tag it creates.

huydhn · 2024-07-29T21:43:31Z

I just realized that triggering the workflow via pull_request doesn't count as a workflow dispatch, so it can't be tested that way. Let me tweak the default values a bit so that the default model runs when the input is not set.
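A minimal sketch of the fallback described here, under stated assumptions: the `models` input name, the `ubuntu-latest` runner, and the shell step are illustrative and not copied from the PR. Defaults declared under `workflow_dispatch` apply only when the workflow is actually dispatched, so a run started by `pull_request` sees an empty input and needs an explicit fallback in the expression.

```yaml
# Sketch only: input names, runner, and step contents are assumptions for illustration.
name: android-perf

on:
  # Temporary trigger for testing from a PR; remove it before the PR lands.
  pull_request:
  workflow_dispatch:
    inputs:
      models:
        description: Comma-separated list of models to benchmark
        required: false
        type: string
        default: stories110M

jobs:
  set-models:
    runs-on: ubuntu-latest
    outputs:
      # Expose the step output as a job output so downstream jobs can read it.
      models: ${{ steps.set-models.outputs.models }}
    steps:
      - name: Set models
        id: set-models
        shell: bash
        env:
          # inputs.models is empty on pull_request events, so fall back explicitly.
          MODELS: ${{ inputs.models || 'stories110M' }}
        run: |
          # Turn "a,b" into the compact JSON array ["a","b"] for use with fromJson.
          echo "models=$(echo "$MODELS" | jq -Rc 'split(",")')" >> "$GITHUB_OUTPUT"
```

With this shape, a dispatched run can override `models`, while a PR-triggered run always benchmarks the fallback model.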

facebook-github-bot · 2024-07-29T22:40:51Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

guangy10 · 2024-07-30T01:07:53Z

.github/workflows/android-perf.yml

+    strategy:
+      matrix:
+        model: ${{ fromJson(needs.set-models.outputs.models) }}
+    with:
+      device-type: android
+      runner: linux.2xlarge
+      test-infra-ref: ''
+      # This is the ARN of ExecuTorch project on AWS
+      project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:02a2cf0f-6d9b-45ee-ba1a-a086587469e6
+      # This is the custom Android device pool that only includes Samsung Galaxy S2x
+      device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/e59f866a-30aa-4aa1-87b7-4510e5820dfa
+      # Uploaded to S3 from the previous job, the name of the app comes from the project itself.
+      # Unlike models there are limited numbers of build flavor for apps, and the model controls whether it should build with bpe/tiktoken tokenizer.
+      # It's okay to build all possible apps with all possible flavors in job "build-llm-demo". However, in this job, once a model is given, there is only
+      # one app+flavor that could load and run the model.
+      # TODO: Hard code llm_demo_bpe for now in this job.
+      android-app-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/llm_demo_bpe/app-debug.apk
+      android-test-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/llm_demo_bpe/app-debug-androidTest.apk
+      # The test spec can be downloaded from https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml
+      test-spec: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/abd86868-fa63-467e-a5c7-218194665a77
+      # Uploaded to S3 from the previous job
+      extra-data: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/${{ matrix.model }}/model.zip


FYI, I think we should remove the tokenizer flavor from the matrix and keep only the models in it. cc: @huydhn @kirklandsign

guangy10 · 2024-07-30T23:40:10Z

Testing the benchmark-on-device job: the 'model' appears to be null, and 'benchmark-on-device' ends up not being triggered even though all the jobs it depends on finished successfully.
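One thing worth checking in this situation is whether the output that feeds the matrix is a valid JSON array by the time `fromJson` reads it. The sketch below shows the producer/consumer wiring in isolation. It is not the PR's actual fix: it assumes a plain job rather than the reusable workflow used in the PR, and the hard-coded model list and `ubuntu-latest` runner are illustrative.

```yaml
# Sketch only: not the PR's actual jobs. The model list is hard-coded for illustration.
name: matrix-from-job-output

on:
  workflow_dispatch:

jobs:
  set-models:
    runs-on: ubuntu-latest
    outputs:
      # A job-level output must be mapped from a step output explicitly.
      models: ${{ steps.set-models.outputs.models }}
    steps:
      - name: Set models
        id: set-models
        shell: bash
        run: |
          # Write a compact JSON array on one line; single quotes keep the inner double quotes intact.
          echo 'models=["stories110M"]' >> "$GITHUB_OUTPUT"

  benchmark-on-device:
    # Only jobs listed under needs are visible through the needs context.
    needs: set-models
    runs-on: ubuntu-latest
    strategy:
      matrix:
        model: ${{ fromJson(needs.set-models.outputs.models) }}
    steps:
      - name: Show the selected model
        run: echo "Benchmarking ${{ matrix.model }}"
```

Echoing `matrix.model` in the first step is a cheap way to confirm that the value reached the job before any device time is spent.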

guangy10 · 2024-07-31T00:24:11Z

Okay, I can see the workflow is scheduled as expected, though the actual benchmarking doesn't make sense yet because the test-spec is hard-coded.

facebook-github-bot · 2024-07-31T00:26:59Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

guangy10 · 2024-07-31T01:54:35Z

Verified the workflow is scheduled as expected: https://github.com/pytorch/executorch/actions/runs/10172346528/job/28135557119?pr=4441

facebook-github-bot · 2024-07-31T01:56:41Z

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

guangy10 · 2024-07-31T02:08:56Z

I've seen this permission issue very often on different PRs: https://github.com/pytorch/executorch/actions/runs/10172715749/job/28135704037?pr=4441

CC: @kirklandsign @huydhn @kit1980

facebook-github-bot · 2024-07-31T17:57:33Z

@guangy10 merged this pull request in f611219.

huydhn · 2024-07-31T21:24:38Z

> I've seen this permission issue very often on different PRs: https://github.com/pytorch/executorch/actions/runs/10172715749/job/28135704037?pr=4441
>
> CC: @kirklandsign @huydhn @kit1980

Hmm, this comes from pytorch/test-infra#5523. @atalman has removed macOS from there, but I think any runners picked up by the PR during testing would need to be cleaned up.

facebook-github-bot added the CLA Signed label Jul 27, 2024

guangy10 force-pushed the ondemand_benchmark branch from 4a4a9ff to ea23d63 July 27, 2024 00:39

guangy10 requested review from kirklandsign and huydhn July 27, 2024 00:52

huydhn reviewed Jul 27, 2024

.github/pytorch-probot.yml (outdated, resolved)

huydhn reviewed Jul 27, 2024

.github/workflows/android-perf.yml (resolved)

huydhn reviewed Jul 27, 2024

.github/workflows/android-perf.yml (resolved)

huydhn reviewed Jul 27, 2024

.github/workflows/android-perf.yml (outdated, resolved)

guangy10 commented Jul 29, 2024

.github/workflows/android-perf.yml (resolved)

guangy10 commented Jul 29, 2024

.github/workflows/android-perf.yml (resolved)

guangy10 force-pushed the ondemand_benchmark branch 9 times, most recently from 27b8d3c to f0fac5b July 29, 2024 18:56

guangy10 commented Jul 29, 2024

.github/workflows/android-perf.yml (outdated, resolved)

guangy10 force-pushed the ondemand_benchmark branch from f0fac5b to 4673b32 July 29, 2024 19:15

guangy10 marked this pull request as ready for review July 29, 2024 19:17

guangy10 force-pushed the ondemand_benchmark branch from 4673b32 to 89123ba July 29, 2024 21:14

huydhn reviewed Jul 29, 2024

.github/workflows/android-perf.yml (resolved)

huydhn reviewed Jul 29, 2024

.github/workflows/android-perf.yml (resolved)

guangy10 force-pushed the ondemand_benchmark branch from 5ec0c8e to 55fe843 July 30, 2024 01:04

guangy10 commented Jul 30, 2024

guangy10 force-pushed the ondemand_benchmark branch from 55fe843 to 8f406de July 30, 2024 23:37

huydhn approved these changes Jul 31, 2024

guangy10 force-pushed the ondemand_benchmark branch from 8f406de to 4eaf53a July 31, 2024 00:24

guangy10 and others added 6 commits July 30, 2024 18:14

Add workflow for on-demand benchmarking (16dc429)

Update android-perf.yml to provide workflow call inputs to fix its ciflow usecase (03e4a3b)

Update android-perf.yml to provide the default model when testing with pull_request (16ef1ce)

Update android-perf.yml to use the default stories110M model (c4e94de)

Update android-perf.yml to fix set-output and jq usage (4c66112)

Debug set model outputs (577b8db)

guangy10 force-pushed the ondemand_benchmark branch 2 times, most recently from 04ac523 to cb02b88 July 31, 2024 01:18

Wrong quotes? (0f670eb)

guangy10 force-pushed the ondemand_benchmark branch from cb02b88 to 0f670eb July 31, 2024 01:54

facebook-github-bot closed this in f611219 Jul 31, 2024

facebook-github-bot added the Merged label Jul 31, 2024

kirklandsign deleted the ondemand_benchmark branch August 2, 2024 23:17

kirklandsign restored the ondemand_benchmark branch August 2, 2024 23:17

kirklandsign deleted the ondemand_benchmark branch August 8, 2024 03:43
