-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add workflow for on-demand benchmarking #4441
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4441
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit 0f670eb with merge base 227b49d (): BROKEN TRUNK - The following job failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
4a4a9ff
to
ea23d63
Compare
@huydhn how can I test this workflow?
|
27b8d3c
to
f0fac5b
Compare
f0fac5b
to
4673b32
Compare
@huydhn Do I need to get the PR merged in order to test it from the GitHub Action UI? In the WIP PR, I can find this workflow but it seems no way to trigger a run for it |
For testing, I will just add |
4673b32
to
89123ba
Compare
I just realize that triggering the workflow using |
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
5ec0c8e
to
55fe843
Compare
strategy: | ||
matrix: | ||
model: ${{ fromJson(needs.set-models.outputs.models) }} | ||
with: | ||
device-type: android | ||
runner: linux.2xlarge | ||
test-infra-ref: '' | ||
# This is the ARN of ExecuTorch project on AWS | ||
project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:02a2cf0f-6d9b-45ee-ba1a-a086587469e6 | ||
# This is the custom Android device pool that only includes Samsung Galaxy S2x | ||
device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/e59f866a-30aa-4aa1-87b7-4510e5820dfa | ||
# Uploaded to S3 from the previous job, the name of the app comes from the project itself. | ||
# Unlike models there are limited numbers of build flavor for apps, and the model controls whether it should build with bpe/tiktoken tokenizer. | ||
# It's okay to build all possible apps with all possible flavors in job "build-llm-demo". However, in this job, once a model is given, there is only | ||
# one app+flavor that could load and run the model. | ||
# TODO: Hard code llm_demo_bpe for now in this job. | ||
android-app-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/llm_demo_bpe/app-debug.apk | ||
android-test-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/llm_demo_bpe/app-debug-androidTest.apk | ||
# The test spec can be downloaded from https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml | ||
test-spec: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/abd86868-fa63-467e-a5c7-218194665a77 | ||
# Uploaded to S3 from the previous job | ||
extra-data: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/${{ matrix.model }}/model.zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fyi, I think we should remove tokenizer flavor from the matrix and only add models to it. cc: @huydhn @kirklandsign
55fe843
to
8f406de
Compare
Testing job |
Okay. I can see the workflow is scheduled as expected, though the actual benchmarking doesn't make sense due to hard-coded test-spec. |
8f406de
to
4eaf53a
Compare
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
04ac523
to
cb02b88
Compare
cb02b88
to
0f670eb
Compare
Verified the workflow is scheduled as expected: https://github.com/pytorch/executorch/actions/runs/10172346528/job/28135557119?pr=4441 |
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
I've seen this permission issue very often on different PRs: https://github.com/pytorch/executorch/actions/runs/10172715749/job/28135704037?pr=4441 |
Hmm, this comes from pytorch/test-infra#5523, @atalman has removed MacOS from there, but I think any runners picked up by the PR during testing would need to be cleaned up. |
Ability to schedule an on-demand benchmark job from GA UI with params, e.g. models, delegates, devices, etc
Ability to schedule from PR via tagging (doubt it could work with non-default args)