Experiment with private rooted Pixel 3 devices #10192
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10192
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e0b834c with merge base 4559a61.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
id-token: write
contents: read
with:
  models: ${{ inputs.models }}
@guangy10 @kirklandsign I'd appreciate it if you could pass me the list of models and benchmark configs you want to cover here. We only have 2 devices there, though, so I think we should reduce the number of models we cover for now.
Let's pick one non-genAI model (mv3, xnnpack_q8) and one genAI model (llama3.2-1b, spinquant & qlora)
mv3,meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8,meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8
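For illustration, the comma-separated list above would be fed through the `models` input shown in the diff. A minimal sketch of the calling job, assuming the benchmark workflow is a reusable workflow (the workflow path and job name are assumptions, not taken from this PR):

```yaml
# Sketch only: the workflow path and job name are hypothetical.
jobs:
  benchmark:
    uses: ./.github/workflows/android-perf.yml  # hypothetical path
    permissions:
      id-token: write
      contents: read
    with:
      models: mv3,meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8,meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8
      devices: google_pixel_3_private_rooted
```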
contents: read
with:
  models: ${{ inputs.models }}
  devices: google_pixel_3_private_rooted
Unfortunately, we don't have data points for the same type of device in the public pool, so we can't do a direct comparison. We would mainly rely on the new data from the private, rooted devices and compare those across different runs.
I recall @kirklandsign mentioned that metric instability is frequently seen on nightly batched jobs, while on-demand runs with only 1-2 models are much more stable.
@huydhn In this new schedule, and to gather more data points, we may want to run this workflow more often (every 2 hours, maybe? That should leave enough time to cool down from the previous run).
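An every-2-hours cadence would be expressed with a cron trigger; a sketch (the minute offset of 0 is an arbitrary assumption):

```yaml
# Sketch: run on an every-2-hours cadence.
# GitHub Actions cron schedules are evaluated in UTC.
on:
  schedule:
    - cron: "0 */2 * * *"  # at minute 0 of every 2nd hour
```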
@huydhn Do we need to merge this PR? I thought we could reuse the existing workflow instead of creating a new fork.
This uses the existing workflow but gives us the flexibility to use a different schedule and default inputs.
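The "thin fork" pattern described here — a new caller workflow that reuses the existing benchmark workflow but pins its own schedule and defaults — might look like the sketch below. The file name, workflow path, cadence, and default model are all assumptions for illustration, not the PR's actual values:

```yaml
# Hypothetical caller workflow: same underlying benchmark workflow,
# but with its own schedule and default inputs. Names and paths are
# assumptions.
name: android-perf-private-devices
on:
  schedule:
    - cron: "0 */2 * * *"  # assumed cadence
  workflow_dispatch:
    inputs:
      models:
        description: Comma-separated list of models to benchmark
        required: false
        type: string

jobs:
  benchmark:
    uses: ./.github/workflows/android-perf.yml  # hypothetical path
    permissions:
      id-token: write
      contents: read
    with:
      models: ${{ inputs.models || 'mv3' }}  # assumed default
      devices: google_pixel_3_private_rooted
```

This keeps the benchmark logic in one place; the fork only owns the trigger and its defaults.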
I missed this in #10192 for the scheduled workflow case, so it wrongly falls back to `llama`.

### Testing

https://github.com/pytorch/executorch/actions/runs/14481092309. The results show up on HUD correctly now: https://hud.pytorch.org/benchmark/llms?startTime=Wed%2C%2009%20Apr%202025%2000%3A31%3A41%20GMT&stopTime=Wed%2C%2016%20Apr%202025%2000%3A31%3A41%20GMT&granularity=day&lBranch=update-default-private-devices&lCommit=9fa6a49250fdbd96e8f7a5f4765bc6b32b41b6ce&rBranch=update-default-private-devices&rCommit=9fa6a49250fdbd96e8f7a5f4765bc6b32b41b6ce&repoName=pytorch%2Fexecutorch&benchmarkName=&modelName=All%20Models&backendName=All%20Backends&modeName=All%20Modes&dtypeName=All%20DType&deviceName=Google%20Pixel%203%20(Rooted)%20(Android%2012)&archName=All%20Platforms
I'm creating a new experimental workflow to run benchmarks on private rooted Pixel 3 devices. We only have 2 devices there, though, so I think we should reduce the number of models we cover for now.

### Testing

https://github.com/pytorch/executorch/actions/runs/14465442714
Same as #10192, but this is for the private iPhone 15 devices. I will need to follow up with another PR to get the device pool name/id and maybe the device id.

### Testing

https://hud.pytorch.org/benchmark/llms?startTime=Wed%2C%2016%20Apr%202025%2004%3A22%3A32%20GMT&stopTime=Wed%2C%2023%20Apr%202025%2004%3A22%3A32%20GMT&granularity=day&lBranch=apple-private-devices&lCommit=be72291c9b643581fedafaf8711d07fd5542b62d&rBranch=apple-private-devices&rCommit=be72291c9b643581fedafaf8711d07fd5542b62d&repoName=pytorch%2Fexecutorch&benchmarkName=&modelName=All%20Models&backendName=All%20Backends&modeName=All%20Modes&dtypeName=All%20DType&deviceName=All%20Devices&archName=All%20Platforms