
Do not hardcode llama2 model in perf test #4657

Merged · 18 commits · Aug 12, 2024
Conversation

@huydhn (Contributor) commented Aug 10, 2024

This fixes the hack in #4642. While I wanted to do the fix directly in #4642, it became more complicated because the hardcoded path is also used in the instrumented test case, so I spun this out into a separate PR.

This PR also fixes the way the test reports the TPS metric by calling sendStatus to correctly populate the instrumentation log. For example:

INSTRUMENTATION_STATUS: class=com.example.executorchllamademo.PerfTest
INSTRUMENTATION_STATUS: current=1
INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
INSTRUMENTATION_STATUS: numtests=1
INSTRUMENTATION_STATUS: stream=
com.example.executorchllamademo.PerfTest:
INSTRUMENTATION_STATUS: test=testTokensPerSecond
INSTRUMENTATION_STATUS_CODE: 1
INSTRUMENTATION_STATUS: TPS=224.24243 <--- Report here
INSTRUMENTATION_STATUS_CODE: 0
INSTRUMENTATION_STATUS: class=com.example.executorchllamademo.PerfTest
INSTRUMENTATION_STATUS: current=1
INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
INSTRUMENTATION_STATUS: numtests=1
INSTRUMENTATION_STATUS: stream=.
INSTRUMENTATION_STATUS: test=testTokensPerSecond
INSTRUMENTATION_STATUS_CODE: 0
INSTRUMENTATION_RESULT: stream=

Time: 0.678

OK (1 test)


INSTRUMENTATION_CODE: -1
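
For reference, a minimal sketch of how a metric like this can be emitted from an instrumented test via Instrumentation.sendStatus; the class name and metric value below are illustrative, not the exact code in this PR:

import android.app.Instrumentation;
import android.os.Bundle;
import androidx.test.platform.app.InstrumentationRegistry;
import org.junit.Test;

public class MetricReportExample {
  @Test
  public void reportTps() {
    // Report a custom key/value pair so it shows up as an
    // INSTRUMENTATION_STATUS line in the `adb shell am instrument` output.
    Bundle status = new Bundle();
    status.putString("TPS", Float.toString(224.24243f)); // illustrative value
    Instrumentation instrumentation = InstrumentationRegistry.getInstrumentation();
    instrumentation.sendStatus(0, status); // resultCode 0, as in the log above
  }
}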

Testing

https://github.com/pytorch/executorch/actions/runs/10332366636

pytorch-bot (bot) commented Aug 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4657

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d08a5eb with merge base a70d070:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label on Aug 10, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
@huydhn marked this pull request as ready for review on August 10, 2024 02:05
@facebook-github-bot (Contributor) commented:
@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@huydhn (Contributor, Author) commented Aug 10, 2024

@guangy10 The instrumentation log from https://github.com/pytorch/executorch/actions/runs/10332366636/job/28606203444, i.e. https://gha-artifacts.s3.amazonaws.com/device_farm/10332366636/2/arn_aws_devicefarm_us-west-2_308535385114_artifact_02a2cf0f-6d9b-45ee-ba1a-a086587469e6_8e51b233-2a4c-4380-85f5-7ad1c948805b_00002_00001_00000_00001_Customer_Artifacts.zip, now highlights the error with stories110M, where it fails to load:

INSTRUMENTATION_STATUS: class=com.example.executorchllamademo.PerfTest
INSTRUMENTATION_STATUS: current=1
INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
INSTRUMENTATION_STATUS: numtests=1
INSTRUMENTATION_STATUS: stream=
com.example.executorchllamademo.PerfTest:
INSTRUMENTATION_STATUS: test=testTokensPerSecond
INSTRUMENTATION_STATUS_CODE: 1
INSTRUMENTATION_STATUS: ModelName=stories110M.pt <-- the model name is now reported here
INSTRUMENTATION_STATUS_CODE: 0
INSTRUMENTATION_STATUS: class=com.example.executorchllamademo.PerfTest
INSTRUMENTATION_STATUS: current=1
INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
INSTRUMENTATION_STATUS: numtests=1
INSTRUMENTATION_STATUS: stack=java.lang.AssertionError: expected:<0> but was:<35>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at com.example.executorchllamademo.PerfTest.lambda$testTokensPerSecond$1$com-example-executorchllamademo-PerfTest(PerfTest.java:51)
	at com.example.executorchllamademo.PerfTest$$ExternalSyntheticLambda1.accept(Unknown Source:6)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:185)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:475)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:133)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:236)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:435)
	at com.example.executorchllamademo.PerfTest.testTokensPerSecond(PerfTest.java:43)

@kirklandsign would probably know what this error means.
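
For context, a minimal sketch of how a ModelName status like the one in the log above could be emitted from an instrumented test. The RESOURCE_PATH value and the file filter are assumptions that mirror the test spec and the diff quoted later in this review; the rest is illustrative, not the exact code in this PR:

import android.os.Bundle;
import androidx.test.platform.app.InstrumentationRegistry;
import java.io.File;
import java.util.Arrays;

public class ModelNameReportExample {
  // Assumed location that the device-farm test spec pushes model files to.
  private static final String RESOURCE_PATH = "/data/local/tmp/llama/";

  // Called from an instrumented test running on the device.
  public static void reportModelNames() {
    File directory = new File(RESOURCE_PATH);
    Arrays.stream(directory.listFiles())
        // Only exported .pte models are relevant here.
        .filter(file -> file.getName().endsWith(".pte"))
        .forEach(
            model -> {
              // Emits an "INSTRUMENTATION_STATUS: ModelName=<file>" line in the log.
              Bundle status = new Bundle();
              status.putString("ModelName", model.getName());
              InstrumentationRegistry.getInstrumentation().sendStatus(0, status);
            });
  }
}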

@@ -11,10 +11,12 @@ phases:
# Prepare the model and the tokenizer
- adb -s $DEVICEFARM_DEVICE_UDID shell "ls -la /sdcard/"
- adb -s $DEVICEFARM_DEVICE_UDID shell "mkdir -p /data/local/tmp/llama/"
- adb -s $DEVICEFARM_DEVICE_UDID shell "mv /sdcard/tokenizer.bin /data/local/tmp/llama/tokenizer.bin"
Contributor:

Oh, I just noticed that this file actually has a local copy stored in the demo-apps dir! Curious how it is uploaded to S3 so that the link https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec-v2.yml works?

Contributor (Author):

Oh, the upload is still done manually. I plan to write a workflow to automatically upload it to S3 in the next PR, so that we can just update this file.

# The test spec can be downloaded from https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml
test-spec: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/abd86868-fa63-467e-a5c7-218194665a77
# The test spec can be downloaded from https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec-v2.yml
test-spec: https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec-v2.yml
Contributor:

Can we rename it to something more generic, e.g. android-llm-device-farm-test-spec-v2.yml?

- adb -s $DEVICEFARM_DEVICE_UDID shell "mv /sdcard/*.pt /data/local/tmp/llama/"
- adb -s $DEVICEFARM_DEVICE_UDID shell "chmod 664 /data/local/tmp/llama/*.bin"
- adb -s $DEVICEFARM_DEVICE_UDID shell "chmod 664 /data/local/tmp/llama/*.pte"
- adb -s $DEVICEFARM_DEVICE_UDID shell "chmod 664 /data/local/tmp/llama/*.pt"
Contributor:

We probably won't need this line.

Contributor (Author):

Is there a bug in the .ci/scripts/test_llama.sh script? I only see two files, stories110M.pt and the tokenizer, in https://gha-artifacts.s3.amazonaws.com/pytorch/executorch/10332366636/artifact/stories110M_xnnpack/model.zip. That's why I attempted to add the *.pt file here.

Comment on lines +48 to +50
int loadResult = mModule.load();
// Check that the model can be loaded successfully
assertEquals(0, loadResult);
Contributor:

linter

// Find out the model name
File directory = new File(RESOURCE_PATH);
Arrays.stream(directory.listFiles())
.filter(file -> file.getName().endsWith(".pte") || file.getName().endsWith(".pt"))
Contributor:

We don't need the .pt file; it's the suffix of the weights for stories110M.

Contributor (Author) @huydhn commented Aug 12, 2024:

I see, let me remove this. But it looks like something is not right: https://gha-artifacts.s3.amazonaws.com/pytorch/executorch/10332366636/artifact/stories110M_xnnpack/model.zip contains only stories110M.pt and the tokenizer. Shouldn't there be an exported stories110M.pte model there? Maybe that explains why it fails to load.

Contributor:

yeah, the fix is in #4642

Contributor:

So you'll merge the test spec fix and ignore the "failed to load model" issue?

Contributor (Author) @huydhn commented Aug 12, 2024:

> yeah, the fix is in #4642

Oh, got it. Let me push a commit to fix the other comments and land this PR.

Contributor @guangy10 left a comment:

Thanks for the fix. Make sure the comments are addressed before merging.

@guangy10 (Contributor) commented:

LGTM. @kirklandsign may want to review the app as he is working on integrating a new activity.

@facebook-github-bot (Contributor) commented:

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines +14 to 18
- adb -s $DEVICEFARM_DEVICE_UDID shell "mv /sdcard/*.bin /data/local/tmp/llama/"
- adb -s $DEVICEFARM_DEVICE_UDID shell "mv /sdcard/*.pte /data/local/tmp/llama/"
- adb -s $DEVICEFARM_DEVICE_UDID shell "chmod 664 /data/local/tmp/llama/*.bin"
- adb -s $DEVICEFARM_DEVICE_UDID shell "chmod 664 /data/local/tmp/llama/*.pte"
- adb -s $DEVICEFARM_DEVICE_UDID shell "ls -la /data/local/tmp/llama/"
Contributor @guangy10 commented Aug 12, 2024:

Replace "llama" with "llm" for those paths as well.

Contributor (Author):

Oh, I remember this path /data/local/tmp/llama/ was hard-coded in the app before. If that's not the case anymore, I could update it, maybe after #4676 lands, to avoid the need to upload the spec manually.

@facebook-github-bot merged commit d53f8fa into main on Aug 12, 2024
48 of 49 checks passed