Commit

Update iphone 15 pro benchmarking numbers (#2927)
Summary:
Pull Request resolved: #2927

ATT

Created from CodeHub with https://fburl.com/edit-in-codehub

Reviewed By: mergennachin

Differential Revision: D55895703

fbshipit-source-id: 5466b44224b8ebf7b88d846354683da0c1f6a801
(cherry picked from commit ce447dc)
kimishpatel authored and pytorchbot committed Apr 9, 2024
1 parent 1f1f357 commit 6ebf490
Showing 1 changed file with 9 additions and 6 deletions.
15 changes: 9 additions & 6 deletions examples/models/llama2/README.md
@@ -32,15 +32,19 @@ Note that groupsize less than 128 was not enabled, since such model were still t

## Performance

Performance was measured on Samsung Galaxy S22, S23, S24 and One Plus 12. Measurement performance is in terms of tokens/second.
Performance was measured on the Samsung Galaxy S22, Galaxy S24, OnePlus 12, and iPhone 15 Pro Max. Performance is reported in tokens/second.

|Device | Groupwise 4-bit (128) | Groupwise 4-bit (256) |
|--------|-----------------------|------------------------|
|Galaxy S22 | 8.15 tokens/second | 8.3 tokens/second |
|Galaxy S24 | 10.66 tokens/second | 11.26 tokens/second |
|One plus 12 | 11.55 tokens/second | 11.6 tokens/second |
|iPhone 15 pro | x | x |
|Galaxy S22* | 8.15 tokens/second | 8.3 tokens/second |
|Galaxy S24* | 10.66 tokens/second | 11.26 tokens/second |
|OnePlus 12* | 11.55 tokens/second | 11.6 tokens/second |
|Galaxy S22** | 5.5 tokens/second | 5.9 tokens/second |
|iPhone 15 Pro Max** | ~6 tokens/second | ~6 tokens/second |

*: Measured via an adb binary-based [workflow](#step-5-run-benchmark-on)

**: Measured via an app-based [workflow](#step-6-build-mobile-apps)
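For context on the metric, the tokens/second figures above are wall-clock throughput: the number of generated tokens divided by the generation time. A minimal sketch of that computation (the timing around a runner call is illustrative; `run_llama` is a hypothetical stand-in, not an ExecuTorch API):

```python
import time


def tokens_per_second(num_generated_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as reported in the table: generated tokens / wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_generated_tokens / elapsed_seconds


if __name__ == "__main__":
    start = time.perf_counter()
    # generated_tokens = run_llama(prompt, max_new_tokens=128)  # hypothetical runner call
    elapsed = time.perf_counter() - start
    # With real numbers, e.g. 128 tokens generated in 2.0 s:
    print(f"{tokens_per_second(128, 2.0):.2f} tokens/second")
```

On-device runners typically print this figure directly; the sketch only shows how the reported number relates to token count and elapsed time.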

# Instructions

@@ -238,7 +242,6 @@ Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-de
- Enabling LLama2 7b and other architectures via Vulkan
- Enabling performant execution of widely used quantization schemes.

TODO

# Notes
This example tries to reuse the Python code, with minimal modifications to make it compatible with current ExecuTorch:
