dtype selective build from model API in OSS (#11760)
Given a specific model, we want to produce a binary that includes only
the minimal set of operators and dtypes needed to run the model. This requires
parsing the model to determine which kernels it launches, the operators
used in those kernels, and the dtypes of the tensors in those kernels.
After parsing, a header file must be generated, and the `portable_kernels`
lib can then be rebuilt to include only the operators and dtypes specified in
the generated header.
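
The header-generation step can be pictured with the minimal Python sketch below. It is not ExecuTorch's actual generator. The `generate_header` helper, the operator-to-dtype mapping it takes as input, and the `SELECTED_*` macro scheme are all hypothetical, and the real `selected_op_variants.h` very likely uses a different format. It only illustrates the idea of turning parsed (operator, dtype) pairs into a header that a kernel library can consult at build time.

```python
# Hypothetical sketch: turn a parsed {operator: dtypes} mapping into a C header
# that records which (operator, dtype) pairs a kernel library should keep.
# The SELECTED_* macro names are illustrative, not ExecuTorch's real format.

from typing import Dict, Iterable


def generate_header(selected: Dict[str, Iterable[str]]) -> str:
    lines = ["#pragma once", ""]
    for op in sorted(selected):
        # Normalize an operator name such as "aten::add.out" into a macro-safe token.
        token = op.replace("::", "_").replace(".", "_").upper()
        # De-duplicate and sort dtypes so the output is deterministic.
        for dtype in sorted(set(selected[op])):
            lines.append(f"#define SELECTED_{token}_{dtype.upper()} 1")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up input resembling what parsing a small add/mul model might yield.
    ops = {"aten::add.out": ["Float"], "aten::mul.out": ["Float", "Int"]}
    print(generate_header(ops))
```

Emitting one entry per (operator, dtype) pair, rather than per operator, is what lets a rebuild drop unused dtype branches inside an operator that is otherwise still needed.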
This change completes this end-to-end process. A user can now specify the
model they wish to optimize their binary for via the command-line
argument `-DEXECUTORCH_SELECT_OPS_FROM_MODEL="<file path to model pte>"`.
When specified, the pte is parsed to produce a YAML file called
`selected_operators.yaml` that describes the model's operators and
dtypes. From this YAML, a header file called `selected_op_variants.h` is
generated that selects the described operators and dtypes. When the
command-line argument `-DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON` is also
specified, this header is used when the `portable_kernels` lib is rebuilt.
Only the model API is supported with dtype selective build; using
other methods such as `list` or `dict` will result in a build error.
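
How the two flags combine can be sketched with a small hypothetical Python helper that assembles a CMake configure command. Only the two `-D` flags come from this change. The `selective_build_cmd` helper and its `source_dir` and `build_dir` defaults are illustrative assumptions, and a real build likely needs additional project-specific options. The `ValueError` gives a simplified mirror of the documented constraint that dtype selective build works only with the model API.

```python
# Hypothetical helper: assemble a CMake configure command for a dtype-selective
# build. Only the two -D flags come from this change; the helper, the source and
# build directories, and the validation rule are illustrative assumptions.

import shlex
from typing import List, Optional


def selective_build_cmd(
    pte_path: Optional[str],
    dtype_selective: bool,
    source_dir: str = ".",
    build_dir: str = "cmake-out",
) -> List[str]:
    if dtype_selective and not pte_path:
        # Simplified mirror of the documented constraint: dtype selective build
        # only works with the model API, so a .pte path must be supplied.
        raise ValueError("dtype selective build requires a model (.pte) path")
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    if pte_path:
        # Parse this model to produce the operator/dtype selection.
        cmd.append(f"-DEXECUTORCH_SELECT_OPS_FROM_MODEL={pte_path}")
    if dtype_selective:
        # Use the generated header when rebuilding portable_kernels.
        cmd.append("-DEXECUTORCH_DTYPE_SELECTIVE_BUILD=ON")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for a hypothetical model.pte.
    print(shlex.join(selective_build_cmd("model.pte", True)))
```

Returning an argument list rather than a single string keeps paths with spaces safe if the command is later passed to `subprocess.run`.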
An example usage of this flow is included in
`examples/selective_build/test_selective_build.sh:test_cmake_select_ops_in_model`.
Running `bash examples/selective_build/test_selective_build.sh cmake` builds
the `cmake-out/examples/selective_build/selective_build_test` binary. After
stripping the binary, the following sizes were measured for each model:

| Model | Default Binary Size (KB) | Dtype-Selected Binary Size (KB) |
| ------- | :---: | :---: |
| add | 359 | 275 |
| mul | 335 | 263 |
| add_mul | 367 | 287 |
| linear | 347 | 291 |
| softmax | 251 | 251 |
| resnet18 | 643 | 515 |
| resnet50 | 643 | 515 |
| mobilebert | 707 | 415 |
| lstm | 643 | 459 |
| dl3 | 607 | 539 |
| edsr | 543 | 371 |
| emformer_transcribe | 863 | 555 |
| vit | 907 | 687 |
| mv2 | 631 | 495 |
| mv3 | 843 | 535 |
| llama | 1.1M | 827 |
| qwen2_5 | 1.1M | 827 |

For llama and qwen2_5, the default binary size was reported as 1.1M (about 1.1 MB) rather than in KB.
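
The relative reductions implied by the table can be computed directly. The short Python script below uses only the sizes reported above. It leaves out llama and qwen2_5 because their default size was reported as 1.1M rather than as an exact KB value.

```python
# Relative binary-size reduction from dtype selective build, using the
# stripped sizes (KB) reported in the table. llama and qwen2_5 are omitted
# because their default size was reported as "1.1M" rather than in KB.

from typing import Dict, Tuple

SIZES_KB: Dict[str, Tuple[int, int]] = {
    # model: (default size, dtype-selected size)
    "add": (359, 275),
    "mul": (335, 263),
    "add_mul": (367, 287),
    "linear": (347, 291),
    "softmax": (251, 251),
    "resnet18": (643, 515),
    "resnet50": (643, 515),
    "mobilebert": (707, 415),
    "lstm": (643, 459),
    "dl3": (607, 539),
    "edsr": (543, 371),
    "emformer_transcribe": (863, 555),
    "vit": (907, 687),
    "mv2": (631, 495),
    "mv3": (843, 535),
}


def reduction_pct(default_kb: int, selected_kb: int) -> float:
    """Percentage of the default binary size removed by the dtype-selected build."""
    return 100.0 * (default_kb - selected_kb) / default_kb


if __name__ == "__main__":
    # Print models from largest to smallest relative reduction.
    ranked = sorted(SIZES_KB.items(), key=lambda kv: reduction_pct(*kv[1]), reverse=True)
    for model, (default_kb, selected_kb) in ranked:
        pct = reduction_pct(default_kb, selected_kb)
        print(f"{model:<20} {default_kb:>4} -> {selected_kb:>4} KB  ({pct:4.1f}% smaller)")
```

Run as-is, it lists mobilebert first (about 41% smaller) and softmax last (no change).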
Although binary size is noticeably reduced, the pte file parsing
functionality in `gen_oplist.py` appears to be incomplete. Please see the
PR discussion for details.