[microNPU][2c] Add performance modelling to cascader #9778
  for (size_t i = 0; i < block_shape.size(); i++) {
-   if (!is_rolling) {
-     num_blocks *= output_stripe_config->GetShape()[i] * output_stripe_config->GetStripes()[i] /
+   if (buffer_mode == BufferMode::RECOMPUTE) {
+     num_blocks *= static_cast<float>(output_stripe_config->GetShape()[i] *
+                                      output_stripe_config->GetStripes()[i]) /
                    block_shape[i];
    } else {
-     num_blocks *= output_stripe_config->GetExtent()[i] / block_shape[i];
+     num_blocks *= static_cast<float>(output_stripe_config->GetExtent()[i]) / block_shape[i];
    }
  }
Just to mention that this logic is placeholder and will be replaced in a later patch.
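For readers of the diff: the `static_cast<float>` presumably exists so that the per-axis ratio is not truncated by integer division before it is multiplied into `num_blocks`. The following is a minimal standalone sketch of that effect, not TVM code. It assumes the extent and block shape are plain integer vectors, and the function names `NumBlocksTruncated` and `NumBlocksFractional` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: mirrors the pre-change arithmetic, where each
// per-axis ratio is computed as int / int and the fraction is dropped.
float NumBlocksTruncated(const std::vector<int>& extent, const std::vector<int>& block_shape) {
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < block_shape.size(); i++) {
    num_blocks *= extent[i] / block_shape[i];  // integer division
  }
  return num_blocks;
}

// Hypothetical helper: mirrors the post-change arithmetic, where the
// numerator is converted to float first so the ratio keeps its fraction.
float NumBlocksFractional(const std::vector<int>& extent, const std::vector<int>& block_shape) {
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < block_shape.size(); i++) {
    num_blocks *= static_cast<float>(extent[i]) / block_shape[i];  // float division
  }
  return num_blocks;
}
```

With an extent of 10 and a block size of 4 on each of two axes, the first helper multiplies 2 by 2 while the second multiplies 2.5 by 2.5, so the two estimates diverge quickly as axes are added.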
Just a quick note on the test coverage of this feature. The results of the performance model are not explicitly tested against the FVP because we don’t have performance instrumentation available in CI. We will, however, test this component downstream, where such instrumentation is available.
* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
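To illustrate the last bullet, here is a rough sketch of picking, from a pre-computed list of block configs, the one that reads the least data for a given stripe. This is not the code in this PR. The `BlockConfig` struct, the `ElementsRead` cost (whole blocks covering the stripe) and `SelectBlockConfig` are all assumptions made for the example, and the real cost in the cascader may be computed differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified block config: just a 3-D block shape.
struct BlockConfig {
  int height;
  int width;
  int depth;
};

// Assumed cost model: the stripe is covered by whole blocks, so the number of
// elements read is (blocks needed per axis, rounded up) times the block volume.
long long ElementsRead(const BlockConfig& block, int stripe_h, int stripe_w, int stripe_d) {
  auto ceil_div = [](int a, int b) { return (a + b - 1) / b; };
  long long blocks = static_cast<long long>(ceil_div(stripe_h, block.height)) *
                     ceil_div(stripe_w, block.width) * ceil_div(stripe_d, block.depth);
  return blocks * block.height * block.width * block.depth;
}

// Returns the index of the config with the smallest assumed read cost.
// The first config wins ties. Precondition: configs is non-empty.
std::size_t SelectBlockConfig(const std::vector<BlockConfig>& configs, int stripe_h, int stripe_w,
                              int stripe_d) {
  std::size_t best = 0;
  long long best_cost = ElementsRead(configs[0], stripe_h, stripe_w, stripe_d);
  for (std::size_t i = 1; i < configs.size(); i++) {
    long long cost = ElementsRead(configs[i], stripe_h, stripe_w, stripe_d);
    if (cost < best_cost) {
      best = i;
      best_cost = cost;
    }
  }
  return best;
}
```

Because the candidate list is generated once up front, the selection itself is only a linear scan per stripe config.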
cc @manupa-arm could you take a look and merge if everything's OK? Thanks
manupak left a comment
LGTM
Thanks! @jacobbohlin @mbaret
[microNPU][2c] Initial Performance Model

* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
* Add test guards
* Extended block config testing
RFC: apache/tvm-rfcs#37
Issue: #9429
NOTE: This PR builds on top of #9469 and #9471 and therefore includes those changes. This PR will remain as 'draft' until both dependencies are merged.
The algorithm described in the RFC uses two metrics for Pareto culling: performance and memory usage. This commit addresses the former and introduces the basis of performance estimation for the Parts. It also includes performance estimation code that is specific to ethosu_conv2d.
The output of the performance model is only meant to be consumed by the cascader.
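For context on how the performance estimate is used, Pareto culling on the two metrics from the description can be sketched as below. This is a generic illustration of the technique, not the cascader's implementation. The `Candidate` struct, its field names and `ParetoCull` are hypothetical, and both metrics are assumed to be "lower is better".

```cpp
#include <algorithm>
#include <vector>

// Hypothetical candidate: the names are illustrative, not the cascader's types.
struct Candidate {
  int cycles;        // estimated performance cost (lower is better)
  int memory_usage;  // estimated memory footprint (lower is better)
};

// Returns the candidates that are not dominated by any other candidate, i.e.
// no other candidate is at least as good on both metrics and better on one.
std::vector<Candidate> ParetoCull(std::vector<Candidate> candidates) {
  // Sort by cycles ascending, breaking ties by memory usage ascending.
  std::sort(candidates.begin(), candidates.end(), [](const Candidate& a, const Candidate& b) {
    if (a.cycles != b.cycles) return a.cycles < b.cycles;
    return a.memory_usage < b.memory_usage;
  });
  std::vector<Candidate> frontier;
  for (const Candidate& c : candidates) {
    // Every candidate already kept has cycles <= c.cycles, so c survives only
    // if it uses strictly less memory than the last candidate kept.
    if (frontier.empty() || c.memory_usage < frontier.back().memory_usage) {
      frontier.push_back(c);
    }
  }
  return frontier;
}
```

The surviving candidates trade cycles against memory: moving along the result, the cycle count rises while memory usage falls.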