@jacobbohlin

RFC: apache/tvm-rfcs#37
Issue: #9429

NOTE: This PR builds on top of #9469 and #9471 and therefore includes those changes. This PR will remain as 'draft' until both dependencies are merged.

The algorithm described in the RFC uses two metrics for Pareto culling: performance and memory usage. This commit addresses the former and introduces the basis of performance estimation for the Parts. It also includes performance estimation code that is specific to ethosu_conv2d.

The output of the performance model is only meant to be consumed by the cascader.
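
To illustrate how a performance estimate and a memory estimate could feed Pareto culling, here is a minimal sketch. It is not the cascader's implementation: the `Candidate` struct and the `Dominates` and `ParetoCull` functions are hypothetical names introduced only for this example, and lower values are assumed to be better for both metrics.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical candidate plan; lower is assumed better for both metrics.
struct Candidate {
  long cycles;        // estimated performance cost
  long memory_bytes;  // estimated memory usage
};

// True if `a` is no worse than `b` on both metrics and strictly better on at least one.
bool Dominates(const Candidate& a, const Candidate& b) {
  return a.cycles <= b.cycles && a.memory_bytes <= b.memory_bytes &&
         (a.cycles < b.cycles || a.memory_bytes < b.memory_bytes);
}

// Keep only the candidates that no other candidate dominates.
// Quadratic in the number of candidates, which keeps the sketch easy to read.
std::vector<Candidate> ParetoCull(const std::vector<Candidate>& candidates) {
  std::vector<Candidate> frontier;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    bool dominated = false;
    for (std::size_t j = 0; j < candidates.size() && !dominated; ++j) {
      if (i != j && Dominates(candidates[j], candidates[i])) dominated = true;
    }
    if (!dominated) frontier.push_back(candidates[i]);
  }
  return frontier;
}
```

In this sketch a candidate survives whenever it trades one metric for the other, which is why both estimates are needed before culling can run.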

@jacobbohlin

Comment on lines -71 to 121
  for (size_t i = 0; i < block_shape.size(); i++) {
-   if (!is_rolling) {
-     num_blocks *= output_stripe_config->GetShape()[i] * output_stripe_config->GetStripes()[i] /
+   if (buffer_mode == BufferMode::RECOMPUTE) {
+     num_blocks *= static_cast<float>(output_stripe_config->GetShape()[i] *
+                                      output_stripe_config->GetStripes()[i]) /
                    block_shape[i];
    } else {
-     num_blocks *= output_stripe_config->GetExtent()[i] / block_shape[i];
+     num_blocks *= static_cast<float>(output_stripe_config->GetExtent()[i]) / block_shape[i];
    }
  }
Just to mention that this logic is a placeholder and will be replaced in a later patch.
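
The change under review appears to add `static_cast<float>` so that the per-axis ratio is no longer computed with integer division. The sketch below shows the difference on plain vectors. The function names are hypothetical and the code stands in for the placeholder logic only loosely; it does not use `StripeConfig` or any other TVM type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each per-axis ratio is computed with integer division, so the remainder is dropped.
float NumBlocksTruncating(const std::vector<int>& extent, const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < block_shape.size(); ++i) {
    num_blocks *= extent[i] / block_shape[i];  // int / int truncates toward zero
  }
  return num_blocks;
}

// Casting the numerator to float first keeps the fractional part of each ratio.
float NumBlocksFractional(const std::vector<int>& extent, const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < block_shape.size(); ++i) {
    num_blocks *= static_cast<float>(extent[i]) / block_shape[i];
  }
  return num_blocks;
}
```

For an extent of 10 x 6 and a block shape of 4 x 4, the truncating version yields 2 x 1 = 2 blocks, while the fractional version yields 2.5 x 1.5 = 3.75.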

@mbaret

mbaret commented Jan 12, 2022

Just a quick note on the test coverage of this feature. The results of the performance model are not explicitly tested against the FVP because we don’t have performance instrumentation available in CI. We will, however, test this component downstream, where such instrumentation is available.

* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance-related data that depends on the accelerator configuration.
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
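
As a rough illustration of choosing a block config from a pre-computed list, the sketch below picks the candidate with the lowest cost for a given stripe shape. The cost used here, the stripe volume rounded up to whole blocks on every axis, is only an assumed proxy for "data read" and is not taken from the PR. `Shape`, `PaddedVolume`, and `SelectBlockConfig` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<int>;

// Assumed proxy for data read: the stripe volume after rounding each axis up to whole blocks.
long PaddedVolume(const Shape& stripe, const Shape& block) {
  assert(stripe.size() == block.size());
  long volume = 1;
  for (std::size_t i = 0; i < stripe.size(); ++i) {
    assert(block[i] > 0);
    long blocks_on_axis = (stripe[i] + block[i] - 1) / block[i];  // ceiling division
    volume *= blocks_on_axis * block[i];
  }
  return volume;
}

// Return the index of the candidate with the smallest proxy cost; the first one wins on ties.
std::size_t SelectBlockConfig(const Shape& stripe, const std::vector<Shape>& candidates) {
  assert(!candidates.empty());
  std::size_t best = 0;
  long best_cost = PaddedVolume(stripe, candidates[0]);
  for (std::size_t i = 1; i < candidates.size(); ++i) {
    long cost = PaddedVolume(stripe, candidates[i]);
    if (cost < best_cost) {
      best = i;
      best_cost = cost;
    }
  }
  return best;
}
```

Because the candidate list is built once up front, the selection itself is a linear scan per stripe shape in this sketch.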
@mbaret

mbaret commented Jan 17, 2022

cc @manupa-arm could you take a look and merge if everything's OK? Thanks


@manupak manupak left a comment

LGTM

@manupak manupak merged commit 133bb9c into apache:main Jan 17, 2022
@manupak

manupak commented Jan 17, 2022

Thanks! @jacobbohlin @mbaret

yuanfz98 pushed a commit to yuanfz98/tvm that referenced this pull request Jan 24, 2022
* [microNPU][2c] Initial Performance Model

* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data
that is dependent on the accelerator configuration
* Added generation of all valid block configs. This is pre-computed and
given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount
of data read given a StripeConfig.

* Add test guards

* Extended block config testing
crazydemo pushed a commit to crazydemo/tvm that referenced this pull request Jan 27, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022