[Relay][Strategy] Use x86 dense schedules for arm_cpu #15470
Merged
Currently, the fallback used when compiling a dense operation with targets such as `llvm -device=arm_cpu` is `dense.generic`. This results in very poor performance. Although #13775 meant that x86 schedules are used in cases where no strategy is provided by arm_cpu, a dense strategy is registered for arm_cpu because specialized schedules exist for it (e.g. a schedule for embedded devices), so dense never reaches that fallback. This commit ensures x86 schedules are used in place of a generic schedule, which yields much better performance.

The commit also follows the same approach for the `dense.generic` schedule as the x86 strategy. This will only be used when auto-scheduler is enabled.

A test has been added to check the intended schedules are picked when compiling with `arm_cpu`.

cc @ekalda @neildhickey
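For readers skimming the description, the selection order it describes can be sketched roughly as follows. This is a toy model in plain Python, not the TVM implementation: the function name `pick_dense_schedule`, its two boolean parameters, and the names `specialized.arm_cpu` and `dense.x86` are hypothetical placeholders, and checking the specialized schedule first is an assumption. Only `dense.generic` is a name taken from the description.

```python
# Toy model of the dense schedule selection described in this PR.
# NOT TVM code: the function, its parameters, and every schedule name
# except "dense.generic" are hypothetical placeholders.


def pick_dense_schedule(has_specialized_schedule: bool,
                        auto_scheduler_enabled: bool) -> str:
    """Return the name of the schedule a dense op would be compiled with."""
    if has_specialized_schedule:
        # Assumption: a specialized arm_cpu schedule (e.g. the one for
        # embedded devices) takes priority when it applies.
        return "specialized.arm_cpu"
    if auto_scheduler_enabled:
        # The generic schedule is kept only for the auto-scheduler path,
        # mirroring the x86 strategy.
        return "dense.generic"
    # Otherwise reuse an x86 schedule instead of the slow generic one.
    return "dense.x86"


if __name__ == "__main__":
    for specialized in (True, False):
        for auto in (True, False):
            print(specialized, auto, pick_dense_schedule(specialized, auto))
```

The point of the sketch is only that `dense.generic` is no longer the default for arm_cpu; it remains reachable when auto-scheduler is enabled.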