[SYCL-MLIR] Add `KernelDisjointSpecialization` pass in pipeline #9187
Verified that with this PR and `-D__SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING__`, SYCL-MLIR is able to perform scalar replacement on the reduction loops in the SYCL-Bench workloads identified before, even with the fix in #9055.

=> No performance regressions with just this PR.
=> Lost the 50% gain on `covariance` after specializing the function, because 2 of the 3 accessors actually overlap!

Not sure why it is written that way. With the simple source code change below (which removes `symmat2` and always uses `symmat`), the 50% gain is recovered.

If we want to get the gain without a source code change, then we need to version by checking only whether `symmat` and `data` overlap. That requires context, so we may want to perform loop versioning in the detect reduction pass.

Notice the inner loop with the opportunity doesn't use `symmat2`:

Note: A number of KernelFusion test cases are moved to xfail, as accessors cannot be internalized when they are used by a ptrtoint operation. (#9188)