[SYCL-MLIR] Add `KernelDisjointSpecialization` pass in pipeline #9187

whitneywhtsang · 2023-04-24T22:57:12Z

Verified that with this PR and -D__SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING__, SYCL-MLIR is able to perform scalar replacement on the reduction loops in the previously identified SYCL-Bench workloads, even with the fix in #9055.

Benchmark (-O3)	Previously measured gain	This PR only (#9187)	#9187 + #9055
2mm	12%	12%	12%
3mm	13%	12%	11%
covariance	50%	50%	0%
gemm	12%	12%	11%
gramschmidt	14% + 18% (LICM versioning)	33%	33%
syrk	5%	5%	5%

=> No performance regressions with just this PR.
=> With #9055 also applied, the 50% covariance gain is lost after specializing the function, because 2 of the 3 accessors actually overlap (symmat and symmat2 are both created from symmat_buffer):

auto data = data_buffer.get_access<access::mode::read>(cgh);
auto symmat = symmat_buffer.get_access<access::mode::discard_write>(cgh);
auto symmat2 = symmat_buffer.get_access<access::mode::discard_write>(cgh);

Not sure why it is written that way. With the simple source change below (which removes symmat2 and always uses symmat), the 50% gain is recovered.

+++ b/polybench/covariance.cpp
@@ -110,7 +110,6 @@ public:
-                       auto symmat2 = symmat_buffer.get_access<access::mode::discard_write>(cgh);
@@ -122,7 +121,7 @@ public:
-                                       symmat2[{j2, j1}] = symmat[{j1, j2}];
+                                       symmat[{j2, j1}] = symmat[{j1, j2}];
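The overlap can be made concrete with a simple range-intersection test. This is a minimal sketch, not the pass's implementation: it assumes each accessor can be reduced to a base pointer and an element count, and `AccessorRange`, `overlaps` and `allDisjoint` are hypothetical names. Under that model, `data` is disjoint from `symmat`, while `symmat` and `symmat2` (both obtained from `symmat_buffer`) always intersect, so a check requiring all accessors to be pairwise disjoint would presumably always fail for the original kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical model of an accessor: the first element it can touch and the
// number of elements in its range.
struct AccessorRange {
  const float *base;
  std::size_t count;
};

// Returns true if the half-open ranges [a.base, a.base + a.count) and
// [b.base, b.base + b.count) share at least one element.
bool overlaps(const AccessorRange &a, const AccessorRange &b) {
  if (a.count == 0 || b.count == 0)
    return false; // an empty range cannot overlap anything
  // std::less yields a strict total order on pointers, even for pointers
  // into unrelated arrays, where the result of the built-in < is unspecified.
  std::less<const float *> lt;
  return lt(a.base, b.base + b.count) && lt(b.base, a.base + a.count);
}

// Hypothetical pairwise check over all accessors of a kernel.
bool allDisjoint(const AccessorRange *ranges, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (overlaps(ranges[i], ranges[j]))
        return false;
  return true;
}
```

With the source change above, only `data` and `symmat` remain, and the pairwise check can succeed.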

If we want the gain without a source change, we need to version the loop by checking only whether symmat and data overlap. That check needs the loop context, so we may want to perform the loop versioning in the DetectReduction pass.
Note that the inner loop with the opportunity does not use symmat2:

for(size_t i = 1; i <= N_; i++)
  symmat[{j1, j2}] += data[{i, j1}] * data[{i, j2}];
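The versioning idea can be sketched on a flat-array model of this loop. This is an illustration under stated assumptions, not SYCL-MLIR code: `data` is modeled as a row-major `(N+1) x M` array, `sym` points at the `symmat[{j1, j2}]` element, and all function names are hypothetical. The scalar-replaced version keeps the running sum in a local, which is only legal when `sym` does not point into `data`; the versioned wrapper checks exactly that at run time and falls back to the original loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Original loop: *sym is loaded and stored on every iteration, because the
// store might modify an element of data that a later iteration reads.
void reduceOriginal(float *sym, const float *data, std::size_t n,
                    std::size_t m, std::size_t j1, std::size_t j2) {
  for (std::size_t i = 1; i <= n; ++i)
    *sym += data[i * m + j1] * data[i * m + j2];
}

// Scalar replacement: one load before the loop and one store after it.
// Only equivalent to reduceOriginal when sym does not alias data.
void reduceScalarReplaced(float *sym, const float *data, std::size_t n,
                          std::size_t m, std::size_t j1, std::size_t j2) {
  float acc = *sym;
  for (std::size_t i = 1; i <= n; ++i)
    acc += data[i * m + j1] * data[i * m + j2];
  *sym = acc;
}

// Loop versioning: test only whether sym points into data's (n + 1) * m
// elements, then pick the fast or the conservative version.
void reduceVersioned(float *sym, const float *data, std::size_t n,
                     std::size_t m, std::size_t j1, std::size_t j2) {
  std::less<const float *> lt;
  const float *dataEnd = data + (n + 1) * m;
  bool aliases = !lt(sym, data) && lt(sym, dataEnd);
  if (aliases)
    reduceOriginal(sym, data, n, m, j1, j2);
  else
    reduceScalarReplaced(sym, data, n, m, j1, j2);
}
```

For example, with `data = {0, 0, 1, 2, 3, 4, 5, 6}` (`N = 3`, `M = 2`, `j1 = 0`, `j2 = 1`) and `sym` pointing at `data[4]`, the original loop produces 55 while the scalar-replaced loop produces 47, so the runtime check is what keeps the fast path correct.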

Note: A number of KernelFusion test cases are moved to xfail, because accessors cannot be internalized when they are used by a ptrtoint operation (#9188).

polygeist/lib/Dialect/Polygeist/Transforms/KernelDisjointSpecialization.cpp

polygeist/test/polygeist-opt/sycl/kernel_disjoint_specialization.mlir

polygeist/lib/Dialect/Polygeist/Utils/TransformUtils.cpp

etiotto · 2023-04-25T13:30:58Z

"If we want to get the gain without source code change, then we need to version by only checking if symmat and data overlap, which we need the context, we may want to perform loop versioning in detect reduction pass."

Yes, I think we need to version the reduction loop to get the opportunity.

Signed-off-by: Tsang, Whitney <whitney.tsang@intel.com>

whitneywhtsang · 2023-04-26T00:06:26Z

After allowing functions indirectly called by GPU kernels (#9194), more KernelFusion tests fail because internalization cannot be performed. In total, 15 KernelFusion test cases are moved to xfail.

whitneywhtsang added the sycl-mlir Pull requests or issues for sycl-mlir branch label Apr 24, 2023

whitneywhtsang requested review from sommerlukas and victor-eds April 24, 2023 22:57

whitneywhtsang self-assigned this Apr 24, 2023

whitneywhtsang requested a review from etiotto as a code owner April 24, 2023 22:57

whitneywhtsang mentioned this pull request Apr 24, 2023

[SYCL-MLIR] KernelFusion unable to perform internalization with KernelDisjointSpecialization #9188

Open

sommerlukas approved these changes Apr 25, 2023

victor-eds reviewed Apr 25, 2023

whitneywhtsang requested a review from victor-eds April 25, 2023 13:01

etiotto approved these changes Apr 25, 2023

[SYCL-MLIR] Add KernelDisjointSpecialization in pipeline

497e738

Signed-off-by: Tsang, Whitney <whitney.tsang@intel.com>

whitneywhtsang force-pushed the kernel-disjoint-specialization branch from 52aff1d to 497e738 Compare April 25, 2023 16:52

Add more KernelFusion tests to xfail

263cc7c

Signed-off-by: Tsang, Whitney <whitney.tsang@intel.com>

This was referenced Apr 26, 2023

[SYCL-MLIR] Add polygeist.struct #9213

Open

[SYCL-MLIR][DetectReduction] Add loop versioning #9214

Open

victor-eds approved these changes Apr 26, 2023

whitneywhtsang merged commit ee50932 into intel:sycl-mlir Apr 26, 2023

whitneywhtsang deleted the kernel-disjoint-specialization branch April 26, 2023 13:20
