Skip to content

[SYCL] Switch SPIR-V offload target to opaque pointers #9828

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 8, 2023

Conversation

bader
Copy link
Contributor

@bader bader commented Jun 12, 2023

Although there are a few tests failing due to this change, we need to go with this change to avoid future regressions and unblock changes removing typed pointers support. The regressions are supposed to be fixed by follow-up patches.

@bader bader temporarily deployed to aws June 13, 2023 00:05 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws June 13, 2023 00:42 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws June 15, 2023 17:16 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws June 15, 2023 18:23 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws June 15, 2023 19:16 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws July 21, 2023 01:58 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws July 21, 2023 02:36 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws July 27, 2023 00:31 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws July 27, 2023 01:38 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws August 1, 2023 20:18 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws August 1, 2023 21:15 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws August 1, 2023 21:43 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws August 1, 2023 22:51 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws August 2, 2023 16:15 — with GitHub Actions Inactive
@bader bader temporarily deployed to aws August 2, 2023 17:01 — with GitHub Actions Inactive
@bader bader changed the title [SYCL] Run SYCL end-to-end with opaque pointers [SYCL] Switch SPIR-V offload target to opaque pointers Aug 2, 2023
@jsji
Copy link
Contributor

jsji commented Aug 3, 2023

So any use of the SYCL vector math functions in a ESIMD kernel seems to cause the problem.

Yes, ext_math is the test in question, and the issue was in the GPU RT. It should be fixed with intel/intel-graphics-compiler@c083dec which made it into igc 1.0.14342.

@sarnex Is this problem happen only with the OLD driver? updating the igc will fix these? Or they will fail even with updated igc?

@sarnex
Copy link
Contributor

sarnex commented Aug 3, 2023

@jsji A driver with IGC 1.0.14342 or later will not have the problem and the test will pass.

@jsji
Copy link
Contributor

jsji commented Aug 3, 2023

@jsji A driver with IGC 1.0.14342 or later will not have the problem and the test will pass.

Great to hear that! Then let us try to upgrade the IGC driver in dependencies and CI.

@sarnex
Copy link
Contributor

sarnex commented Aug 3, 2023

@jsji Sounds good, let me know if you hit any problems with this test on a new driver and I will investigate.

@jsji
Copy link
Contributor

jsji commented Aug 3, 2023

@bader @aelovikov-intel @intel/dpcpp-devops-reviewers Can we try to update the IGC version?

@v-klochkov If we can't update the IGC version due to some other reason for now, can we XFAIL the test as @sarnex confirmed that this is just due to old driver. We will eventually update the IGC version anyhow.

@sarnex
Copy link
Contributor

sarnex commented Aug 3, 2023

@jsji The problem is not really about this test in particular, the problem is basic ESIMD functionality will break for people not using the latest drivers. Even in internal testing we don't seem to be using a driver that has the fix.

@bader
Copy link
Contributor Author

bader commented Aug 3, 2023

@jsji The problem is not really about this test in particular, the problem is basic ESIMD functionality will break for people not using the latest drivers. Even in internal testing we don't seem to be using a driver that has the fix.

I have concern that only one test has failed, which seems to test something different. Don't we have ESIMD tests checking "basic functionality"? I would recommend carefully reviewing ESIMD tests as from my POV the ROI of current tests is low.

@bader
Copy link
Contributor Author

bader commented Aug 3, 2023

@bader @aelovikov-intel @intel/dpcpp-devops-reviewers Can we try to update the IGC version?

I have no objections to check, but we need to test if using custom IGC version doesn't break other tests. We use the version specified by the GPU driver - https://github.com/intel/compute-runtime/releases, which is tested with the rest of the driver components.
@aelovikov-intel, could you run as much tests as possible with the IGC 1.0.14342 or later before the switch? I used to use regular pre-commit, but AFAIK, it doesn't work for testing changes in CI anymore.

@sarnex
Copy link
Contributor

sarnex commented Aug 3, 2023

@bader It seems we only have one test that uses the vector sycl math functions and that test makes use of them extensively. By basic functionality, I meant that it is extremely easy to hit this problem. If they use the functions at all, even in trivial cases, they will hit the problem. I don't actually know how often these functions are used by customers, maybe @v-klochkov has more customer insight.

@jsji
Copy link
Contributor

jsji commented Aug 3, 2023

Don't we have ESIMD tests checking "basic functionality"? I would recommend carefully reviewing ESIMD tests as from my POV the ROI of current tests is low.

Yes, agree! But I think this shouldn't be a blocker for opaque pointer enablement.

@jsji
Copy link
Contributor

jsji commented Aug 3, 2023

@jsji The problem is not really about this test in particular, the problem is basic ESIMD functionality will break for people not using the latest drivers. Even in internal testing we don't seem to be using a driver that has the fix.

In theory, when we release the changes , we can require NEW drivers. But yes, we will need to test and update the new drivers in all our relevant tests including internal tests. And also involve the release managers.

@aelovikov-intel
Copy link
Contributor

@aelovikov-intel, could you run as much tests as possible with the IGC 1.0.14342 or later before the switch? I used to use regular pre-commit, but AFAIK, it doesn't work for testing changes in CI anymore.

Driver update isn't considered to be CI changes. You should be able to test it.

Copy link
Contributor

@sommerlukas sommerlukas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the kernel fusion test, makes sense to deactivate for now.

We'll investigate the issue and address in a separate PR.

@bader
Copy link
Contributor Author

bader commented Aug 3, 2023

We'll investigate the issue and address in a separate PR.

Great. Thank you!

@bader
Copy link
Contributor Author

bader commented Aug 3, 2023

@sarnex, the version is use in CI is the latest publicly available IGC version. To update the IGC version, we need to wait for a public release.

I don't actually know how often these functions are used by customers, maybe @v-klochkov has more customer insight.

@v-klochkov, @turinevgeny, please, approve if it's okay to have sycl/test-e2e/ESIMD/ext_math.cpp disabled while we are waiting for a new GPU driver.

@bader
Copy link
Contributor Author

bader commented Aug 4, 2023

@intel/dpcpp-tools-reviewers, @intel/llvm-reviewers-runtime, ping.

Copy link
Contributor

@asudarsa asudarsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bader

Overall looks ok to turn this on. I do not see any major issues, though it might help to resolve the failures soon as this will block a few folks. If you think it's a good idea, please open an issue for resolving these fails.

Thanks

@bader
Copy link
Contributor Author

bader commented Aug 4, 2023

I do not see any major issues, though it might help to resolve the failures soon as this will block a few folks.

Who might be blocked by this?

If you think it's a good idea, please open an issue for resolving these fails.

I already did - #10682.

@asudarsa
Copy link
Contributor

asudarsa commented Aug 4, 2023

I do not see any major issues, though it might help to resolve the failures soon as this will block a few folks.

Who might be blocked by this?

I think bfloat issue might block OneDNN etc. But it was more of a blanket statement. Not a blocker here.

If you think it's a good idea, please open an issue for resolving these fails.

I already did - #10682.

Just spotted it in the earlier comments. Sorry for missing it. I will try to see if I can help there. Thanks again

@bader
Copy link
Contributor Author

bader commented Aug 7, 2023

@againull, @intel/llvm-reviewers-runtime, ping.

Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concerns from SYCL RT point of view.

@bader bader merged commit 0c51f60 into intel:sycl Aug 8, 2023
@bader bader deleted the test-e2e-opaque branch August 8, 2023 00:37
bader pushed a commit that referenced this pull request Aug 29, 2023
Fixes `sycl/test-e2e/KernelFusion/internalize_array_wrapper.cpp` which
was temporarily deactivated in #9828.

Switching to opaque pointers meant that accesses to the data structure
in the testcase were no longer represented as one "deep" GEP, but rather
a series of GEPs into the different nesting levels of the structure.
Rather than bailing out, we now keep track of GEPs that index into an
aggregate object, and exclude their users from the modulo-`LocalSize`
rewriting (the pointer types still have to be changed to the desired
target address space, though).

In addition, perform `-simplifycfg` before internalization in order to
get rid of unreachable code which, in this particular case, contained
Phi nodes on the candidate accessor.

---------

Signed-off-by: Julian Oppermann <julian.oppermann@codeplay.com>
mdtoguchi pushed a commit to mdtoguchi/llvm that referenced this pull request Oct 18, 2023
Although there are a few tests failing due to this change, we need to go
with this change to avoid future regressions and unblock changes
removing typed pointers support. The regressions are supposed to be
fixed by follow-up patches.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants