Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve DPC++ compilation times #243

Merged
merged 7 commits into from
Jan 24, 2022
Merged

Conversation

psalz
Copy link
Member

@psalz psalz commented Dec 27, 2021

Moving discussions from #234 over to here. While DPC++ can now compile all but 5 test categories, we haven't removed them from the CI filter thus far because compilation times become rather long (over 2 hours). While we can probably still improve things somewhat on the CTS side (by reducing the number of template instantiations in non-extensive mode), there also appears to be some optimization potential on DPC++'s side. I've updated the CI container image to a newer version of DPC++ (ec97c57) that includes two recent performance improvements (intel/llvm#5127, intel/llvm#5178) and have also set the __SYCL_DISABLE_PARALLEL_FOR_RANGE_ROUNDING__ macro, as suggested in #234.

Let's use this PR to keep track of improvements over time, until we hopefully get compilation times into an acceptable range (which I would say is around 1 hour).

cc @bader @alexbatashev

psalz and others added 3 commits December 27, 2021 10:44
This includes two changes that should improve CTS compilation times
(intel/llvm#5127, intel/llvm#5178).
Co-authored-by: Alexey Bader <alexey.bader@intel.com>
@alexbatashev
Copy link
Contributor

Off the top of my head, here's where there're currently hotspots:

  1. I can still see some overhead introduced by ConstructOpenCLKernel, although I'm not sure if we can save much on it.
  2. Just using #include <sycl/sycl.hpp> in source code is ridiculously slow, around 7 seconds on my system. Also not sure how much can be won here.
  3. types.hpp is a known source of problems with compile times. sycl::vec is very template-heavy. Massaging some diagnostics gives me around 4% improvement, which is not much.
  4. Disabling asserts in nightly builds improves compile times by 10%. Not much, but I think we should consider it.
  5. Clang itself is not very performant, when it comes to templates.

Even combined, all these trick are unlikely to make a huge difference. Maybe some caching is possible here? We don't have to uplift compilers every day. And most tests don't change this often.

an acceptable range (which I would say is around 1 hour)

I would argue. 10 minutes is acceptable. 1 hour is half of the time it takes to build the compiler toolchain.

cmake/FindDPCPP.cmake Outdated Show resolved Hide resolved
@psalz
Copy link
Member Author

psalz commented Dec 27, 2021

I would argue. 10 minutes is acceptable. 1 hour is half of the time it takes to build the compiler toolchain.

In an ideal world... But you're probably right, we should aim higher.

Even combined, all these trick are unlikely to make a huge difference. Maybe some caching is possible here? We don't have to uplift compilers every day. And most tests don't change this often.

That is a good point. Maybe this is all moot if we simply add ccache... I will look into it!

@psalz psalz marked this pull request as ready for review January 10, 2022 14:15
@psalz
Copy link
Member Author

psalz commented Jan 10, 2022

Updated to include recent changes. Unfortunately this won't profit from ccache, as the new compiler version invalidates the cache. Since that workflow now also builds the container image itself, I expect a very long run ;-).

@alexbatashev
Copy link
Contributor

Updated to include recent changes. Unfortunately this won't profit from ccache, as the new compiler version invalidates the cache. Since that workflow now also builds the container image itself, I expect a very long run ;-).

I guess, you don’t need to build the compiler too often. You can keep a fixed container tag for a couple of weeks and have some automation to uplift it and warm-up cache.

@psalz
Copy link
Member Author

psalz commented Jan 11, 2022

Hmm.. It seems like the newer revision of DPC++ again broke ccache. This is not what I reported in intel/llvm#5260, since I applied the workaround in #245 and as you can see here (in step "Post Set up ccache") 26 MB of cache are being uploaded. Now however with DPC++ @ ec97c57 the cache size is reported as 0 MB.

@psalz
Copy link
Member Author

psalz commented Jan 24, 2022

Hmm.. It seems like the newer revision of DPC++ again broke ccache.

Nevermind, turns out I introduced a bug during the previous back-merge. Everything seems to work now, so I'll go ahead and merge!

@psalz psalz merged commit 9c4c2a6 into SYCL-2020 Jan 24, 2022
@psalz psalz deleted the improve-dpcpp-compilation-times branch January 24, 2022 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants