-
Notifications
You must be signed in to change notification settings - Fork 790
[SYCL] Change kernel/program cache data structures and get rid of boost #18599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UR LGTM and 👍 for getting rid of boost
I hope to bring #16763 back ASAP. Do you know if one of the LLVM structures could offer similar performance? |
I'm not sure, but I expect the LLVM ones have similar performance to the ones in this PR. This PR isn't introducing any new dependency, so if you expect linking against LLVMSupport again will take a bit I'd prefer to merge this PR and then switch to the LLVM data structures in a future PR. If you expect it will be done quickly, I can rework this PR after it's merged. Thanks! |
@steffenlarsen Sorry forgot to tag you in my response, see above comment :) |
My bad for not realizing it was for me! AFAIK, this adding a dependency is only partially right. Currently it is only XPTI using it (correct me if I am wrong) so someone compiling without XPTI don't have emhash as a dependency. That said, it seems like this PR trades a dependency for another, so I don't think we're any worse off. I am okay with proceeding but would really prefer that we check if LLVMSupport variants can be used instead to avoid external dependencies altogether. 🤔 |
@steffenlarsen My bad! You're right, it's only a dependency if XPTI is enabled, I was looking at it from the POV as the project as whole, but you're right. Luckily this dependency is only a few header files instead of boost which is huge. If you could ping me once LLVMSupport is available in libsycl I can retest with those data structures and use them if everything looks good. |
Hi @sergey-semenov, I'd like your review on this before merging given you worked in this code recently, thanks! |
@@ -0,0 +1,142 @@ | |||
//==------------ unordered_multimap.hpp ----------------------------==// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering about why don't we instead use std::unordered_multimap
? For performance? https://en.cppreference.com/w/cpp/container/unordered_multimap.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I heard the performance of all the std data structures was significantly worse than boost, so I wanted to use something equivalent. To be honest I didn't actually test std vs boost vs this pr through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'd be curious about how bad do std::unordered_multimap perform w.r.t to boost. I can help with benchmarking std DS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boost was better enough for us not to switch back to std when we were removing another boost deps. Or rather, it was better enough when introduced here. I think Sergey had those old numbers and we decided to keep this as customization downstream even if upstream would be free of boost.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sergey-semenov Ping on this one, thanks! |
FetchContent_Populate(emhash-headers) | ||
# The sycl target should download or find emhash. | ||
if (NOT SYCL_EMHASH_DIR) | ||
message(FATAL_ERROR "emhash headers not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has broken UR standalone builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix it, can you tell me how to reproduce it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the root of intel/llvm run:
$ cmake unified-runtime -Bbuild-ur-r -GNinja -DCMAKE_BUILD_TYPE=Release \
-DUR_FORMAT_CPP_STYLE=ON -DUR_DEVELOPER_MODE=ON -DUR_ENABLE_TRACING=ON \
-DUR_DPCXX=/path/to/intel/llvm/build/bin/clang++ \
-DUR_BUILD_ADAPTER_L0=ON -DUR_BUILD_ADAPTER_OPENCL=ON
It will fail to configure with:
CMake Error at /home/benie/Projects/intel/llvm/xptifw/src/CMakeLists.txt:6 (message):
emhash headers not found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, if I can't fix it within 30 min I'll revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix here, just doing some final testing locally.
Fixes issue reported [here](#18599 (comment)). Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Boost header usage was removed in #18599
Fixes issue reported [here](intel#18599 (comment)). Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
I'm working to ease Linux distros in packaging this repo, and our use of boost was hindering that.
The only place we use boost today is the data structures for the kernel and program cache.
Change it to be an equally fast data structure that was already use for
xptifw
(albeit with a manually implemented wrapper forunordered_multimap
that the upstream repo doesn't provide) and delete boost.I've done various benchmarks to make sure there is no performance impact using the cache benchmark test sent to me by Slawomir.
Here the the performance results. Each entry is an average of 10 runs, each run using 3000 kernels with 10000 iterations. Let me know if you'd like the full data. All times are in milliseconds and lower is better.
Linux:
Fast kernel cache:
Regular kernel cache/program cache:
Windows:
Fast kernel cache:
Regular kernel cache/program cache:
My interpretation of the results is that there is basically no performance impact, either positive or negative.
Different test kinds may have been run on different hardware, though of course tests of the same os/test kind were done on the same machine.
And just for fun:
configure.py
time: