Skip to content

[SYCL] Change kernel/program cache data structures and get rid of boost #18599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 5, 2025

Conversation

sarnex
Copy link
Contributor

@sarnex sarnex commented May 21, 2025

I'm working to ease Linux distros in packaging this repo, and our use of boost was hindering that.
The only place we use boost today is the data structures for the kernel and program cache.

Change it to be an equally fast data structure that was already use for xptifw (albeit with a manually implemented wrapper for unordered_multimap that the upstream repo doesn't provide) and delete boost.

I've done various benchmarks to make sure there is no performance impact using the cache benchmark test sent to me by Slawomir.

Here the the performance results. Each entry is an average of 10 runs, each run using 3000 kernels with 10000 iterations. Let me know if you'd like the full data. All times are in milliseconds and lower is better.

Linux:

Fast kernel cache:

Boost This PR Percent Diff
44.95478 45.04912 0.21%

Regular kernel cache/program cache:

Boost This PR Percent Diff
21541.69 21150.15 -1.83%

Windows:

Fast kernel cache:

Boost This PR Percent Diff
71.19915 69.38018 -2.59%

Regular kernel cache/program cache:

Boost This PR Percent Diff
7295.955 7298.057 0.03%

My interpretation of the results is that there is basically no performance impact, either positive or negative.

Different test kinds may have been run on different hardware, though of course tests of the same os/test kind were done on the same machine.

And just for fun:

configure.py time:

Boost This PR Percent Diff
29.812 21.1625 -33.94%

@sarnex sarnex temporarily deployed to WindowsCILock May 21, 2025 22:36 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock May 21, 2025 23:01 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock May 21, 2025 23:01 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock May 21, 2025 23:08 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock May 21, 2025 23:36 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock May 21, 2025 23:36 — with GitHub Actions Inactive
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex temporarily deployed to WindowsCILock May 22, 2025 14:48 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock May 22, 2025 15:13 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock May 22, 2025 15:13 — with GitHub Actions Inactive
@sarnex sarnex marked this pull request as ready for review May 22, 2025 19:07
@sarnex sarnex requested review from a team as code owners May 22, 2025 19:07
@sarnex sarnex requested a review from slawekptak May 22, 2025 19:08
Copy link
Contributor

@kbenzie kbenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UR LGTM and 👍 for getting rid of boost

@sarnex sarnex requested a review from steffenlarsen May 23, 2025 17:32
@steffenlarsen
Copy link
Contributor

I hope to bring #16763 back ASAP. Do you know if one of the LLVM structures could offer similar performance?

@sarnex
Copy link
Contributor Author

sarnex commented May 27, 2025

I'm not sure, but I expect the LLVM ones have similar performance to the ones in this PR. This PR isn't introducing any new dependency, so if you expect linking against LLVMSupport again will take a bit I'd prefer to merge this PR and then switch to the LLVM data structures in a future PR. If you expect it will be done quickly, I can rework this PR after it's merged. Thanks!

@sarnex
Copy link
Contributor Author

sarnex commented May 28, 2025

@steffenlarsen Sorry forgot to tag you in my response, see above comment :)

@steffenlarsen
Copy link
Contributor

I'm not sure, but I expect the LLVM ones have similar performance to the ones in this PR. This PR isn't introducing any new dependency, so if you expect linking against LLVMSupport again will take a bit I'd prefer to merge this PR and then switch to the LLVM data structures in a future PR. If you expect it will be done quickly, I can rework this PR after it's merged. Thanks!

My bad for not realizing it was for me!

AFAIK, this adding a dependency is only partially right. Currently it is only XPTI using it (correct me if I am wrong) so someone compiling without XPTI don't have emhash as a dependency. That said, it seems like this PR trades a dependency for another, so I don't think we're any worse off. I am okay with proceeding but would really prefer that we check if LLVMSupport variants can be used instead to avoid external dependencies altogether. 🤔

@sarnex
Copy link
Contributor Author

sarnex commented May 28, 2025

@steffenlarsen My bad! You're right, it's only a dependency if XPTI is enabled, I was looking at it from the POV as the project as whole, but you're right. Luckily this dependency is only a few header files instead of boost which is huge.

If you could ping me once LLVMSupport is available in libsycl I can retest with those data structures and use them if everything looks good.

@sarnex
Copy link
Contributor Author

sarnex commented May 28, 2025

Hi @sergey-semenov, I'd like your review on this before merging given you worked in this code recently, thanks!

@@ -0,0 +1,142 @@
//==------------ unordered_multimap.hpp ----------------------------==//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering about why don't we instead use std::unordered_multimap? For performance? https://en.cppreference.com/w/cpp/container/unordered_multimap.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I heard the performance of all the std data structures was significantly worse than boost, so I wanted to use something equivalent. To be honest I didn't actually test std vs boost vs this pr through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'd be curious about how bad do std::unordered_multimap perform w.r.t to boost. I can help with benchmarking std DS.

Copy link
Contributor

@aelovikov-intel aelovikov-intel May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boost was better enough for us not to switch back to std when we were removing another boost deps. Or rather, it was better enough when introduced here. I think Sergey had those old numbers and we decided to keep this as customization downstream even if upstream would be free of boost.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
@sarnex sarnex temporarily deployed to WindowsCILock June 2, 2025 19:27 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock June 2, 2025 19:54 — with GitHub Actions Inactive
@sarnex sarnex temporarily deployed to WindowsCILock June 2, 2025 19:54 — with GitHub Actions Inactive
@sarnex
Copy link
Contributor Author

sarnex commented Jun 3, 2025

@sergey-semenov Ping on this one, thanks!

@sarnex sarnex merged commit 8c528be into intel:sycl Jun 5, 2025
24 checks passed
FetchContent_Populate(emhash-headers)
# The sycl target should download or find emhash.
if (NOT SYCL_EMHASH_DIR)
message(FATAL_ERROR "emhash headers not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has broken UR standalone builds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix it, can you tell me how to reproduce it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the root of intel/llvm run:

$ cmake unified-runtime -Bbuild-ur-r -GNinja -DCMAKE_BUILD_TYPE=Release \
    -DUR_FORMAT_CPP_STYLE=ON -DUR_DEVELOPER_MODE=ON -DUR_ENABLE_TRACING=ON \
    -DUR_DPCXX=/path/to/intel/llvm/build/bin/clang++ \
    -DUR_BUILD_ADAPTER_L0=ON -DUR_BUILD_ADAPTER_OPENCL=ON

It will fail to configure with:

CMake Error at /home/benie/Projects/intel/llvm/xptifw/src/CMakeLists.txt:6 (message):
  emhash headers not found

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, if I can't fix it within 30 min I'll revert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix here, just doing some final testing locally.

sarnex added a commit that referenced this pull request Jun 5, 2025
Fixes issue reported
[here](#18599 (comment)).

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
sergey-semenov added a commit that referenced this pull request Jun 24, 2025
sarnex added a commit to sarnex/llvm that referenced this pull request Jun 25, 2025
Fixes issue reported
[here](intel#18599 (comment)).

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants