Skip to content

{SYCL][PI][L0] - Eliminate std::string construction/destruction overh… #3931

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 16, 2021

Conversation

kbsmith-intel
Copy link
Contributor

…ead.

This change eliminates std::string construction/destruction overhead
for calls to ZeCall::doCall, and similar entry points. The changes have
no observable functional effect but reduce the overall size of the
built library by 8-9%, and eliminate the memory allocation and
std::string construction ond destruction overhead, thus lowering
overall execution time on every level zero API call that uses the
macros ZE_CALL and ZE_CALL_NOCHECK.

…ead.

This change eliminates std::string construction/destruction overhead
for calls to ZeCall::doCall, and similar entry points. The changes have
no observable functional effect but reduce the overall size of the
built library by 8-9%, and eliminate the memory allocation and
std::string construction ond destruction overhead, thus lowering
overall execution time on every level zero API call that uses the
macros ZE_CALL and ZE_CALL_NOCHECK.
@keryell
Copy link
Contributor

keryell commented Jun 15, 2021

That looks like a good motivation.
But I wonder whether this is not a good use case for some std::string_view...

@AGindinson
Copy link
Contributor

That looks like a good motivation.
But I wonder whether this is not a good use case for some std::string_view...

AFAIU, C++17 features are not available in the current library/runtime build.

@kbsmith-intel
Copy link
Contributor Author

That looks like a good motivation.
But I wonder whether this is not a good use case for some std::string_view...

AFAIU, C++17 features are not available in the current library/runtime build.

I did a quick check where I used std::string_view instead of the const char * for the params of doCall. This is almost as good with respect to the overhead as the const char *, but not quite as good. In particular, for every usage of the macro ZE_CALL or ZE_CALL_NOCHECK, with string_view, there is extra overhead for construction of the string_view. With const char *, it just loads the address of the string into a register (for each of the two string parameters), while for string_view ir loads both the address of the string, and the strings length. So, this causes a bit more overhead, and this really is unnecessary since all uses in doCall really just use the string itself.

So, my preference is to keep this implementation using const char *.

@keryell
Copy link
Contributor

keryell commented Jun 16, 2021

I did a quick check where I used std::string_view instead of the const char * for the params of doCall. This is almost as good with respect to the overhead as the const char *, but not quite as good. In particular, for every usage of the macro ZE_CALL or ZE_CALL_NOCHECK, with string_view, there is extra overhead for construction of the string_view. With const char *, it just loads the address of the string into a register (for each of the two string parameters), while for string_view ir loads both the address of the string, and the strings length. So, this causes a bit more overhead, and this really is unnecessary since all uses in doCall really just use the string itself.

So, my preference is to keep this implementation using const char *.

Thanks for the feedback after experimenting!
If at the end the L0 API takes a const char * anyway, there is probably no benefit to try to avoid counting the characters in the L0 API by providing the length upfront...

@pvchupin pvchupin merged commit ae89341 into intel:sycl Jun 16, 2021
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Jun 18, 2021
* upstream/sycl: (776 commits)
  Align CMake requirements with upstream. (intel#3928)
  [SYCL] Deprecate [[intel::reqd_work_group_size]] attribute spelling (intel#3927)
  [SYCL] Fix post commit after PR 2292 (intel#3939)
  {SYCL][PI][L0] - Eliminate std::string construction/destruction overhead. (intel#3931)
  [ESIMD] Overloading sycl sin,cos,exp,log functions for ESIMD arguments (intel#3717)
  [sycl-post-link] Add device image property for assert feature (intel#3881)
  [SYCL] Split read/write lockings (intel#2292)
  Handle OpSpecConstantOp with Select
  Handle OpSpecConstantOp with SMod
  Add tests for SConvert, UConvert, BitCast OpSpecConstantOp
  Fix attachment of decoration to spec constants
  Implement support for dynamic memmove
  Align clang-tidy/format versions to LLVM version
  Handle OpSpecConstantOp for integer comparisons
  Handle OpSpecConstantOp for SNegate, Not, and LogicalNot
  Extend OpSpecConstantOp testing for initializers
  Use IRBuilder for folding
  Fix translating of compile unit
  Support buffers in LinalgFoldUnitExtentDims
  [gn build] Port d0a5d86
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants