Skip to content

[SYCL] Add DPC++ RT support for SYCL 2020 spec constants (part 1) #3382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 1, 2021
Merged

[SYCL] Add DPC++ RT support for SYCL 2020 spec constants (part 1) #3382

merged 15 commits into from
Apr 1, 2021

Conversation

dm-vodopyanov
Copy link
Contributor

This patch adds partial implementation of specialization constants in DPC++ RT:

  1. Implementation of specialization_id class
  2. Implementation of kernel_handler class
  3. Support for user's device lambdas which take kernel_handler argument

This patch adds partial implementation of specialization constants
in DPC++ RT:

1. Implementation of `specialization_id` class
2. Implementation of `kernel_handler` class
3. Support for user's device lambdas which take `kernel_handler` argument
@dm-vodopyanov dm-vodopyanov requested a review from a team as a code owner March 19, 2021 18:05
@dm-vodopyanov
Copy link
Contributor Author

This patch depends on FE part #3345 and tools part #3353

T getSpecializationConstantOnDevice() {
const char *SymbolicID = __builtin_unique_stable_name(
detail::specialization_id_name_generator<S>);
return __sycl_getScalar2020SpecConstantValue<T>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T getSpecializationConstantOnDevice() {
const char *SymbolicID = __builtin_unique_stable_name(
detail::specialization_id_name_generator<S>);
return __sycl_getComposite2020SpecConstantValue<T>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// TODO: replace detail::RangeRoundedLambda with
// "constexpr if" when DPC++ RT switched to C++17
detail::RangeRoundedLambda<TransformedArgType, KernelType, Dims> Wrapper{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdeodhar, please review from range rounding point-of-view

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function containing the range rounding code, "void parallel_for_lambda_impl(range NumWorkItems, KernelType KernelFunc) {...}" is called with a KernelFunc.

KernelFunc may have a kernel_handler parameter. If it does, the wrapped kernel should be declared with that additional parameter, (say, KH), and the body of the wrapper should pass it to the original kernel, as KernelFunc(Arg, KH);

I'm not sure how you'll do it.

@dm-vodopyanov dm-vodopyanov requested a review from rdeodhar March 23, 2021 23:55
@dm-vodopyanov
Copy link
Contributor Author

Ran post-commit locally on Linux with #3345 and #3353 patches: no failures during build and check-sycl:

[36/39] Running device-agnostic SYCL regression tests for SPIR-V
-- Testing: 208 tests, 24 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

Testing Time: 85.77s
  Unsupported:   2
  Passed     : 206
[37/39] Running the SYCL regression tests for OpenCL
-- Testing: 9 tests, 9 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

Testing Time: 9.85s
  Unsupported: 4
  Passed     : 5
[38/39] Running the SYCL regression tests for Level Zero
-- Testing: 9 tests, 9 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

Testing Time: 9.24s
  Unsupported: 5
  Passed     : 4

@dm-vodopyanov
Copy link
Contributor Author

Ran post-commit locally on Windows with #3345 and #3353 patches: no failures during build, and one unrelated to this patch failure in SYCL-on-device :: back_to_back_collectives.cpp (machine environment issue):

[57/60] Running device-agnostic SYCL regression tests for SPIR-V
-- Testing: 208 tests, 12 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

Testing Time: 118.07s
  Unsupported:  20
  Passed     : 188
[58/60] Running the SYCL regression tests for Level Zero
-- Testing: 9 tests, 9 workers --
Testing:  0.. 10.. 20.. 30
FAIL: SYCL-on-device :: back_to_back_collectives.cpp (4 of 9)
******************** TEST 'SYCL-on-device :: back_to_back_collectives.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   d:\dvodopya\dv-llvm\build\bin\clang.exe --driver-mode=g++ -fsycl -fsycl-targets=spir64-unknown-unknown-sycldevice D:\dvodopya\dv-llvm\sycl\test\on-device\back_to_back_collectives.cpp -o D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out
: 'RUN: at line 2';   env SYCL_DEVICE_FILTER=host  D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out
: 'RUN: at line 3';   true D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out
: 'RUN: at line 4';    env SYCL_DEVICE_FILTER=level_zero:gpu  D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out
: 'RUN: at line 5';   true D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out
--
Exit Code: 127

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "d:\dvodopya\dv-llvm\build\bin\clang.exe" "--driver-mode=g++" "-fsycl" "-fsycl-targets=spir64-unknown-unknown-sycldevice" "D:\dvodopya\dv-llvm\sycl\test\on-device\back_to_back_collectives.cpp" "-o" "D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out"
$ ":" "RUN: at line 2"
$ "env" "SYCL_DEVICE_FILTER=host" "D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out"
# command output:
Skipping test


$ ":" "RUN: at line 3"
$ "true" "D:\dvodopya\dv-llvm\build\tools\sycl\test\Output\back_to_back_collectives.cpp.tmp.out"
# command stderr:
'true': command not found
error: command failed with exit status: 127

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  SYCL-on-device :: back_to_back_collectives.cpp


Testing Time: 13.45s
  Unsupported: 1
  Passed     : 7
  Failed     : 1

@dm-vodopyanov
Copy link
Contributor Author

dm-vodopyanov commented Mar 24, 2021

Ping @ everyone

};

template <typename F, typename RetT, typename... Args>
struct check_fn_signature<F, RetT(Args...)> {
Copy link
Contributor Author

@dm-vodopyanov dm-vodopyanov Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from handler.hpp as is, except check_kernel_lambda_takes_args - this is a new type trait

}
MKernel(ID);
});
detail::NDLoop<Dims>::iterate(
Copy link
Contributor Author

@dm-vodopyanov dm-vodopyanov Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-format, the only change - runKernelWithArg... line


#ifdef __SYCL_DEVICE_ONLY__
template <auto &S, typename T = std::remove_reference_t<decltype(S)>,
std::enable_if_t<std::is_scalar_v<T>>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to use is_fundamental here?
@AlexeySachkov

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made is_fundamental. @AlexeySachkov, can you please review if is_fundamental and is_compaund are correct type traits here?

// additional kernel_handler argument.

// NOTE: to support kernel_handler argument in kernel lambdas, only
// kernel_***_wrapper functions must be called in this code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition - only kernel_***_wrapper functions must be called in this code - does not hold. kernel_handler constructor is also called. Can it be a problem if the constructor becomes non-trivial?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition - only kernel_***_wrapper functions must be called in this code - does not hold

Ca you please elaborate on this a bit? What I meant by this is that we shouldn't call, e.g., kernel_parallel_for in our headers anywhere because we need some logic above it to decide what version of kernel_parallel_for to run - with or w/o kernel_handler support. And we need some place to initialize kernel_handler obj on our side. In these cases kernel_parallel_for_wrapper acts as this place.

Can it be a problem if the constructor becomes non-trivial?

It can if something will be changed in spec, in this case initialization of kernel_handler object should be re-written here. Right now we have separate init function __init_specialization_constants_buffer which is called in FE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. This is not quite clear from the comments - I'd suggest to reword for better clarity.
My guess was that this was related more to parallel_for_work_group, where variables local to a kernel become WG-shared (address space 3 global in LLVM IR), and there is code filtering for the WG leader work item.

// This guard is needed because the libsycl.so can compiled with C++ <=14
// while the code requires C++17. This code is not supposed to be used by the
// libsycl.so so it should not be a problem.
#if __cplusplus > 201402L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: spec const 2020 design is not settled yet, so this code may need to be changed/removed in future. + @AlexeySachkov, @gmlueck for possible comments.

@dm-vodopyanov dm-vodopyanov requested a review from kbobrovs March 26, 2021 12:05
romanovvlad
romanovvlad previously approved these changes Mar 26, 2021
@dm-vodopyanov
Copy link
Contributor Author

Jenkins/Precommit failure: tests compile with c++14, removed the flag and created PR: intel/llvm-test-suite#202

kbobrovs
kbobrovs previously approved these changes Mar 27, 2021
romanovvlad
romanovvlad previously approved these changes Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants