Provide a user-friendly interface using C++ Standard Library paradigms and concepts.
- ease first steps for users without experience with SYCL
- provide standard facilities for common tasks
  - interoperability with C++ Standard Library types, especially the containers `std::vector` and `std::array` - in progress
  - standard implementations of common algorithms and selected C++ Standard Library algorithms - in progress
  - extensive support for multi-dimensional buffers, including multi-dimensional versions of selected standard algorithms - partially done
  - a C++20 ranges-like interface, including kernel composition for exposing task sequences to the runtime
- CMake integration
- a Conan package
- a vcpkg package
- C++20 modules
- impose zero runtime overhead (ideally)
- use an iterator-based algorithms interface (preferably with sentinels as end iterators, which requires C++17 for use in range-based for loops)
- work with all major compilers
  - clang
  - gcc
  - msvc
  - icc
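The sentinel goal relies on a change made in C++17: in a range-based for loop, `begin()` and `end()` may now return different types. The sketch below shows that mechanism on a plain index range. `index_iterator`, `index_sentinel`, `index_range` and `sum_indices` are hypothetical names and are not part of this library.

```cpp
#include <cstddef>

// Hypothetical sentinel type marking the end of an index range.
struct index_sentinel {
    std::size_t bound;
};

// Hypothetical iterator yielding consecutive indices.
struct index_iterator {
    std::size_t value;

    std::size_t operator*() const { return value; }
    index_iterator& operator++() {
        ++value;
        return *this;
    }
    // Iteration stops once the iterator reaches the sentinel's bound.
    friend bool operator!=(const index_iterator& it, const index_sentinel& s) {
        return it.value != s.bound;
    }
};

// begin() and end() return *different* types; range-based for loops
// accept this only since C++17.
struct index_range {
    std::size_t first;
    std::size_t last;

    index_iterator begin() const { return index_iterator{first}; }
    index_sentinel end() const { return index_sentinel{last}; }
};

// Sums all indices in [first, last) with a range-based for loop.
std::size_t sum_indices(std::size_t first, std::size_t last) {
    std::size_t sum = 0;
    for (std::size_t i : index_range{first, last}) {
        sum += i;
    }
    return sum;
}
```

A sentinel lets the end of the range carry only what the termination check needs, here a single bound, instead of a full iterator.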
- use global functions in combination with ADL instead of a `device_vector` for wrapping celerity buffers, to avoid intrusive changes of the public interface of the celerity core
- iterators for `device_vector` for providing a std-like algorithm interface - iterators only specify the range of iteration, not which elements are accessible (celerity-wise); range access is controlled by accepting different accessor types in the callback function (see Multi-dimensional Buffer Support)
- `copy`, `copy_if`, `copy_n` and `transform` for copying data from/to STD containers
- STD-like constructors for celerity buffers (using ranges or iterator pairs) would affect the public interface of celerity buffers; a dedicated buffer type may be implemented to support this (as originally planned)
- `begin(buffer)` and `end(buffer)` to enable range-based for loops on master - only inside of `on_master(...)`
- use execution policies akin to STD execution policies to decide where to run the algorithm (distributed or master-only) - range adaptors/actions require distributed execution
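The ADL-based approach and the `begin(buffer)`/`end(buffer)` item can be illustrated without celerity. In the sketch below, `core::buffer` is a hypothetical stand-in for a class whose public interface must stay untouched; it is not `celerity::buffer`. Free `begin`/`end` functions declared in the same namespace are found by argument-dependent lookup. A range-based for loop uses the same lookup when the class has no member `begin`/`end`. `fill_ascending` is likewise a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

namespace core {

// Hypothetical stand-in for a buffer class owned by another library
// (NOT the real celerity::buffer); its interface is treated as frozen.
template <typename T>
class buffer {
public:
    explicit buffer(std::size_t n) : data_(n) {}
    T* data() { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
};

// Non-member begin/end added next to the class instead of inside it.
// Because they live in the class's namespace, argument-dependent lookup
// finds them for unqualified calls such as begin(b).
template <typename T>
T* begin(buffer<T>& b) { return b.data(); }

template <typename T>
T* end(buffer<T>& b) { return b.data() + b.size(); }

}  // namespace core

// buffer has no member begin()/end(), so the range-based for loop falls
// back to ADL and picks up core::begin/core::end.
template <typename T>
void fill_ascending(core::buffer<T>& b) {
    T next = 0;
    for (T& x : b) {
        x = next++;
    }
}
```

The class definition is never modified. Iteration support is layered on from outside, which is the property the design decision is after.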
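One way to choose between distributed and master-only execution through policy objects is tag dispatch, the same overload-selection pattern `std::execution` policies use. Everything in the sketch below is hypothetical: the `policy::distributed`/`policy::master_only` tags, the `algo::for_each` overloads and the `ran_on` result. Both overloads run sequentially here and only mark where a real implementation would diverge.

```cpp
#include <vector>

// Hypothetical execution-policy tags, loosely modelled on std::execution.
namespace policy {
struct distributed_t {};
struct master_only_t {};
inline constexpr distributed_t distributed{};
inline constexpr master_only_t master_only{};
}  // namespace policy

// Reports which overload ran, so the dispatch is observable.
enum class ran_on { distributed, master };

namespace algo {

// Overload resolution on the policy's type picks the execution path at
// compile time; no runtime flag is inspected.
template <typename T, typename F>
ran_on for_each(policy::distributed_t, std::vector<T>& v, F f) {
    // Placeholder: a distributed version would submit a kernel here.
    for (T& x : v) f(x);
    return ran_on::distributed;
}

template <typename T, typename F>
ran_on for_each(policy::master_only_t, std::vector<T>& v, F f) {
    // Placeholder: a master-only version would run inside on_master(...).
    for (T& x : v) f(x);
    return ran_on::master;
}

}  // namespace algo
```

Because the policy is a distinct type, choosing the wrong path is a compile-time decision rather than a runtime branch. This fits the zero-overhead goal.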
- `copy` - rudimentary
- `copy_if`
- `copy_n`
- `count`
- `count_if`
- `for_each` - master-only, as device kernels typically cannot have side effects
- `for_each_n`
- `transform` - no support for STD containers, no in-place transformation
- `fill`
- `fill_n` - only available as a building block with an unspecified output iterator
- `generate`
- `generate_n` - only available as a building block with an unspecified output iterator
- `min`, `max`, `minmax`
- `iota`
- `reduce`
- `inner_product`
- `adjacent_difference`
- `partial_sum`
- `exclusive_scan`
- `inclusive_scan`
- `pool` - akin to the pooling layer of a convolutional neural network; reduces the input range to a smaller range using a pooling operation (e.g. max-pooling)
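The sketch below is a rough illustration of what `pool` computes: a hypothetical 1-D version on `std::vector` with non-overlapping windows. Several details are assumptions of this sketch, not the library's behaviour: the `pool` and `max_pool` signatures, the non-overlapping windows, and dropping a trailing partial window.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 1-D pooling: the input is split into non-overlapping
// windows of `window` elements and each window is reduced to one value
// with `op`. Trailing elements that do not fill a whole window are dropped.
template <typename T, typename BinaryOp>
std::vector<T> pool(const std::vector<T>& in, std::size_t window, BinaryOp op) {
    if (window == 0) throw std::invalid_argument("window must be > 0");
    std::vector<T> out;
    out.reserve(in.size() / window);
    for (std::size_t start = 0; start + window <= in.size(); start += window) {
        T acc = in[start];
        for (std::size_t i = start + 1; i < start + window; ++i) {
            acc = op(acc, in[i]);
        }
        out.push_back(acc);
    }
    return out;
}

// Max-pooling is pooling with std::max as the reduction.
template <typename T>
std::vector<T> max_pool(const std::vector<T>& in, std::size_t window) {
    return pool(in, window, [](const T& a, const T& b) { return std::max(a, b); });
}
```

For example, max-pooling `{1, 5, 2, 2, 7, 3, 4}` with a window of 2 yields `{5, 2, 7}`, and the trailing `4` is dropped.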
- `one_to_one_iterator` - multi-dimensional, single-element accessor
- `neighbour_iterator` - multi-dimensional, `chunk<>` accessor
- `clamping_neighbour_iterator` - multi-dimensional, `chunk<>` accessor that detects whether it lies on the borders so that computation may branch accordingly
- `slice_iterator` - multi-dimensional, `slice<>` accessor
- `n_dim_iterator` - multi-dimensional, for STD containers
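The border detection of `clamping_neighbour_iterator` can be pictured with a 1-D stencil on a `std::vector`. This is only a conceptual sketch: `clamped_neighbourhood` and `stencil` are hypothetical names, and a celerity-based implementation would operate on a `chunk<>` accessor instead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical view of the neighbourhood around one element of a 1-D
// vector. Offsets that leave the vector are clamped to the nearest valid
// index, and is_on_border() lets the callback branch at the edges.
template <typename T>
class clamped_neighbourhood {
public:
    clamped_neighbourhood(const std::vector<T>& data, std::size_t centre)
        : data_(data), centre_(centre) {}

    // Element at `offset` from the centre; indices outside the vector are
    // clamped to the first or last element.
    const T& operator[](std::ptrdiff_t offset) const {
        std::ptrdiff_t idx = static_cast<std::ptrdiff_t>(centre_) + offset;
        const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(data_.size()) - 1;
        if (idx < 0) idx = 0;
        if (idx > last) idx = last;
        return data_[static_cast<std::size_t>(idx)];
    }

    // True for the first and the last element.
    bool is_on_border() const {
        return centre_ == 0 || centre_ + 1 == data_.size();
    }

private:
    const std::vector<T>& data_;
    std::size_t centre_;
};

// Applies `f` to the neighbourhood of every element (a simple stencil).
template <typename T, typename F>
std::vector<T> stencil(const std::vector<T>& in, F f) {
    std::vector<T> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(f(clamped_neighbourhood<T>(in, i)));
    }
    return out;
}
```

A 3-point sum over `{1, 2, 3, 4}` with clamped borders gives `{4, 6, 9, 11}`. The callback can also use `is_on_border()` to treat the edges differently instead of relying on clamping.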
- `copy` - rudimentary
- `copy_if`
- `copy_n`
- `count`
- `count_if`
- `for_each` - master-only, as device kernels typically cannot have side effects
- `for_each_n`
- `transform`
- `fill`
- `fill_n` - only available as a building block with an unspecified output iterator
- `generate`
- `generate_n` - only available as a building block with an unspecified output iterator
- `min`
- `max`
- `minmax`
- `iota` - ?
- `reduce` - ?
- `inner_product` - ?
- `adjacent_difference` - ?
- `partial_sum` - ?
- `exclusive_scan` - ?
- `inclusive_scan` - ?
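For reference, the scan entries share their names with the C++17 algorithms `std::inclusive_scan` and `std::exclusive_scan` from `<numeric>`. The snippet below shows their semantics on `std::vector`; the wrapper function names are illustrative only. Unlike `std::partial_sum`, these algorithms may group the operation in any order, so the operation should be associative. That freedom is also what allows a scan to be split across workers.

```cpp
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = in[0] + ... + in[i].
std::vector<int> inclusive_prefix_sums(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: out[i] = init + in[0] + ... + in[i - 1];
// the element in[i] itself is excluded.
std::vector<int> exclusive_prefix_sums(const std::vector<int>& in, int init) {
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), init);
    return out;
}
```

For `{1, 2, 3, 4}`, the inclusive scan yields `{1, 3, 6, 10}` and the exclusive scan with `init = 0` yields `{0, 1, 3, 6}`.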
- C++20 ranges for expressing (sub-)regions
- range adaptors/actions for composing the task graph
  - adaptor/action for custom kernels using the traditional celerity programming model
  - explore the possibility of fusing compatible kernels
    - fusion of
      - kernels with one input and one output with single-element access (i.e. not `chunk<>`, `slice<>` or `all<>`)
      - kernels with two inputs and one output with single-element access (i.e. not `chunk<>`, `slice<>` or `all<>`)
      - `fill` kernels and single `chunk<>` access kernels, by serializing the filling of the chunk
      - non-`fill` kernels and `chunk<>` access kernels, by serializing the kernel (requires `cl::sycl::detail::make_item`)
- `ContiguousIterator` concept - will be reformulated soon
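Single-element access is what makes fusion straightforward. When every output element depends only on the input element at the same index, two kernels `f` and `g` can be composed into one pass computing `g(f(x))`, and the intermediate buffer disappears. With `chunk<>` access, the second kernel would read neighbours that the fused kernel has not produced yet, which is why those cases need serialization. The following is a minimal host-side sketch; `run_unfused`, `fuse` and `run` are hypothetical helpers, not the library's API.

```cpp
#include <cstddef>
#include <vector>

// Unfused: two passes, each a "kernel" with one input, one output and
// single-element access; the intermediate result is materialised.
template <typename T, typename F, typename G>
std::vector<T> run_unfused(const std::vector<T>& in, F f, G g) {
    std::vector<T> tmp(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = f(in[i]);
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < tmp.size(); ++i) out[i] = g(tmp[i]);
    return out;
}

// Hypothetical fusion: since each output element depends only on the
// element at the same index, g(f(x)) can be applied in a single pass.
template <typename F, typename G>
auto fuse(F f, G g) {
    return [f, g](const auto& x) { return g(f(x)); };
}

// Runs one single-element kernel over the whole input.
template <typename T, typename K>
std::vector<T> run(const std::vector<T>& in, K kernel) {
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = kernel(in[i]);
    return out;
}
```

The fused and unfused versions produce identical results. The fused one saves a full pass and the temporary buffer.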