In this exercise you will learn how to create a device selector that will choose a device for you to enqueue work to.
When you default construct a queue, the runtime will use the default_selector_v to choose a device.
Try querying the device associated with the queue and information about it.
Remember the device associated with a queue can be retrieved using the get_device member function, and information about a device can be queried using the get_info member function template.
Replace the default selector with one of the other standard device selectors provided by SYCL, such as cpu_selector_v or gpu_selector_v, and see which device each one chooses.
Create a device selector using the template below. Implement the function call operator using device and platform info queries, like the one used earlier to query the device name, and then pass that device selector to the queue constructor.
To construct a device selector, follow this rule:
A device selector is any object that meets the C++ named requirement Callable, taking a parameter of type const device & and returning a value that is implicitly convertible to int.
int operator()(const sycl::device &dev) const { /* scoring logic */ }
Remember the platform associated with a device can be retrieved using the get_platform member function.
Remember that the value returned from the device selector's function call operator will represent the score for each device, and a device with a negative score will never be chosen.
For DPC++, use CMake to configure and then build the exercise:
mkdir build
cd build
cmake .. "-GUnix Makefiles" -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=OFF -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
make exercise_5
Alternatively, compile and run directly from the command line:
icpx -fsycl -o sycl-ex-5 -I../External/Catch2/single_include ../Code_Exercises/Exercise_05_Device_Selection/source.cpp
./sycl-ex-5
In Intel DevCloud, you run computational applications by submitting jobs to a queue for execution on compute nodes. Some features, such as longer walltime and multi-node computation, are only available through the job queue. Please refer to the guide.
Wrap the binary in a script named job_submission and run:
qsub job_submission
For AdaptiveCpp:
# <target specification> is a list of backends and devices to target, for example
# "omp;generic" compiles for CPUs with the OpenMP backend and GPUs using the generic single-pass compiler.
# The simplest target specification is "omp" which compiles for CPUs using the OpenMP backend.
cmake -DSYCL_ACADEMY_USE_ADAPTIVECPP=ON -DSYCL_ACADEMY_INSTALL_ROOT=/insert/path/to/adaptivecpp -DACPP_TARGETS="<target specification>" ..
make exercise_5
Alternatively, without CMake:
cd Code_Exercises/Exercise_05_Device_Selection
/path/to/adaptivecpp/bin/acpp -o sycl-ex-5 -I../../External/Catch2/single_include --acpp-targets="<target specification>" source.cpp
./sycl-ex-5