Description
Hi, I'm playing around with SYCL and I'm seeing some really strange performance behaviour in a small program.
Let's start from this small program:
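```cpp
#include <CL/sycl.hpp>
#include <iostream>
#include <math.h>
#include <chrono>

#define IMAGE_WIDTH  (20000L)
#define IMAGE_HEIGHT (40000L)
#define IMAGE_SIZE   (IMAGE_WIDTH * IMAGE_HEIGHT)  // 800,000,000 bytes

unsigned char* old_image;

namespace sycl = cl::sycl;

int main(int argc, char *argv[]) {
    old_image = new unsigned char[IMAGE_SIZE];
    {
        // Queue on the GPU device chosen by the runtime.
        sycl::queue myQueue(sycl::gpu_selector{});
        // 1-D buffer wrapping the host image.
        sycl::buffer<unsigned char, 1> inputBuf(old_image, sycl::range<1>(IMAGE_WIDTH*IMAGE_HEIGHT));
        myQueue.submit([&](sycl::handler& cgh) {
            // Read-only accessor; the kernel below never uses it.
            auto readImage = inputBuf.get_access<sycl::access::mode::read>(cgh);
            // Empty kernel over 1,821,303,172 work-items. Changing only this value changes the timing.
            cgh.parallel_for<class simple_test>(sycl::range<1>(1821303172), [=](sycl::id<1> idx) {
            });
        });
    }  // The buffer destructor blocks until the kernel has finished.
    delete[] old_image;  // Release the host image after the buffer is gone.
    return 0;
}
```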
This program runs in:
```console
sysele@sysele-C08:~/work/sycl/rotate$ time ./rotate.gpu
real 0m0.978s
user 0m0.172s
sys 0m0.084s
```
If I decrease the number of work-items in the parallel_for, using 1761303172 instead of 1821303172, the timings become:
```console
sysele@sysele-C08:~/work/sycl/rotate$ time ./rotate.gpu
real 0m4.541s
user 0m0.190s
sys 0m0.065s
```
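So the smaller range is about 4.6 times slower in wall-clock time. Does anyone have an explanation? The timings are deterministic: each range gives the same result on every run.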
I'm using an Intel(R) Core(TM) i7-7820EQ CPU @ 3.00GHz with the latest SYCL from Git.
I've tried both oclcpuexp-2019.9.11.0.1106_rel.tar.gz and oclcpuexp-2019.8.8.0.0822_rel.tar.gz, and the issue occurs with both.