#pragma unroll #1425
A solution for this issue is closely related to other loop decorations (such as FPGA pipelining). I've had this on my wish list for a while but never got around to designing something nice. Glad to see that this is now needed elsewhere.

Loop decorations are information on algorithms, provided by the user, that also (implicitly) provides information on data. It would be interesting to understand how alpaka and LLAMA could share this information.
I beg to disagree. Example:

```cpp
void f(auto data) {
    for (int i = 0; i < 8; i += 2)
        access(data, i);
}

void g(auto data) {
    #pragma unroll
    for (int i = 0; i < 8; i += 2)
        access(data, i);
}

void h(auto data) {
    access(data, 0);
    access(data, 2);
    access(data, 4);
    access(data, 6);
}
```

Here, …
OK, I formulated this wrongly. A programmer uses these pragmas to convey certain optimization possibilities to the compiler. These are, however, not always applicable, due to certain assumptions (sometimes wrong, usually implicit) about the data access or data properties inside the loop, or they do not lead to the desired effect because the compiler is too dumb to see them. With LLAMA one could clarify the assumptions made by the programmer by making them explicit, thus making the intent of a loop decoration clear to both the compiler and other programmers. This would lead to more informed pragmas that clearly extend their initial capabilities, and would thus be an extension to what is proposed here.

It boils down to connecting the programmer's intent in expressing a parallelization optimization to explicit assumptions on data properties and access that support or enable this optimization. This will be a recurring theme when looking at more complex parallel algorithms. Even the simple OpenMP pragmas (https://www.openmp.org/spec-html/5.1/openmpsu53.html) offer partial unrolling and tiling as options that are most effective when, e.g., using tiled data structures, optimizing throughput, vectorizing, controlling register usage, or enhancing cache reuse. When looking at FPGAs, the possibilities for controlling execution and memory access are much broader. Whether this belongs in a portable loop decoration pragma definition is debatable. Nevertheless, pragmas are an example of the programmer explicitly providing optimization intent to the compiler, and they could benefit from knowledge of data properties and access. If this intent is not known to both LLAMA and alpaka, we might be missing opportunities.
I am not even sure about that for
Agreed for other kinds of pragmas.
I kind of agree here. We see similar ideas in the parallel STL, where loops (inside the STL algorithms) can be annotated with information on whether they can be executed on multiple threads or whether element accesses can be unsequenced within a thread (vectorization). I could imagine a bigger facility here, similar to execution policies.
Agreed for some kinds of pragmas.
Btw, it turns out alpaka already has such a portable macro in …
Can this issue be closed now that you found the solution, or is there something missing?
Sure. I wanted to wait until #1437 is in; then I'll close this topic.
Alpaka should support a portable way to request unrolling of loops, which is typically done via the

```cpp
#pragma unroll
```

directive. This request came up during the discussion of an optimization proposed for PIConGPU: ComputationalRadiationPhysics/picongpu#3859