
#pragma unroll #1425

Closed
bernhardmgruber opened this issue Oct 19, 2021 · 8 comments

Comments

@bernhardmgruber
Member

Alpaka should support a portable way to request unrolling of loops, which is typically done via the #pragma unroll directive.

This request came up during the discussion of an optimization proposed for PIConGPU: ComputationalRadiationPhysics/picongpu#3859

@j-stephan
Member

A solution for this issue is closely related to other loop decorations (such as FPGA pipelining). I've had this on my wish list for some time but never got around to designing something nice. Glad to see that this is now needed elsewhere.

@bussmann

Loop decorations are information about algorithms, provided by the user, that also (implicitly) conveys information about data. It would be interesting to understand how alpaka and LLAMA could share this information.

@bernhardmgruber
Member Author

Loop decorations are information about algorithms, provided by the user, that also (implicitly) conveys information about data. It would be interesting to understand how alpaka and LLAMA could share this information.

I beg to disagree. #pragma unroll is purely about how the compiler produces an instruction stream from the loop. It does not contain any information on how data is touched. And in the case of ComputationalRadiationPhysics/picongpu#3859 it pokes the optimizer in the right direction.

Example:

```cpp
void f(auto data) {
    for (int i = 0; i < 8; i += 2)
        access(data, i);
}
void g(auto data) {
    #pragma unroll
    for (int i = 0; i < 8; i += 2)
        access(data, i);
}
void h(auto data) {
    access(data, 0);
    access(data, 2);
    access(data, 4);
    access(data, 6);
}
```

Here, f, g and h have completely equivalent semantics. Still, g and h might expose more instruction-level parallelism than f. However, they might also blow up the executable size. All three functions might also produce the same assembly. #pragma unroll is just an optimization hint. I don't see how this information is relevant for alpaka or LLAMA.

@bussmann

bussmann commented Oct 24, 2021

O.k., I formulated this badly. A programmer will use these pragmas to convey certain optimization possibilities to the compiler.

These optimizations are, however, not always possible due to certain - sometimes wrong but usually implicit - assumptions about data access/properties inside the loop, or they do not lead to the desired effect because the compiler is too dumb to see them.

With LLAMA one could clarify these assumptions made by the programmer by making them explicit, thus making the intent of a loop decoration clear to both the compiler and other programmers. This would lead to more informed pragmas that clearly extend their initial capabilities.

This would thus be an extension to what is proposed here. It boils down to connecting the programmer's intent in expressing a parallelization optimization to explicit assumptions on the data properties and access patterns that support/enable this optimization. This will be a recurring theme when looking at more complex parallel algorithms.

Even among the simple OpenMP pragmas (https://www.openmp.org/spec-html/5.1/openmpsu53.html) one finds partial unrolling and tiling as options that clearly relate to, and are most effective with, e.g. tiled data structures, optimizing throughput, vectorization, controlling register usage, or enhancing cache reuse. When looking at FPGAs, the possibilities for controlling execution and memory access are much broader.

Whether this belongs in a portable loop decoration pragma definition is open for discussion. Nevertheless, pragmas are an example of the programmer explicitly providing optimization intent to the compiler, which could benefit from knowledge of data properties and access. If this intent is not known to both LLAMA and Alpaka, we might be missing opportunities.

@bernhardmgruber
Member Author

A programmer will use these pragmas to convey certain optimization possibilities to the compiler.

I am not even sure about that for #pragma unroll. This pragma does not enable more or different optimizations by conveying additional guarantees to the compiler. I would agree for other pragmas, like #pragma omp parallel for, which gives the guarantee that the following loop can be executed in parallel, free of races. Or #pragma ivdep, which gives guarantees on the semantics of loads and stores to adjacent data items.

These optimizations are, however, not always possible due to certain - sometimes wrong but usually implicit - assumptions about data access/properties inside the loop, or they do not lead to the desired effect because the compiler is too dumb to see them.

Agreed for other kinds of pragmas.

With LLAMA one could clarify these assumptions made by the programmer by making them explicit, thus making the intent of a loop decoration clear to both the compiler and other programmers. This would lead to more informed pragmas that clearly extend their initial capabilities.

I kind of agree here. We see similar ideas in the parallel STL, where loops (inside the STL algorithms) can be annotated with information on whether they may be executed across multiple threads or whether element access may be unsequenced within a thread (vectorization). I could imagine a bigger facility here, similar to execution policies.

Nevertheless, pragmas are an example of the programmer explicitly providing optimization intent to the compiler, which could benefit from knowledge of data properties and access. If this intent is not known to both LLAMA and Alpaka, we might be missing opportunities.

Agreed for some kinds of pragmas.

@bernhardmgruber
Member Author

Btw, it turns out alpaka already has such a portable macro in alpaka/core/Unroll.hpp. I am surprised that nobody knew (or bothered to speak up).

@BenjaminW3
Member

Can this issue be closed now that you found the solution or is there something missing?

@bernhardmgruber
Member Author

Sure. I wanted to wait until #1437 is in; then I'll close this topic.
