
#pragma unroll #1425

Closed
bernhardmgruber opened this issue Oct 19, 2021 · 8 comments

Comments

@bernhardmgruber
Member

Alpaka should support a portable way to request unrolling of loops, which is typically done via the #pragma unroll directive.

This request came up during the discussion of an optimization proposed for PIConGPU: ComputationalRadiationPhysics/picongpu#3859

@j-stephan
Member

A solution for this issue is closely related to other loop decorations (such as FPGA pipelining). I've had this on my wish list for some time but never got around to designing something nice. Glad to see that this is now needed elsewhere.

@bussmann

Loop decorations are information about algorithms, provided by the user, that also (implicitly) conveys information about data. It would be interesting to understand how alpaka and LLAMA could share this information.

@bernhardmgruber
Member Author

Loop decorations are information about algorithms, provided by the user, that also (implicitly) conveys information about data. It would be interesting to understand how alpaka and LLAMA could share this information.

I beg to disagree. #pragma unroll is purely about how the compiler produces an instruction stream from the loop. It does not contain any information on how data is touched. And in the case of ComputationalRadiationPhysics/picongpu#3859 it pokes the optimizer in the right direction.

Example:

```cpp
void f(auto data) {
    for (int i = 0; i < 8; i += 2)
        access(data, i);
}
void g(auto data) {
    #pragma unroll
    for (int i = 0; i < 8; i += 2)
        access(data, i);
}
void h(auto data) {
    access(data, 0);
    access(data, 2);
    access(data, 4);
    access(data, 6);
}
```

Here, f, g and h have completely equivalent semantics. Still, g and h might expose more instruction-level parallelism than f. However, they might also blow up the executable size. All three functions might also produce the same assembly. #pragma unroll is just an optimization hint. I don't see how this information is relevant for alpaka or LLAMA.

@bussmann

bussmann commented Oct 24, 2021

O.k., I formulated this badly. A programmer will use these pragmas to convey certain optimization possibilities to the compiler.

These optimizations are, however, not always possible due to certain - sometimes wrong but usually implicit - assumptions about data access/properties inside the loop, or they do not lead to the desired effect because the compiler is too dumb to see them.

With LLAMA one could clarify these assumptions made by the programmer by making them explicit, thus making the intent of a loop decoration clear to both the compiler and other programmers. This would lead to more informed pragmas that clearly extend their initial capabilities.

This would thus be an extension to what is proposed here. It boils down to connecting the programmer's intent in expressing a parallelization optimization to explicit assumptions on the data properties and access patterns that support/enable this optimization. This will be a recurring theme when looking at more complex parallel algorithms.

Even among the simple OpenMP pragmas (https://www.openmp.org/spec-html/5.1/openmpsu53.html) one finds partial unrolling and tiling as options that clearly relate to, and are most effective with, e.g. tiled data structures, optimizing throughput, vectorization, controlling register usage, or enhancing cache reuse. When looking at FPGAs, the possibilities for controlling execution and memory access are much broader.

Whether this belongs in a portable loop decoration pragma definition is open for discussion. Nevertheless, pragmas are an example of the programmer explicitly providing optimization intent to the compiler, which could benefit from knowledge of data properties and access. If this intent is not known to both LLAMA and Alpaka, we might be missing opportunities.

@bernhardmgruber
Member Author

A programmer will use these pragmas to convey certain optimization possibilities to the compiler.

I am not even sure about that for #pragma unroll. This pragma does not enable more or different optimizations by conveying additional guarantees to the compiler. I would agree for other pragmas, like #pragma omp parallel for, which gives the guarantee that the following loop can be executed in parallel, free of races. Or #pragma ivdep, which gives guarantees on the semantics of loads and stores to adjacent data items.

These optimizations are, however, not always possible due to certain - sometimes wrong but usually implicit - assumptions about data access/properties inside the loop, or they do not lead to the desired effect because the compiler is too dumb to see them.

Agreed for other kinds of pragmas.

With LLAMA one could clarify these assumptions made by the programmer by making them explicit, thus making the intent of a loop decoration clear to both the compiler and other programmers. This would lead to more informed pragmas that clearly extend their initial capabilities.

I kind of agree here. We see similar ideas in the parallel STL, where loops (inside the STL algorithms) can be annotated with information on whether they may be executed across multiple threads or whether element access may be unsequenced within a thread (vectorization). I could imagine a bigger facility here, similar to execution policies.

Nevertheless, pragmas are an example of the programmer explicitly providing optimization intent to the compiler, which could benefit from knowledge of data properties and access. If this intent is not known to both LLAMA and Alpaka, we might be missing opportunities.

Agreed for some kinds of pragmas.

@bernhardmgruber
Member Author

Btw, it turns out alpaka already has such a portable macro in alpaka/core/Unroll.hpp. I am surprised that nobody knew (or bothered to speak up).

@BenjaminW3
Member

Can this issue be closed now that you found the solution or is there something missing?

@bernhardmgruber
Member Author

Sure. I wanted to wait until #1437 is in; then I'll close this topic.
