Skip to content

Unstructured OpenMP code #1178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 3, 2021
Merged

Unstructured OpenMP code #1178

merged 4 commits into from
Nov 3, 2021

Conversation

vdonaldson
Copy link
Collaborator

Address several problems with OpenMP constructs and unstructured code.

In fir, code that contains a GOTO, EXIT, or any of a number of other
branches is unstructured. OpenMP generally prohibits branching into
or out of an OpenMP construct, but allows unstructured branches where
the source and target are both local to the construct. A structured
loop is implemented with a fir.do_loop op, and a structured IF is
implemented with a fir.if op. Unstructured loops and IFs are implemented
with explicit branches between basic blocks. This PR allows an OpenMP
construct to immediately follow unstructured code (see PR 1077), and
allows an OpenMP construct to contain unstructured code (see Issue 1120).

The same issues are likely present in OpenACC code. The infrastructure
changes in this PR should also be valid for OpenACC code, but file
OpenACC.cpp is not changed.

Address several problems with OpenMP constructs and unstructured code.

In fir, code that contains a GOTO, EXIT, or any of a number of other
branches is _unstructured_.  OpenMP generally prohibits branching into
or out of an OpenMP construct, but allows unstructured branches where
the source and target are both local to the construct.  A structured
loop is implemented with a fir.do_loop op, and a structured IF is
implemented with a fir.if op.  Unstructured loops and IFs are implemented
with explicit branches between basic blocks.  This PR allows an OpenMP
construct to immediately follow unstructured code (see PR 1077), and
allows an OpenMP construct to contain unstructured code (see Issue 1120).

The same issues are likely present in OpenACC code.  The infrastructure
changes in this PR should also be valid for OpenACC code, but file
OpenACC.cpp is not changed.
Copy link
Collaborator

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks reasonable to me. Please wait for a more enlightened opinion regarding these CFGs/OpenMP changes like @schweitzpgi and @kiranchandramohan.

Copy link
Collaborator

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks @vdonaldson for this patch. It will help us unblock our SNAP application work. Really appreciate that your work in fixing this.

I have a few minor comments for tests and one question otherwise.

I think I can close #1077 in favour of this patch. Is that OK?

@Leporacanthicus can you give this patch a test with the SNAP code without your workarounds?

@@ -0,0 +1,111 @@
! RUN: bbc %s -fopenmp -o "-" | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: A comment on what each of these test is testing might be helpful for future reference.

Nit: It might also be good to have PFT tests for these cases like the one in 1077.
d01b9ef#diff-280157740e4d34bdf8b6c3fa1aafdd92f337a59badc9e36162ec1484a7e36eca

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added comments for each test subroutine. Subroutine ss1 exercises the 1077 functionality.

@@ -0,0 +1,111 @@
! RUN: bbc %s -fopenmp -o "-" | FileCheck %s

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Suggestion for comment: Test that OpenMP constructs can be target of branches.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment

!$omp end master
print*
end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Suggestion for comment: Test unstructured code inside OpenMP construct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment

print*, 'ss2-C', i
print*
end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Suggestion for comment: Test unstructured code inside a do loop inside OpenMP do construct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment

auto *eval = constructAndDirectiveStack.back();
if (eval->isExecutableDirective()) {
// A construct at the end of an (unstructured) OpenACC or OpenMP
// construct region must have an exit target inside the region.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the SS3 test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is exercised by test subroutines ss2 and ss3.

@@ -93,9 +93,27 @@ static void genObjectList(const Fortran::parser::OmpObjectList &objectList,
}
}

/// Create empty blocks for the current region.
/// These blocks replace blocks parented to an enclosing region.
void createEmptyRegionBlocks(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function called for each OpenMP construct/region?

I don't see it changing the current region. Will this work for nested OpenMP regions (where the outer and inner have unstructured code)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intended to operate on the current region, as updated before the call in createBodyOfOp.

@Leporacanthicus
Copy link
Collaborator

Leporacanthicus commented Nov 1, 2021

I'm trying this patch now on SNAP (with small changes, rather than the larger changes I have to "work around unstructured constructs")

I hit this:

bbc: /data/f18-llvm-project/flang/lib/Lower/OpenMP.cpp:117: void createEmptyRegionBlocks(fir::FirOpBuilder &, std::list<Fortran::lower::pft::Evaluation> &): Assertion `eval.block->empty() && "block is not empty"' failed.

I'm attempting to produce a small reproducer. I'm not suggesting this is a blocker for getting this patch in, but that there may be some more work needed for this to be completely solved - up to others to decide whether we should "work more on this patch" or "add separate patch later".

@Leporacanthicus
Copy link
Collaborator

Here's a small reproducer:

program lineloop
  implicit none
  INTEGER ::  n
  
  !$OMP PARALLEL DO NUM_THREADS(2) SCHEDULE(STATIC,1) PRIVATE(n)
  DO n = 1, 2
     IF ( n > 2 )  CYCLE
  END DO
  !$OMP END PARALLEL DO
end program lineloop

Note that a previously posted program with print *,i after the if ... cycle works fine - and of course, if I add some print to this code, it also compiles without issue. The original code is in mkba_sweep.f90 in the SNAP application, and the original loop (line_loop) is just under 300 lines of code.

@Leporacanthicus
Copy link
Collaborator

Good news: As far as I can tell, the remainder of the SNAP code compiles and builds correctly with this patch.

@vdonaldson
Copy link
Collaborator Author

I think I can close #1077 in favour of this patch. Is that OK?

That was the intent of adding a more narrowly focused implementation of the 1077 code here, along with some related comment updates. We've still got problems to solve though.

@vdonaldson
Copy link
Collaborator Author

Here's a small reproducer: . . .

Thanks for this small test case. It turns up at least one more problem variation, and hits at the intersection of some of the other known issues. The most obvious problem is that the $omp parallel do effectively splits into $omp parallel + $omp do, each with its own region. The code as is doesn't allow for that. There is also an instance of the problem of requiring a construct exit target inside the correct region that isn't accounted for, and possibly other issues as well. I don't yet have a fix for this set of problems.

The test case as is hits a CYCLE to construct outside of PARALLEL DO construct is not allowed error. That is in invalid error that is easy to work around; you may already have a fix for that problem.

@Leporacanthicus
Copy link
Collaborator

Here's a small reproducer: . . .

Thanks for this small test case. It turns up at least one more problem variation, and hits at the intersection of some of the other known issues. The most obvious problem is that the $omp parallel do effectively splits into $omp parallel + $omp do, each with its own region. The code as is doesn't allow for that. There is also an instance of the problem of requiring a construct exit target inside the correct region that isn't accounted for, and possibly other issues as well. I don't yet have a fix for this set of problems.

Feel free to work on that in a separate patch. I can see how this patch helps a whole lot, and I don't know enough about this subject to say which is the better approach - two patches, or try to fix everything in one patch.

The test case as is hits a CYCLE to construct outside of PARALLEL DO construct is not allowed error. That is in invalid error that is easy to work around; you may already have a fix for that problem.

Yes, I have reverted the commit that causes that check - it hits a couple of different places in SNAP. I think there is an updated patch in review at the moment.

Copy link
Collaborator

@kiranchandramohan kiranchandramohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, Val for this patch. I guess we can fix the other issue in a separate patch.

Copy link
Collaborator

@Leporacanthicus Leporacanthicus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I have raised #1192 for the remaining known issue in SNAP.

With this fix, I'm able to compile all of SNAP, and with four $OMP directives commented out, the application runs to completion with what appears to be correct results.

@vdonaldson
Copy link
Collaborator Author

I've pushed a change that takes a step toward resolving the issue with (implicit or explicit) nested constructs. A full solution to that problem may need special code for some cases, such as the implicit parallel do -> parallel + do nesting case. The failure symptom for the test case above changes to a missing basic block terminator.

As a general note, unstructured code (in the fir sense) in OpenMP code is probably not all that common. One way to artificially "create" more tests that exercise unstructured OpenMP code generation is to use the -no-structured-fir developer option. That option bypasses the opportunity to generate fir.do_loop and fir.if ops when that is otherwise allowed, in favor of unstructured branches between basic blocks.

I'll merge this PR as a base for OpenMP developers to address the remaining issues.

@vdonaldson vdonaldson merged commit a81f808 into flang-compiler:fir-dev Nov 3, 2021
@vdonaldson vdonaldson deleted the vkd1 branch November 3, 2021 19:01
kiranchandramohan added a commit to llvm/llvm-project that referenced this pull request May 24, 2022
Since the FIR operations are mostly structured, it is only the functions
that could contain multiple blocks inside an operation. This changes
with OpenMP since OpenMP regions can contain multiple blocks. For
unstructured code, the blocks are created in advance and belong to the
top-level function. This caused code in OpenMP region to be placed under
the function level.

In this fix, if the OpenMP region is unstructured then new blocks are
created inside it.

Note1: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project. The code in this patch is a
subset of the changes in flang-compiler#1178.

Reviewed By: vdonaldson

Differential Revision: https://reviews.llvm.org/D126293

Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this pull request Oct 7, 2022
Since the FIR operations are mostly structured, it is only the functions
that could contain multiple blocks inside an operation. This changes
with OpenMP since OpenMP regions can contain multiple blocks. For
unstructured code, the blocks are created in advance and belong to the
top-level function. This caused code in OpenMP region to be placed under
the function level.

In this fix, if the OpenMP region is unstructured then new blocks are
created inside it.

Note1: This is part of upstreaming from the fir-dev branch of
https://github.com/flang-compiler/f18-llvm-project. The code in this patch is a
subset of the changes in flang-compiler/f18-llvm-project#1178.

Reviewed By: vdonaldson

Differential Revision: https://reviews.llvm.org/D126293

Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>
kiranchandramohan added a commit to llvm/llvm-project that referenced this pull request Jun 5, 2023
Unstructured regions presents some issues for OpenMP code generation.
While there are no branches out of the OpenMP region, there can be
branches inside. This required the availability of an artificial
target at the end of an OpenMP region. This was implemented by
insertion an artifical `continue` and marking it as a target for
a branch.
(flang-compiler#1178)

The artificial target is not required for OpenMP loops. Since the
DO loop end can itself be a target of a branch. Moreover, insertion
of the continue between the end of the loop and the end of the
OpenMP loop construct presents problems since the OpenMP MLIR
loop construct models both the loop and the construct. This can
cause the terminator of the OpenMP loop construct to be missed.
This patch solves the issue by skipping the insertion of the
continue.

Note: This issue is only hit if the `end openmp loop` directive
is missed.

This patch fixes the issues in:
-> #58378
-> flang-compiler#1426

Fixes #58378

Reviewed By: vdonaldson

Differential Revision: https://reviews.llvm.org/D151700
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants