Commit 6ce4369

jdoerfert authored and tru committed
[OpenMP][FIX] Ensure __kmpc_kernel_parallel is reachable
The problem is that we create the call to __kmpc_kernel_parallel in the openmp-opt pass, but while we optimize the code, the call is not there yet. Thus, we assume we never reach it from __kmpc_target_deinit. That allows us to remove the store in there (`ParallelRegionFn = nullptr`), which leads to bad results later on. This is a stopgap solution until we come up with something better.

Fixes llvm#57064

(cherry picked from commit a8cda32)
1 parent a9ac5ac commit 6ce4369

1 file changed: +14 -2 lines changed

openmp/libomptarget/DeviceRTL/src/Kernel.cpp

@@ -35,7 +35,7 @@ static void genericStateMachine(IdentTy *Ident) {
   uint32_t TId = mapping::getThreadIdInBlock();
 
   do {
-    ParallelRegionFnTy WorkFn = 0;
+    ParallelRegionFnTy WorkFn = nullptr;
 
     // Wait for the signal that we have a new work function.
     synchronize::threads();
@@ -100,8 +100,20 @@ int32_t __kmpc_target_init(IdentTy *Ident, int8_t Mode,
   // doing any work. mapping::getBlockSize() does not include any of the main
   // thread's warp, so none of its threads can ever be active worker threads.
   if (UseGenericStateMachine &&
-      mapping::getThreadIdInBlock() < mapping::getBlockSize(IsSPMD))
+      mapping::getThreadIdInBlock() < mapping::getBlockSize(IsSPMD)) {
     genericStateMachine(Ident);
+  } else {
+    // Retrieve the work function just to ensure we always call
+    // __kmpc_kernel_parallel even if a custom state machine is used.
+    // TODO: this is not super pretty. The problem is we create the call to
+    // __kmpc_kernel_parallel in the openmp-opt pass but while we optimize it is
+    // not there yet. Thus, we assume we never reach it from
+    // __kmpc_target_deinit. That allows us to remove the store in there to
+    // ParallelRegionFn, which leads to bad results later on.
+    ParallelRegionFnTy WorkFn = nullptr;
+    __kmpc_kernel_parallel(&WorkFn);
+    ASSERT(WorkFn == nullptr);
+  }
 
   return mapping::getThreadIdInBlock();
 }
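
To illustrate why the store in __kmpc_target_deinit matters, here is a minimal host-side sketch of the handshake described in the commit message. It is not the DeviceRTL code: `MockTeamState`, `mockKernelParallel`, `mockTargetDeinit`, `mockWorkerStep`, and `sampleRegion` are hypothetical stand-ins, and the real runtime coordinates GPU threads with barriers rather than running sequentially. The sketch only shows that a worker relies on reading `nullptr` to learn that it should stop, so dropping that store leaves it with a stale work function.

```cpp
#include <cassert>

// Hypothetical host-side model; none of these names are the real DeviceRTL API.
using WorkFnTy = void (*)();

// Models the team-shared state that holds the current work function.
struct MockTeamState {
  WorkFnTy ParallelRegionFn = nullptr;
};

// Counts how often the sample parallel region was executed.
static int RegionsRun = 0;
static void sampleRegion() { ++RegionsRun; }

// Models __kmpc_kernel_parallel: a worker reads the shared work function.
// Returns false when it reads nullptr, which acts as the termination signal.
static bool mockKernelParallel(const MockTeamState &S, WorkFnTy *WorkFn) {
  *WorkFn = S.ParallelRegionFn;
  return *WorkFn != nullptr;
}

// Models the store in __kmpc_target_deinit that tells workers to quit.
// This is the store the optimizer removed when it believed that no reader
// of ParallelRegionFn was reachable.
static void mockTargetDeinit(MockTeamState &S) { S.ParallelRegionFn = nullptr; }

// Models one worker wake-up: fetch the work function and run it if present.
// Returns true if the worker should keep waiting for more work.
static bool mockWorkerStep(const MockTeamState &S) {
  WorkFnTy WorkFn = nullptr;
  if (!mockKernelParallel(S, &WorkFn))
    return false;
  WorkFn();
  return true;
}
```

In this model, calling `mockWorkerStep` after `mockTargetDeinit` returns `false` and the worker stops. If the deinit store is skipped, every further `mockWorkerStep` call runs `sampleRegion` again, which mirrors the "bad results later on" that the commit message mentions.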
