[flang][runtime] Replace recursion with iterative work queue
Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually recursive) routines in the Fortran runtime
with new implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, InitializeClone,
Finalize, Destroy, and DescriptorIO.
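A minimal C++ sketch of the general technique follows, assuming a simplified model; it is not the Flang runtime's actual code. The names `Node`, `Status`, `Ticket`, `WorkQueue`, `DestroyTicket`, and `Destroy` are hypothetical. A recursive post-order destruction of nested components becomes a ticket whose progress (the index of the next component) is stored in the ticket itself. Instead of calling itself, the ticket pushes a child ticket and reports that it is suspended; the queue resumes it once the child finishes, so the C++ call stack never grows with the nesting depth.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an object with nested derived-type components.
struct Node {
  std::string name;
  std::vector<Node> components;
};

class WorkQueue;

// Outcome of resuming a ticket once.
enum class Status { Done, Suspended };

// A suspendable/resumable unit of work: its progress lives in data members,
// not in C++ stack frames.
struct Ticket {
  virtual ~Ticket() = default;
  virtual Status Resume(WorkQueue &) = 0;
};

class WorkQueue {
public:
  void Push(std::unique_ptr<Ticket> t) { pending_.push_back(std::move(t)); }

  // Runs until all tickets have completed, using a loop instead of recursion.
  void Run() {
    while (!pending_.empty()) {
      Ticket *top = pending_.back().get();
      if (top->Resume(*this) == Status::Done) {
        // A finished ticket must not have pushed new work.
        assert(pending_.back().get() == top);
        pending_.pop_back();
      }
      // On Suspended, the child just pushed is on top and runs next;
      // the parent is resumed after the child is popped.
    }
  }

private:
  std::vector<std::unique_ptr<Ticket>> pending_; // LIFO: innermost work first
};

// Hypothetical ticket: destroy components first, then the node itself.
struct DestroyTicket : Ticket {
  DestroyTicket(const Node &n, std::vector<std::string> &l) : node{n}, log{l} {}
  Status Resume(WorkQueue &q) override {
    if (next < node.components.size()) {
      q.Push(std::make_unique<DestroyTicket>(node.components[next++], log));
      return Status::Suspended; // resume here once the child is done
    }
    log.push_back(node.name); // every component is gone; destroy this node
    return Status::Done;
  }
  const Node &node;
  std::vector<std::string> &log;
  std::size_t next = 0; // index of the next component to destroy
};

// Returns the names in the order they were destroyed.
std::vector<std::string> Destroy(const Node &root) {
  std::vector<std::string> log;
  WorkQueue q;
  q.Push(std::make_unique<DestroyTicket>(root, log));
  q.Run();
  return log;
}
```

For a root `a` with components `b` (which contains `d`) and `c`, `Destroy` yields `d`, `b`, `c`, `a`: the same post-order a recursive implementation would produce. A LIFO container is used so that a suspended parent is resumed only after all of the work it spawned has finished.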
Note that derived-type FINAL subroutine calls, defined assignments,
and defined I/O procedures all call back into user code,
which may in turn reenter the runtime library. This kind of recursion
is not handled by this change, although it may be possible to handle it
in the future using thread-local work queues.
The effects of this restructuring on CPU performance are yet to be
measured.
There is a fast(?) mode in the work queue implementation that causes
new work items to be executed to completion immediately upon
creation, saving the overhead of actually representing and managing
the work queue. This mode can't be used on GPU devices, since running
each new item to completion inside the code that created it reintroduces
recursion, but it is enabled by default for CPU hosts. It can be disabled
easily for debugging and performance testing.
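A minimal C++ sketch of such a mode switch, assuming the same simplified model as a ticket-based work queue; it is not the Flang runtime's code, and the `runImmediately` flag, `maxNesting()`, `DestroyResult`, and the other names are hypothetical. In fast mode, `Push` resumes the new ticket until it is done instead of storing it. That skips queue management, but each nested item now runs inside its parent's `Resume` call, so nesting depth reappears on the C++ stack. The sketch records the deepest nesting of `Resume` calls to make that visible; it does not measure speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Node { // hypothetical object with nested components
  std::string name;
  std::vector<Node> components;
};

class WorkQueue;
enum class Status { Done, Suspended };

struct Ticket {
  virtual ~Ticket() = default;
  virtual Status Resume(WorkQueue &) = 0;
};

class WorkQueue {
public:
  explicit WorkQueue(bool runImmediately) : runImmediately_{runImmediately} {}

  void Push(std::unique_ptr<Ticket> t) {
    if (!runImmediately_) {
      pending_.push_back(std::move(t)); // queued mode: defer the work
      return;
    }
    // Fast mode: run the ticket to completion right now. Children pushed
    // from its Resume() run the same way, i.e. recursively.
    Enter();
    while (t->Resume(*this) == Status::Suspended) {
    }
    --active_;
  }

  void Run() { // only does anything in queued mode
    while (!pending_.empty()) {
      Ticket *top = pending_.back().get();
      Enter();
      Status s = top->Resume(*this);
      --active_;
      if (s == Status::Done) {
        assert(pending_.back().get() == top); // finished tickets push nothing
        pending_.pop_back();
      }
    }
  }

  int maxNesting() const { return maxActive_; }

private:
  void Enter() { maxActive_ = std::max(maxActive_, ++active_); }
  bool runImmediately_;
  std::vector<std::unique_ptr<Ticket>> pending_;
  int active_ = 0;    // Resume() calls currently on the C++ stack
  int maxActive_ = 0; // deepest such nesting observed
};

struct DestroyTicket : Ticket { // components first, then the node itself
  DestroyTicket(const Node &n, std::vector<std::string> &l) : node{n}, log{l} {}
  Status Resume(WorkQueue &q) override {
    if (next < node.components.size()) {
      q.Push(std::make_unique<DestroyTicket>(node.components[next++], log));
      return Status::Suspended;
    }
    log.push_back(node.name);
    return Status::Done;
  }
  const Node &node;
  std::vector<std::string> &log;
  std::size_t next = 0;
};

struct DestroyResult {
  std::vector<std::string> order; // destruction order
  int maxNesting = 0;             // deepest nesting of Resume() calls
};

DestroyResult Destroy(const Node &root, bool runImmediately) {
  DestroyResult r;
  WorkQueue q{runImmediately};
  q.Push(std::make_unique<DestroyTicket>(root, r.order));
  q.Run(); // a no-op in fast mode: the work already ran inside Push()
  r.maxNesting = q.maxNesting();
  return r;
}
```

Both modes destroy a three-level tree in the same order, but the queued mode never has more than one `Resume` call active, while the fast mode nests one call per level. That is why only the queued mode suits code whose stack size must be computed at link time. Keeping the choice behind a single flag is also what makes it easy to turn the fast path off for debugging.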