Consolidating OpenACC device-host memory transfers #1315
This PR consolidates much of the OpenACC host-device data transfer that occurs during execution of the dynamics into two subroutines, `mpas_atm_pre_dynamics_h2d` and `mpas_atm_post_dynamics_d2h`, which are called before and after the call to the `atm_srk3` subroutine. Because `atm_compute_solve_diagnostics` is also called once before the start of the model run, a second pair of subroutines, `mpas_atm_pre_computesolvediag_h2d` and `mpas_atm_post_computesolvediag_d2h`, handles data movement around that first call to `atm_compute_solve_diagnostics`. Any fields copied onto the device in these subroutines are removed from the explicit data movement statements in the dynamical core.

The mesh/time-invariant fields are still copied onto the device in `mpas_atm_dynamics_init` and removed from the device in `mpas_atm_dynamics_finalize`, with the exception of select fields moved in `mpas_atm_pre_computesolvediag_h2d` and `mpas_atm_post_computesolvediag_d2h`. This is a special case because `atm_compute_solve_diagnostics` is called for the first time before the call to `mpas_atm_dynamics_init`.
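
The consolidation pattern described above can be illustrated with a minimal sketch. It is written in C with OpenACC directives rather than in MPAS's Fortran, and every name in it (`pre_dynamics_h2d`, `post_dynamics_d2h`, `dynamics_substep`, `run_dynamics`, the fields `u` and `w`) is a hypothetical stand-in, not code from this PR. The idea it shows is that fields are copied to the device once before the time-stepping routine and copied back once afterwards, while the kernels in between only assert that the data is already present.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a "pre-dynamics" routine:
 * one host-to-device copy of every field the kernels need. */
static void pre_dynamics_h2d(double *u, double *w, int n) {
    #pragma acc enter data copyin(u[0:n], w[0:n])
}

/* Hypothetical stand-in for a "post-dynamics" routine:
 * one device-to-host copy, which also releases the device memory. */
static void post_dynamics_d2h(double *u, double *w, int n) {
    #pragma acc exit data copyout(u[0:n], w[0:n])
}

/* A kernel inside the dynamics. It carries no copy clauses of its own;
 * present() only asserts that the data is already on the device. */
static void dynamics_substep(double *u, const double *w, int n) {
    #pragma acc parallel loop present(u[0:n], w[0:n])
    for (int i = 0; i < n; ++i) {
        u[i] += 0.5 * w[i];
    }
}

/* Driver: transfers happen once around the whole loop of substeps,
 * not once per kernel. Returns u[0] after nsteps substeps. */
double run_dynamics(int n, int nsteps) {
    double *u = malloc((size_t)n * sizeof *u);
    double *w = malloc((size_t)n * sizeof *w);
    if (!u || !w) {
        free(u);
        free(w);
        return -1.0;
    }
    for (int i = 0; i < n; ++i) {
        u[i] = 0.0;
        w[i] = 1.0;
    }

    pre_dynamics_h2d(u, w, n);
    for (int step = 0; step < nsteps; ++step) {
        dynamics_substep(u, w, n);
    }
    post_dynamics_d2h(u, w, n);

    double result = u[0];
    free(u);
    free(w);
    return result;
}
```

When compiled without OpenACC support the directives are ignored and the code runs on the host with the same result, which makes the data-movement structure easy to check independently of a GPU.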

This PR also includes explicit host-device data transfers in the `mpas_atm_iau`, `mpas_atmphys_interface` and `mpas_atmphys_todynamics` modules to ensure that the physics and IAU regions, which run on the CPU, use the latest values from the dynamical core running on GPUs, and vice versa. In addition, this PR includes explicit data transfers around halo exchanges in the `atm_srk3` subroutine.

These data-transfer subroutines and the `acc update` statements are an interim solution until we have a book-keeping method in place.

This PR also introduces a couple of new timers to keep track of the cost of data transfers.
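
The explicit transfers around CPU-only regions (physics, IAU) mentioned above can be sketched in the same way. This is again a C illustration with hypothetical names (`host_physics`, `device_dynamics`, `run_coupled_step`, the field `theta`), not the PR's Fortran code. It assumes a field that stays resident on the device and is synchronized with `update` directives whenever the host needs to read or modify it.

```c
#include <stdlib.h>

/* Hypothetical CPU-only region (think physics or IAU):
 * reads and modifies the host copy of the field. */
static void host_physics(double *theta, int n) {
    for (int i = 0; i < n; ++i) {
        theta[i] += 1.0;
    }
}

/* Hypothetical device kernel from the dynamics:
 * operates on the device copy of the field. */
static void device_dynamics(double *theta, int n) {
    #pragma acc parallel loop present(theta[0:n])
    for (int i = 0; i < n; ++i) {
        theta[i] *= 2.0;
    }
}

/* One coupled step: dynamics on the device, physics on the host,
 * dynamics on the device again. Returns theta[0] at the end. */
double run_coupled_step(int n) {
    double *theta = malloc((size_t)n * sizeof *theta);
    if (!theta) {
        return -1.0;
    }
    for (int i = 0; i < n; ++i) {
        theta[i] = 1.0;
    }

    #pragma acc enter data copyin(theta[0:n])

    device_dynamics(theta, n);

    /* Refresh the host copy so the CPU region sees the latest values. */
    #pragma acc update host(theta[0:n])
    host_physics(theta, n);
    /* Push the host's changes back before the next device kernel. */
    #pragma acc update device(theta[0:n])

    device_dynamics(theta, n);

    #pragma acc exit data copyout(theta[0:n])

    double result = theta[0];
    free(theta);
    return result;
}
```

Omitting either `update` directive would leave one side working on stale values when running on a GPU, which is the situation the explicit transfers are meant to prevent. The same host/device update pairing would apply around a halo exchange that is performed on host buffers.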