Investigate changing MOM thermodynamic time step (DT_THERM) #138

Open
4 tasks
dougiesquire opened this issue Apr 10, 2024 · 12 comments

@dougiesquire
Collaborator

We need to test the effect of changing DT_THERM on ACCESS-OM3 performance and physical fields.

Once ACCESS-NRI/access-om3-configs#48 is merged and #137 is closed, we'll use the MOM6-CICE6/025deg_jra55do_ryf configuration as a baseline for runs with longer DT_THERM:

  • DT_THERM = 2700.0
  • DT_THERM = 5400.0
  • DT_THERM = 8100.0
  • DT_THERM = 10800.0

To run with these, the following parameters will also need to be changed:

THERMO_SPANS_COUPLING = True
SINGLE_STEPPING_CALL = False
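
As a rough illustration (not part of any existing configuration), the four candidate settings can be written out as parameter lines in the same `KEY = value` form used above. The helper name `param_block` is hypothetical; the parameter names and values are the ones listed in this comment.

```python
# Candidate thermodynamic timesteps (seconds) listed in this issue.
DT_THERM_CANDIDATES = [2700.0, 5400.0, 8100.0, 10800.0]


def param_block(dt_therm):
    """Return the three parameter lines for one candidate DT_THERM.

    Hypothetical helper for illustration only: it formats text and does not
    read or write any model configuration file.
    """
    params = {
        "DT_THERM": dt_therm,
        "THERMO_SPANS_COUPLING": True,
        "SINGLE_STEPPING_CALL": False,
    }
    return "\n".join(f"{name} = {value}" for name, value in params.items())


if __name__ == "__main__":
    for dt_therm in DT_THERM_CANDIDATES:
        print(param_block(dt_therm))
        print()
```

How these lines get into a given experiment (editing the parameter file directly or via an override file) depends on the configuration and is not covered by this sketch.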

@aekiss, @AndyHoggANU do you have any suggestions for what should be looked at in the physical fields?

@adele-morrison

@aekiss, @AndyHoggANU do you have any suggestions for what should be looked at in the physical fields?

  • Time series of global average temperature and salinity,
  • Zonal average temperature and salinity (i.e. depth/latitude maps)
  • Zonally integrated overturning in density / latitude space, or time series of max/min overturning at particular latitudes.
  • Time series of Drake Passage zonal transport.
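
For the first item, a global average of temperature or salinity is normally a volume-weighted mean over ocean cells. A minimal sketch, assuming flat lists of cell values and cell volumes (the function name and the toy numbers are hypothetical; in practice the model may already output such a diagnostic directly):

```python
def volume_weighted_mean(values, volumes):
    """Volume-weighted mean of per-cell values.

    `values` and `volumes` are equal-length sequences covering ocean cells
    only (land cells are assumed to have been removed beforehand).
    """
    if len(values) != len(volumes):
        raise ValueError("values and volumes must have the same length")
    total_volume = sum(volumes)
    if total_volume <= 0.0:
        raise ValueError("total volume must be positive")
    return sum(v * w for v, w in zip(values, volumes)) / total_volume


# Toy example: a small warm cell and a cold cell three times its size.
temperatures = [10.0, 2.0]   # degrees C
cell_volumes = [1.0, 3.0]    # arbitrary volume units
print(volume_weighted_mean(temperatures, cell_volumes))  # 4.0
```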

@dougiesquire
Collaborator Author

Thanks @adele-morrison!

@aekiss
Contributor

aekiss commented Apr 10, 2024

See comments here: ACCESS-NRI/access-om3-configs#48 (comment)

@minghangli-uni
Contributor

minghangli-uni commented Jun 20, 2024

This comment covers the parameters related to THERMO_SPANS_COUPLING in the MOM module section of the MOM input parameters.

In MOM6, tracer advection is stepped with the thermodynamic timestep, which can be much longer than the coupling timestep when THERMO_SPANS_COUPLING is enabled. In the following setup, DT_THERM is set to 8100s, which is 6 times the coupling timestep of 1350s. Similar tracer timesteps are used in GFDL OM4 0.25deg and GFDL OM5 0.25deg.

THERMO_SPANS_COUPLING = True     !   [Boolean] default = False
                                 ! If true, the MOM will take thermodynamic and tracer timesteps that can be
                                 ! longer than the coupling timestep. The actual thermodynamic timestep that is
                                 ! used in this case is the largest integer multiple of the coupling timestep
DT_THERM = 8100.0                !   [s] default = 1350.0
                                 ! The thermodynamic and tracer advection time step. Ideally DT_THERM should be
                                 ! an integer multiple of DT and less than the forcing or coupling time-step,
                                 ! unless THERMO_SPANS_COUPLING is true, in which case DT_THERM can be an integer
                                 ! multiple of the coupling timestep.  By default DT_THERM is set to DT.
DTBT_RESET_PERIOD = 8100.0       !   [s] default = 1350.0 - (DT_THERM)
                                 ! The period between recalculations of DTBT (if DTBT <= 0). If DTBT_RESET_PERIOD
                                 ! is negative, DTBT is set based only on information available at
                                 ! initialization.  If 0, DTBT will be set every dynamics time step. The default
                                 ! is set by DT_THERM.  This is only used if SPLIT is true.
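
To make the "integer multiple of the coupling timestep" rule concrete, here is a minimal sketch. It assumes the truncated THERMO_SPANS_COUPLING description above means "the largest integer multiple of the coupling timestep that does not exceed DT_THERM"; that reading, the function name and the small rounding tolerance are assumptions, not taken from the MOM6 source.

```python
import math


def effective_dt_therm(dt_therm, dt_coupling):
    """Largest integer multiple of dt_coupling that does not exceed dt_therm.

    Assumption: this is one reading of the THERMO_SPANS_COUPLING description;
    it is not copied from MOM6. At least one coupling step is always taken,
    and a small tolerance guards against floating-point round-off.
    """
    n_coupling_steps = max(1, math.floor(dt_therm / dt_coupling + 1e-3))
    return n_coupling_steps * dt_coupling


print(effective_dt_therm(8100.0, 1350.0))  # 8100.0 (6 coupling steps)
print(effective_dt_therm(8000.0, 1350.0))  # 6750.0 (5 coupling steps)
```

Under this reading, choosing DT_THERM as an exact multiple of the coupling timestep (as in the setup above) means the requested and effective values coincide.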

A preliminary test compared two cases for a 10-day run using 1440 cpu cores with a PE layout of #ocn: 1344, #ice: 96, #cpl: 96, #atm: 48 and #rof: 48.

| Case | dt_dyn | dt_therm_ice | dt_cpl | dt_therm | Run duration (ocn) |
| --- | --- | --- | --- | --- | --- |
| THERMO_SPANS_COUPLING = False | 1350s | 1350s | 1350s | 1350s | 465.23s |
| THERMO_SPANS_COUPLING = True | 1350s | 1350s | 1350s | 8100s | 184.98s |

The ocean run duration dropped from 465.23s to 184.98s, a reduction of about 60% (roughly a 2.5x speedup).

However, scientific testing over longer runs is needed to confirm that the resulting differences in the physical fields are negligible.
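
For reference, the speedup implied by the two ocean run durations above can be worked out as follows (a small sketch; the function names are just for illustration, the timings are the ones from the table):

```python
def speedup(t_before, t_after):
    """How many times faster the second run is."""
    return t_before / t_after


def reduction_percent(t_before, t_after):
    """Percentage reduction in run duration."""
    return 100.0 * (t_before - t_after) / t_before


# Ocean run durations (seconds) for the 10-day test.
t_no_span = 465.23   # THERMO_SPANS_COUPLING = False, dt_therm = 1350s
t_span = 184.98      # THERMO_SPANS_COUPLING = True,  dt_therm = 8100s

print(f"speedup:   {speedup(t_no_span, t_span):.2f}x")
print(f"reduction: {reduction_percent(t_no_span, t_span):.1f}%")
```

These figures cover the ocean component only, for a single 10-day run.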

@minghangli-uni
Contributor

minghangli-uni commented Jul 5, 2024

The comment above changes only the ocean DT_THERM, so that it differs from the coupling timestep. DIABATIC_FIRST must therefore be set to False, because applying the diabatic processes before the dynamics step requires the tracer timestep to equal the coupling timestep.

DIABATIC_FIRST = False          !   [Boolean] default = False
                                ! If true, apply diabatic and thermodynamic processes, including buoyancy
                                ! forcing and mass gain or loss, before stepping the dynamics forward.

Otherwise MOM6 stops with the following fatal error:

    if (CS%diabatic_first .and. (CS%t_dyn_rel_adv==0.0) .and. do_thermo) then ! do thermodynamics.
...
      elseif (thermo_does_span_coupling) then
        dtdia = dt_therm
        if ((fluxes%dt_buoy_accum > 0.0) .and. (dtdia > time_interval) .and. &
            (abs(fluxes%dt_buoy_accum - dtdia) > 1e-6*dtdia)) then
          call MOM_error(FATAL, "step_MOM: Mismatch between long thermodynamic "//&
            "timestep and time over which buoyancy fluxes have been accumulated.")
        endif
...
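
The condition in the excerpt above can be restated as a small predicate to see when it fires. This is only a Python transcription of the three tests shown; the interpretation of the variables (accumulation time of the buoyancy fluxes, thermodynamic timestep, coupling interval) is inferred from their names, and the function name is hypothetical.

```python
def buoyancy_accumulation_mismatch(dt_buoy_accum, dt_therm, time_interval,
                                   rel_tol=1e-6):
    """True when the three conditions in the excerpt are all met.

    dt_buoy_accum: time over which buoyancy fluxes have been accumulated (s)
    dt_therm:      long thermodynamic timestep, called dtdia in the excerpt (s)
    time_interval: coupling interval passed to the ocean step (s)
    """
    dtdia = dt_therm
    return (dt_buoy_accum > 0.0
            and dtdia > time_interval
            and abs(dt_buoy_accum - dtdia) > rel_tol * dtdia)


# Fluxes accumulated over one 1350s coupling interval, DT_THERM = 8100s.
print(buoyancy_accumulation_mismatch(1350.0, 8100.0, 1350.0))  # True
# Fluxes accumulated over the full thermodynamic step.
print(buoyancy_accumulation_mismatch(8100.0, 8100.0, 1350.0))  # False
# DT_THERM equal to the coupling interval.
print(buoyancy_accumulation_mismatch(1350.0, 1350.0, 1350.0))  # False
```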

@aekiss
Contributor

aekiss commented Jul 22, 2024

FYI increasing DT_THERM also gives a significant speedup at 1°.

I just tried out 1deg_jra55do_ryf with

DIABATIC_FIRST = False
DT = 1800.0
DT_THERM = 10800.0      ! 6*DT
THERMO_SPANS_COUPLING = True
DTBT_RESET_PERIOD = 10800.0

The walltime for 1 month was 11:05, compared to 17:33 with the previous value DT_THERM = 3600.0 (double DT), i.e. about 37% shorter.
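
A quick way to compare the two walltimes, assuming they are given as minutes:seconds (an assumption; the ratio is the same if they are hours:minutes). The helper name is hypothetical.

```python
def walltime_to_units(walltime):
    """Convert 'A:B' (base-60, e.g. minutes:seconds) to a single number."""
    major, minor = walltime.split(":")
    return 60 * int(major) + int(minor)


t_new = walltime_to_units("11:05")   # DT_THERM = 10800.0
t_old = walltime_to_units("17:33")   # DT_THERM = 3600.0

print(f"speedup:   {t_old / t_new:.2f}x")
print(f"reduction: {100.0 * (t_old - t_new) / t_old:.1f}%")
```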

@aekiss
Contributor

aekiss commented Aug 21, 2024

Some testing suggestions as discussed in today's TWG:

  1. save diagnostics that will be sensitive to numerical artefacts due to excessively large DT_THERM. I'm guessing these artefacts will show up as grid-scale noise, so a diagnostic that's sensitive to short length scales could be a good way to detect these, e.g. T_diffx, T_diffy, S_diffx, S_diffy saved as snapshots, not time averages.
  2. I expect such numerical issues to be visible quickly, so do some very short runs (one or two thermo timesteps, starting from a control with DT_THERM=DT that's spun up for a decade or more) with ridiculously large DT_THERM, and compare with DT_THERM=DT at the same model time to get a feel for what the artefacts look like. Then do some more short runs with reduced DT_THERM to see how low it needs to be to reduce the artefacts to an acceptable level.
  3. Try longer runs (decade or more) with DT_THERM chosen from previous step to see if any problems arise.
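
One possible way to quantify grid-scale noise in a snapshot when comparing runs is sketched below. This is not one of the model diagnostics named above (T_diffx etc.); it is a hypothetical post-processing metric, the RMS of a 5-point Laplacian on a plain 2-D array, which responds strongly to checkerboard-like noise and is zero for a field that varies linearly across the grid.

```python
import math


def laplacian_rms(field):
    """RMS of the 5-point Laplacian over interior points of a 2-D list of lists.

    Grid spacing is taken as 1 in both directions (index space), so the
    result is only meaningful for comparing fields on the same grid.
    """
    ny, nx = len(field), len(field[0])
    total, count = 0.0, 0
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            lap = (field[j][i - 1] + field[j][i + 1]
                   + field[j - 1][i] + field[j + 1][i]
                   - 4.0 * field[j][i])
            total += lap * lap
            count += 1
    return math.sqrt(total / count)


def add_checkerboard(field, amplitude):
    """Add +/- amplitude in a checkerboard pattern (synthetic grid-scale noise)."""
    return [[value + amplitude * (-1) ** (i + j) for i, value in enumerate(row)]
            for j, row in enumerate(field)]


# Synthetic test: a linear field with and without checkerboard noise.
smooth = [[0.1 * i + 0.05 * j for i in range(8)] for j in range(8)]
noisy = add_checkerboard(smooth, 0.01)

print(laplacian_rms(smooth))  # close to zero
print(laplacian_rms(noisy))   # close to 8 * 0.01 = 0.08
```

On real model output the metric would be applied to the same snapshot time in the control and perturbed runs, with land points masked; that masking is not handled here.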

@access-hive-bot

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/cosima-twg-meeting-minutes-2024/1734/16

anton-seaice pushed a commit to ACCESS-NRI/access-om3-wav-configs that referenced this issue Aug 21, 2024
anton-seaice pushed a commit to ACCESS-NRI/access-om3-wav-configs that referenced this issue Aug 21, 2024
dougiesquire added a commit to ACCESS-NRI/access-om3-configs that referenced this issue Aug 21, 2024
dougiesquire added a commit to ACCESS-NRI/access-om3-configs that referenced this issue Aug 21, 2024
dougiesquire added a commit to ACCESS-NRI/access-om3-configs that referenced this issue Aug 21, 2024
minghangli-uni added a commit to ACCESS-NRI/access-om3-configs that referenced this issue Aug 22, 2024
@aekiss
Contributor

aekiss commented Sep 10, 2024

Just to preserve @AndyHoggANU's slack comment: it's thought that DT_THERM can be set to resolve the relevant physics (e.g. around 1-3hr to capture the diurnal cycle, given that JRA55do is 3-hourly). This would be independent of the horizontal grid resolution, so could make things much cheaper at high resolution.

@minghangli-uni
Contributor

minghangli-uni commented Oct 15, 2024

Updates on timestep selections

Experiment details

  1. Control (run0): (red line in line plots)
    • baroclinic timestep = coupling timestep = tracer timestep = 1350s
    • DIABATIC_FIRST=True
    • configuration available at: here (commit hash f4b2d1e)
  2. Perturbation (run1): (blue line in line plots)
    • baroclinic timestep = coupling timestep = tracer timestep = 1350s
    • DIABATIC_FIRST=False
  3. Perturbation (run2): (lime line in line plots)
    • baroclinic timestep = coupling timestep = 1350s
    • tracer timestep = 10800s (3 hours)
    • DIABATIC_FIRST=False
  4. Perturbation (run3): (green line in line plots)
    • baroclinic timestep = coupling timestep = 900s
    • tracer timestep = 900s
    • DIABATIC_FIRST=False
  5. Perturbation (run4): (black line in line plots)
    • baroclinic timestep = coupling timestep = 900s
    • tracer timestep = 7200s (2 hours)
    • DIABATIC_FIRST=False
  6. Perturbation (run5): (pink line in line plots)
    • baroclinic timestep = coupling timestep = 900s
    • tracer timestep = 10800s (3 hours)
    • DIABATIC_FIRST=False
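
The six setups can be summarised by the ratio of tracer timestep to baroclinic timestep. The sketch below just tabulates the values listed above; the dictionary layout and helper name are hypothetical.

```python
# Timesteps in seconds, as listed in the experiment details.
# In every run the coupling timestep equals the baroclinic timestep (dt).
RUNS = {
    "run0": {"dt": 1350, "dt_therm": 1350, "diabatic_first": True},
    "run1": {"dt": 1350, "dt_therm": 1350, "diabatic_first": False},
    "run2": {"dt": 1350, "dt_therm": 10800, "diabatic_first": False},
    "run3": {"dt": 900, "dt_therm": 900, "diabatic_first": False},
    "run4": {"dt": 900, "dt_therm": 7200, "diabatic_first": False},
    "run5": {"dt": 900, "dt_therm": 10800, "diabatic_first": False},
}


def dt_therm_ratio(run):
    """Number of baroclinic (= coupling) steps per tracer step."""
    ratio, remainder = divmod(run["dt_therm"], run["dt"])
    if remainder != 0:
        raise ValueError("dt_therm is not an integer multiple of dt")
    return ratio


for name, run in RUNS.items():
    print(f"{name}: DT={run['dt']}s DT_THERM={run['dt_therm']}s "
          f"ratio={dt_therm_ratio(run)} DIABATIC_FIRST={run['diabatic_first']}")
```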

Additional notes

  1. The thermodynamic timestep in CICE is consistent with the baroclinic timestep for all runs.
  2. The control run (run0) is the only case where DIABATIC_FIRST is enabled (DF in the legend).
  3. All runs start from the same initial conditions (rr in the legend) and use the existing (old) bathymetry.
  4. Contour plots represent time-averaged results over 4 years.
  5. It is worth noting that truncation errors occurred only for runs where DT=1350s and DT_THERM=10800s (3 hours), with maximum truncations below 50. Increasing the truncation threshold (MAXTRUNC) allowed these runs to continue. This is why DT=900s was selected for run[3-5].
  6. Velocity truncation is currently based on the CFL number (CFL_BASED_TRUNCATIONS), with CFL_TRUNCATE set to its default value of 0.5; velocities giving a CFL number above this value are truncated. For the case with DT=1350s and DT_THERM=10800s, the truncations occur at CFL numbers below 0.52.
  7. The performance of MOM6 in run2 is comparable to MOM5, while run5 is ~20% slower than MOM5. An additional experiment with DT=1200s is currently running.
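
To illustrate why a shorter DT helps with CFL-based truncations, here is a simplified 1-D estimate using CFL = u * dt / dx. MOM6's actual truncation logic is not reproduced here, and the grid spacing `dx` is a hypothetical value chosen only for the example.

```python
def max_velocity(cfl_truncate, dx, dt):
    """Largest speed (m/s) with u * dt / dx <= cfl_truncate (1-D estimate)."""
    return cfl_truncate * dx / dt


CFL_TRUNCATE = 0.5     # default value quoted above
DX = 10_000.0          # hypothetical grid spacing in metres

for dt in (1350.0, 1200.0, 1080.0, 900.0):
    print(f"DT={dt:.0f}s -> velocity limit {max_velocity(CFL_TRUNCATE, DX, dt):.2f} m/s")
```

In this simplified picture the velocity limit scales as 1/DT, so going from DT=1350s to DT=900s raises it by a factor of 1.5 for the same cell.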

Summary for the following plots:

  • Global ocean potential temperature (thetaoga) is mostly affected by DT_THERM, regardless of DT.
  • Global mean ocean salinity (soga) shows minimal variation across different combinations of DT and DT_THERM.
  • Zonal average temperature and salinity show similar patterns.
  • In general, the time series for zonal and meridional transport are comparable.

  1. Time series of global average temperature and salinity
    [figure: global_temp_salinity]

  2. Zonal average temperature and salinity (i.e. depth/latitude maps)
    2.1 temperature
    [figure: thetao_depth_latitude]
    2.2 salinity
    [figure: so_depth_latitude]

  3. Meridional overturning circulation (MOC)
    [figure: MOC]

  4. Time series of zonal/meridional transport
    [figure: Transports]

@minghangli-uni
Contributor

Updates on DT=1200s and DT=1080s with 3 hour tracer timestep

Velocity truncations occurred in both the DT=1200s and DT=1080s runs, similar to what was observed with DT=1350s with a 3-hour tracer timestep. All truncations happened at the same locations ~(70.39N, 57.83E), marked by a hollow red circle in the figure below. The background contour plot represents the bathymetry depth.

As discussed in today’s TWG meeting, in some versions of OM2, we applied additional friction (Rayleigh drag) at this location to resolve similar issues. I plan to re-test it after updating the bathymetry.
[figure: truncation_locations]

@access-hive-bot

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/cosima-twg-meeting-minutes-2024/1734/19
