Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(*)Add parentheses for FMA rotational symmetry #1634

Merged

Conversation

Hallberg-NOAA
Copy link
Collaborator

@Hallberg-NOAA Hallberg-NOAA commented Jul 31, 2024

This series of 30 commits adds parentheses in 54 files throughout the MOM6 code base, or (in a few cases) revises the order of sums in expressions for various u- and v-component calculations, so that all answers exhibit rotational symmetry when fused-multiply-adds (FMAs) are enabled and appropriate parameter settings are chosen.

As a part of these changes, there is a new element ocean_grid_type%Coriolis2Bu with the square of the Coriolis parameter in the widely used transparent ocean grid type.

In order to obtain rotational symmetry (with or without FMAs) for all cases in the NOAA-GFDL MOM6-examples regression suite where FMS supports grid rotation (plus non-Boussinesq versions of these test cases and some extra examples to reflect parameterizations used in CESM prototypes), the following parameter settings may be needed:

  #override DEFAULT_ANSWER_DATE = 99991231    ! This gives rotational symmetry if it is higher than 20240701
  #override HOR_DIFF_ANSWER_DATE = 99991231    ! This must be 20240331 or higher for rotational symmetry.
  #override NDIFF_ANSWER_DATE = 99991231    ! This must be 20240331 or higher for rotational symmetry.
  #override WAVE_INTERFACE_ANSWER_DATE = 99991231    ! This must be 20230103 or higher for rotational symmetry.
  #override MEKE_GM_SRC_ANSWER_DATE = 99991231    ! This must be 20240601 or higher for rotational symmetry.

  #override BT_USE_OLD_CORIOLIS_BRACKET_BUG = False
  #override USE_GM_WORK_BUG = False
  #override MEKE_GM_SRC_ALT_SLOPE_BUG = False

  #override USE_OCMIP2_CFC = False    ! The FMS routine atmos_ocn_coupler_flux() does not support grid rotation yet.

Note that FMS does not currently support an eastern, western or southern tripolar grid, so global test cases with a northern tripolar grid can not be tested for rotational symmetry. However, the Baltic test cases have been used to verify the rotational symmetry of all of the parameterizations that are used in global models.

Note also that automated grid rotation testing is not supported yet for all types of open boundary conditions or for velocities with ALE sponges.

All answers are bitwise identical when FMAs are not enabled, but answers do change when FMAs are enabled. The only change in public interfaces is the addition of a new squared Coriolis parameter element in the transparent ocean grid type. The list of commits in this PR include:

  Added parentheses to expressions taking the squares of the wind stress
components in 70 lines in 7 files so that these expressions will be rotationally
invariant when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change with FMAs.
  Added parentheses to 9 expressions like `curv_3 = h_W(i) + h_E(i) - 2.0*h(i)`
in PPM_limit_pos, zonal_flux_layer, zonal_flux_thickness, merid_flux_layer and
merid_flux_thickness, changing them to `curv_3 = (h_W(i) + h_E(i)) - 2.0*h(i)`.
This change enforces the order of arithmetic that is required to give rotational
symmetry, but it also is the order that the Intel, GNU, and Nvidia compliers
were all already using in these expressions.  Moreover, had the order of
arithmetic ever been anything else, this would have led to failures in our
rotational consistency and redundant point consistency testing, and almost
certainly would have been detected before. However, by adding these parentheses,
there is a remote chance that the addition of these parentheses could change
answers for some compiler or compiler settings we have never tested before.
This change should not impact any FMA-enabled calculations.  All answers are
bitwise identical in the MOM6-examples regression suite as run on Gaea.
  Added parentheses to 16 expressions setting the grad_gradient arrays with
oblique_grad open boundary conditions and setting cff_new with all kinds of
oblique boundary conditions so that they will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers with certain types of open boundary conditions could
change with FMAs.
  Added parentheses to 150 lines in the 5 generic density integral routines
(int_density_dz_generic_pcm, int_density_dz_generic_plm,
int_density_dz_generic_ppm, int_spec_vol_dp_generic_pcm and
int_spec_vol_dp_generic_plm) in the MOM_density_integrals module so that they
will be rotationally invariant when fused-multiply-adds are enabled.  All
answers are bitwise identical in cases without FMAs, but answers could change
with FMAs.
  Added parentheses to 140 lines in 8 int_density_dz and int_spec_vol_dp
routines for the linear, Wright, Wright_full and Wright_red equations of state
so that they will be rotationally invariant when fused-multiply-adds are
enabled.  All answers are bitwise identical in cases without FMAs, but answers
could change with FMAs.
  Removed recently added parentheses around expressions like '+ (hL*hR)' in 110
lines in MOM_density_integrals and 4 equation of state module to reflect that
these parentheses are not necessary for rotational symmetry in
FMAs.  All answers are bitwise identical in cases without FMAs, but
answers could change with FMAs.
  Added parentheses to 4 lines in PressureForce_Mont_nonBouss and
PressureForce_Mont_Bouss so that they will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change with FMAs in cases that use the
Montgomery potential form of the pressure gradient accelerations.
  Added parentheses to 4 lines in calc_isoneutral_slopes so that they will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change with FMAs.
  Added parentheses to 2 lines in MOM_calc_varT so that they will be
rotationally invariant when fused-multiply-adds are enabled.  In this case, FMAs
can still be applied to the impacted lines, exploiting that the masks are always
0 or 1.  Also added parentheses to 2 other lines used to generate the stochastic
pattern for rotational symmetry with FMAs.  All answers are bitwise identical in
cases without FMAs, but answers could change with FMAs.
  Added parentheses to the calculation of the diffusive temperature changes in
tracer_hordiff so that it will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change with FMAs.
  Added parentheses to the calculation of the iceberg contribution to the
fractional area of ice shelves in iceberg_forces so that it will be rotationally
invariant when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change with FMAs enabled in
cases with tabular icebergs.
  Added parentheses to the calculation of the Stokes-drift Coriolis velocity
increments in CoriolisStokes so that it will be rotationally invariant when
fused-multiply-adds are enabled.  All answers are bitwise identical because
CoriolisStokes is still under development and is never called, with a fatal
error occurring if anyone tries to use it.  Also added parentheses to two
expressions calculating the magnitude of the Stokes velocity in
get_Langmuir_Number.  Answers could change for some cases that use Langmuir
turbulence parameterizations with FMAs enabled.
  Added the new element Coriolis2Bu to the ocean_grid_type and the
dyn_horgrid_type to hold the square of the Coriolis parameter, and use this
array in 10 routines (including btstep, set_dtbt, calculate_diagnostic_fields,
VarMix_init, propagate_int_tide, Calculate_kappa_shear, Calc_kappa_shear_vertex
and add_MLrad_diffusivity) that had been calculating and averaging the square of
the Coriolis parameter.  This could change some answers with FMAs enabled
because the compilers were previously free to split up some of the squares
when averaging the squared Coriolis parameter, but without FMAs all answers are
bitwise identical.  This commit does add a new element to two transparent
types.
  Added parentheses to 17 expressions in thickness_diffuse_full,
thickness_diffuse and thickness_diffuse_init to give rotationally consistent
solutions when fused-multiply-adds are enabled.  One comment was also added to
note that the calculation of PE_release_h is does not exhibit rotational
symmetry when MEKE_GM_SRC_ALT is set to true.  All answers are bitwise identical
in cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 2 expressions in the Zanna_Bolton code and rearranged
another line so that the u- and v-discretizations introduce terms in the same
order so that the Zanna_Bolton code will exhibit rotationally consistent
solutions when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change with FMAs enabled in
cases that use the Zanna-Bolton parameterization.
  Added parentheses to 19 expressions in the MOM_internal_tides propagation code
to exhibit rotationally consistent solutions when fused-multiply-adds are
enabled.  All answers are bitwise identical in cases without FMAs, but answers
could change when FMAs are enabled in models that use the ray-tracing based
internal tides code.
  Added parentheses to 10 expressions in find_uv_at_h to exhibit rotationally
consistent solutions and treat the velocities at both edges of a tracer cell
equivalently when fused-multiply-adds are enabled. All answers are bitwise
identical in cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 19 expressions in set_viscous_ML, set_u_at_v and
set_v_at_u to treat the velocities at both edges of a tracer cell equivalently
when fused-multiply-adds are enabled, and thereby to exhibit exhibit
rotationally consistent solutions.  Also swapped the order of the u- and
v-components in the u-point calculation of Uh2 to mirror the order of the
corresponging v-point calculation for the same purpose.  All answers are bitwise
identical in cases without FMAs, but answers could change when FMAs are enabled.
  Added mathematically equivalent rearrangements of the code in
calc_kappa_shear_vertex that interpolates velocities, temperatures and
salinities to the vertices to expose the mask variables while ensuring that the
other multiplications occur within parentheses so that they will exhibit
rotational symmetry when fused-multiply-adds are enabled.  FMAs can still occur,
but it will be multiplication by the 0-or-1 masks that are fused with an
addition.  Also added parentheses to 3 expressions calculating the squared shear
in calculate_projected_state for rotational symmetry with FMAs.  All answers are
bitwise identical in cases without FMAs, but answers could change when FMAs are
enabled.
  Added parentheses to 4 expressions in add_drag_diffusivity, set_BBL_TKE and
add_LOTW_BBL_diffusivity setting the bottom-drag contributions to TKE and
friction velocity so that they will exhibit rotationally consistent solutions
when fused-multiply-adds are enabled.  All answers are bitwise identical in
cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 20 expressions in CorAdCalc and one in gradKE to exhibit
rotationally consistent solutions when fused-multiply-adds are enabled.  All
answers are bitwise identical in cases without FMAs, but answers could change
when FMAs are enabled.
  Added parentheses to 18 expressions in btstep, and one more each in set_dtbt
and barotropic_init to exhibit rotationally consistent solutions when
fused-multiply-adds are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 19 expressions in 5 routines (calc_Visbeck_coeffs_old,
calc_Eady_growth_rate_2D, calc_slope_functions_using_just_e,
calc_QG_Leith_viscosity VarMix_init) in MOM_lateral_mixing_coeffs.F90 to give
rotationally consistent solutions when fused-multiply-adds are enabled.  Also
reordered terms in a sum in the calculation of beta_dx2_u to mirror that of
beta_dx2_v, also for rotational symmetry with FMAs.  All answers are bitwise
identical in cases without FMAs, but answers could change for some parameter
settings when FMAs are enabled.
  Added parentheses to 40 expressions horizontal_viscosity and another 14
expressions in in hor_visc_init and 3 more in align_aniso_tensor_to_grid to give
rotationally consistent solutions when fused-multiply-adds are enabled.   Also
swapped the order of two terms in the expression for Del2u to mirror the order
of the corresponding terms in Del2v for rotational symmetry with FMAs.  All
answers are bitwise identical in cases without FMAs, but answers could change
when FMAs are enabled.
  Added parentheses to 20 sums of squares of x- and y- distances or velocity
components used for initialization in 8 modules to give rotationally consistent
solutions when fused-multiply-adds are enabled.  All answers are bitwise
identical in cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 29 sums of squares of velocity or other vector components
used in parameterizations in 9 modules to give rotationally consistent solutions
when fused-multiply-adds are enabled.  All answers are bitwise identical in
cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 9 diagnostics of Coriolis accelerations or expressions
used in the kinetic energy budgets to give rotationally consistent solutions
when fused-multiply-adds are enabled.  All answers are bitwise identical in
cases without FMAs, but answers could change when FMAs are enabled.
  Added parentheses to 4 tracer edge value calculations used with PPM tracer
advection to give rotationally consistent solutions when fused-multiply-adds are
enabled.  Although these lines may not appear to need parentheses, some
compliers appear to be putting these expressions directly into others, where the
direction of the flow seems to determine which multiplications are incorporated
into FMAs.  All answers are bitwise identical in cases without FMAs, but answers
could change when FMAs are enabled.
@Hallberg-NOAA Hallberg-NOAA marked this pull request as draft July 31, 2024 21:46
  Added parentheses around the full expressions for intz and intp in 10 more
lines in 4 generic density integral routines (int_density_dz_generic_pcm,
int_density_dz_generic_ppm, int_spec_vol_dp_generic_pcm and
int_spec_vol_dp_generic_plm) in the MOM_density_integrals module so that
non-Boussinesq cases will be rotationally invariant when fused-multiply-adds are
enabled.  Although this might not seem to do anything, these parentheses do
matter if these expressions are in-lined in the sums where they are used a few
lines later.  The analogous parentheses had previously been added to
int_density_dz_generic_plm.  All answers are bitwise identical in cases without
FMAs, but answers could change with FMAs.
@Hallberg-NOAA Hallberg-NOAA marked this pull request as ready for review July 31, 2024 22:08
Copy link
Collaborator

@kshedstrom kshedstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this PR, but I have a question. Before turning on FMAs, answers did not change. With FMAs, answers do change, but the unit_tests fail. Is there a way to get it to behave with FMAs?

 The values of WRIGHT drho_dT_dT disagree.  -1.1402818751119733E-02 and  -7.2448371221038111E-03 differ by  -4.15798163E-03 ( -4.46E-01), tol=  3.11848682E-04
   k        Calculated value           Correct value remap_via_sub_cells() 2
   1  1.0000000000000002E+01  1.0000000000000000E+01 err=  1.7763568394002505E-15 <--- WRONG
   2  5.9999999999999982E+00  6.0000000000000000E+00 err= -1.7763568394002505E-15 <--- WRONG
   3  2.0000000000000004E+00  2.0000000000000000E+00 err=  4.4408920985006262E-16 <--- WRONG
   4 -2.0000000000000004E+00 -2.0000000000000000E+00 err= -4.4408920985006262E-16 <--- WRONG
   5 -5.9999999999999982E+00 -6.0000000000000000E+00 err=  1.7763568394002505E-15 <--- WRONG
   6 -1.0000000000000002E+01 -1.0000000000000000E+01 err= -1.7763568394002505E-15 <--- WRONG
   k        Calculated value           Correct value remap_via_sub_cells() 3
   1  1.1500000000000005E+01  1.1500000000000000E+01 err=  5.3290705182007514E-15 <--- WRONG
   2  1.0500000000000002E+01  1.0500000000000000E+01 err=  1.7763568394002505E-15 <--- WRONG
   4  8.4999999999999982E+00  8.5000000000000000E+00 err= -1.7763568394002505E-15 <--- WRONG
   5  7.4999999999999973E+00  7.5000000000000000E+00 err= -2.6645352591003757E-15 <--- WRONG
   6  6.4999999999999929E+00  6.5000000000000000E+00 err= -7.1054273576010019E-15 <--- WRONG
   k        Calculated value           Correct value remap_via_sub_cells() 4
   2 -1.0500000000000004E+01 -1.0500000000000000E+01 err= -3.5527136788005009E-15 <--- WRONG
   3 -1.2000000000000007E+01 -1.2000000000000000E+01 err= -7.1054273576010019E-15 <--- WRONG
Failed tests:   1   3   4   5   
First failed test: attic remapping unit tests
remapping_unit_tests : FAILED

@kshedstrom
Copy link
Collaborator

I should add that this is gfortran on chinook. I read online that the way to turn on FMAs is with -ftree-vectorize -ffast-math -march=native.

@marshallward
Copy link
Collaborator

marshallward commented Aug 1, 2024

Don't use -ffast-math, this turns on a whole load of unreproducible options that we don't want to use. I also don't think you need -ftree-vectorize. (I had never heard of it until you mentioned it.)

You should just first try -mfma. (not quite; see below.) After that, you can also consider

  • -mavx -mfma (add AVX instructions)
  • -O2 -mavx -mfma (with optimization)
  • -O2 -march=core-avx2 -mavx -mfma (with full x86 AVX2 instruction set)

(You might also try -march=native in place of core-avx2.)

At the very least, I would be prepared for different answers from each of those tests. But perhaps they will pass your unit tests.


I misspoke above, GCC also requires -O2 to generate FMA instructions. Sorry about that (and thanks to @Hallberg-NOAA for testing this).

So testing should be more like this:

  • -O2 -mfma (enable FMA)
  • -O2 -mavx -fma (with AVX)
  • -O2 -mavx -fma -march=... (with target architecture)

@mathomp4 mathomp4 self-requested a review August 1, 2024 12:40
Copy link
Collaborator

@mathomp4 mathomp4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess the GMAO flags are boring enough that this was a zero-diff change for us. Nice simple approve.

Copy link
Collaborator

@abozec abozec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COAPS approves.

@jiandewang
Copy link
Collaborator

I am in the processing of merging previous (2024-05-31) PR to dev/emc and UFS repository, please allow me a bit delay on testing this PR.

  Added parentheses to prevent FMAs in 4 expressions in end_value_h4 that rely
on exact vertical symmetry in order to get the cancellations that are necessary
to pass the vertical remapping unit testing.  Before this change, the MOM6
unit-testing was failing when FMAs are enabled, but with it the unit-testing is
passing even when FMAs are enabled.  All answers are bitwise identical in cases
without FMAs, but answers could change with FMAs.
@Hallberg-NOAA
Copy link
Collaborator Author

Hallberg-NOAA commented Aug 2, 2024

We also had some imperfect vertical remapping unit-tests when FMAs are enabled. I have added one more commit with parentheses in part of the ALE remapping code that give us unit tests that all pass with FMAs for the intel compiler version we are using at GFDL. I have also verified that the remapping unit testing is now passing with gfortran and FMAs enabled.

However, I still don't know whether this will also fix all the roundoff-level vertical remapping unit-test anomalies that @kshedstrom was finding with gfortran on Chinook, which seem more extensive that what I was finding with the intel compiler.

@jiandewang
Copy link
Collaborator

tested on 4 non-wcoss machines and got b4b results. Testing on wcoss now, shall have updating tomorrow. Sorry for delayed response here (which is caused by the combination of many unexpected factors that are out of my control).

@jiandewang
Copy link
Collaborator

all UFS testing on wcoss failed to match current existing baseline (as expected), difference are small enough

@jiandewang jiandewang self-assigned this Aug 20, 2024
Copy link
Collaborator

@jiandewang jiandewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works as expected. No change on non-wcoss machine, minor change on wcoss

@Hallberg-NOAA
Copy link
Collaborator Author

This PR has been evaluated and approved by representatives of the requisite 5 primary development branches (in addition to the implicit approval from the initiating primary development branch).

@Hallberg-NOAA Hallberg-NOAA merged commit ce58a32 into mom-ocean:main Aug 24, 2024
10 checks passed
Hallberg-NOAA added a commit to NOAA-GFDL/MOM6 that referenced this pull request Sep 4, 2024
  This merge updates dev/gfdl following PR mom-ocean#1634 to main that added parentheses
to give rotational symmetry when fused-multiply-adds are enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants