Precision truncation to facilitate solution reproducibility with implicit time stepping on unstructured mesh #1524
….1 unstructured case c
… for reproducability test
…BLOCK, ensures B4B with ice forcing
|
@kestonsmith-noaa will you add an explanation of the new namelist somewhere around here: https://github.com/NOAA-EMC/WW3/blob/develop/model/inp/ww3_grid.inp#L348 |
|
@kestonsmith-noaa Could you update your branch to the latest develop? |
|
Should be up to date with develop as of 14:50. Best, Keston
…On Wed, Nov 5, 2025 at 2:36 PM mingchen-NOAA wrote:
@kestonsmith-noaa Could you update your branch to the latest develop?
|
|
OK, I have added the comma. Not clear to me how that was introduced. Best,
Keston
…On Wed, Nov 5, 2025 at 3:03 PM mingchen-NOAA wrote:
mingchen-NOAA commented on this pull request.
------------------------------
In model/src/w3gridmd.F90:
> @@ -6342,7 +6354,7 @@ SUBROUTINE W3GRID()
2922 FORMAT ( ' &SNL1 LAMBDA =',F7.3,', NLPROP =',E10.3, &
', KDCONV =',F7.3,', KDMIN =',F7.3,','/ &
' SNLCS1 =',F7.3,', SNLCS2 =',F7.3, &
- ', SNLCS3 = ',F7.3,','/ &
+ ', SNLCS3 = ',F7.3','/ &
@kestonsmith-noaa still missing a comma in your code.
|
|
Great! Your code looks good to me. I will start regression tests and matrix comparisons. |
|
Thanks @mingchen-NOAA and thank you @kestonsmith-noaa !! |
|
And of course Thank you Jessica!
…On Wed, Nov 5, 2025 at 3:33 PM Jessica Meixner wrote:
Thanks @mingchen-NOAA and thank you @kestonsmith-noaa !!
|
|
@kestonsmith-noaa No error was observed on Ursa Intel but I got some errors on Ursa GNU: The issue originates from the lines in w3gridmd.F90, where the write format was defined. In that section, the line |
|
OK, Thanks for finding this. I will update shortly. Best, Keston
…On Wed, Nov 5, 2025 at 9:55 PM mingchen-NOAA wrote:
@kestonsmith-noaa No error was observed on Ursa Intel but I got some errors on Ursa GNU:
see matrix13.out in
/scratch4/NCEPDEV/marine/Ming.Chen/ww3/ursa/ww3_pr/pr_1524_gnu/regtests
+--------------------+
| Grid preprocessor |
+--------------------+
Processing /scratch4/NCEPDEV/marine/Ming.Chen/ww3/ursa/ww3_pr/pr_1524_gnu/regtests/ww3_ufs1.1/input_unstr/ww3_grid_a.inp
Screen output routed to /scratch4/NCEPDEV/marine/Ming.Chen/ww3/ursa/ww3_pr/pr_1524_gnu/regtests/ww3_ufs1.1/work_unstr_a/ww3_grid_a.out
At line 3378 of file /scratch4/NCEPDEV/marine/Ming.Chen/ww3/ursa/ww3_pr/pr_1524_gnu/model/src/w3gridmd.F90 (unit = 6, file = 'stdout')
Fortran runtime error: Expected INTEGER for item 27 in formatted transfer, got LOGICAL
( ' &UNST UGBCCFL =',L3,', UGOBCAUTO =',L3, ', UGOBCDEPTH =', F8.3,
^
Error termination. Backtrace:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x149f5804e72f in ???
#1 0x149f5827681e in x86_64_fallback_frame_state
at ./md-unwind-support.h:63
#2 0x149f5827681e in uw_frame_state_for
at /tmp/role.apps/spack-stage/spack-stage-gcc-12.4.0-dsgnou52lpn2tus6mohdmcw5mjqmqrhj/spack-src/libgcc/unwind-dw2.c:1271
#3 0x149f5827857a in _Unwind_Backtrace
at /tmp/role.apps/spack-stage/spack-stage-gcc-12.4.0-dsgnou52lpn2tus6mohdmcw5mjqmqrhj/spack-src/libgcc/unwind.inc:303
#4 0x149f589957bd in backtrace_full
at /tmp/role.apps/spack-stage/spack-stage-gcc-12.4.0-dsgnou52lpn2tus6mohdmcw5mjqmqrhj/spack-src/libbacktrace/backtrace.c:127
./bin/run_cmake_test: line 609: 4115019 Segmentation fault $path_e/$prog > $ofile
ERROR: Error occured during /scratch4/NCEPDEV/marine/Ming.Chen/ww3/ursa/ww3_pr/pr_1524_gnu/regtests/ww3_ufs1.1/work_unstr_a/exe/ww3_grid execution
The issue originates from these lines
<https://github.com/kestonsmith-noaa/WW3/blob/ImplTimeStepTruncate/model/src/w3gridmd.F90#L6710-L6732>
in w3gridmd.F90, where the write format is defined. In that section, the line
JGS_TRUNK_DIGITS=', I3,
was added under the #ifdef W3_TRNK condition.
However, in the corresponding WRITE statement at lines
<https://github.com/kestonsmith-noaa/WW3/blob/ImplTimeStepTruncate/model/src/w3gridmd.F90#L3362-L3378>,
the variable JGS_TRUNK_DIGITS was not included when W3_TRNK is defined.
The format therefore contains one more descriptor than the variable list supplies, so a LOGICAL item is matched against the I3 descriptor, which causes the runtime error:
Expected INTEGER for item 27 in formatted transfer, got LOGICAL
The Intel compiler tolerates this mismatch, but GNU Fortran enforces
stricter type checking during formatted I/O operations, which is why the
error only appears when compiled with GNU.
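The mismatch described above can be illustrated with a small Python sketch. This is not WW3 code and does not reflect how gfortran is implemented internally; the names `EXPECTED`, `fortran_type`, and `check_transfer` are hypothetical, and the three-item lists are a shortened stand-in for the real 27-plus-item &UNST echo. It only shows why an extra integer descriptor without a matching list item leads to an "Expected INTEGER ... got LOGICAL" report.

```python
# Hypothetical sketch: mimic a runtime check that pairs each edit descriptor
# in a FORMAT with the next item in the WRITE list.
EXPECTED = {"L": "LOGICAL", "I": "INTEGER", "F": "REAL"}


def fortran_type(value):
    """Map a Python value to a Fortran-style type name."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "LOGICAL"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "REAL"
    raise TypeError(f"unsupported item type: {type(value).__name__}")


def check_transfer(descriptors, items):
    """Return the first mismatch message, or None if every item matches."""
    for position, (desc, item) in enumerate(zip(descriptors, items), start=1):
        expected, got = EXPECTED[desc], fortran_type(item)
        if expected != got:
            return (f"Expected {expected} for item {position} "
                    f"in formatted transfer, got {got}")
    return None


# The format gained an extra integer descriptor (standing in for the I3 of
# JGS_TRUNK_DIGITS), but the item list did not gain the matching variable,
# so the next logical item lands on the integer descriptor.
descriptors = ["L", "I", "L"]      # format with the extra integer descriptor
items_missing = [True, False]      # integer variable omitted from the list
items_fixed = [True, 5, False]     # integer variable supplied

print(check_transfer(descriptors, items_missing))
print(check_transfer(descriptors, items_fixed))
```

Keeping the descriptor and the list item under the same preprocessor condition is one way to keep the two in step.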
|
|
The regression tests show no error for both Intel and GNU. Now running matrix comparisons. |
|
Ursa Intel matrix comparison results. Differences due to this PR NOTE: differs in ww3_ufs1.1/./work_unstr_a and ww3_ufs1.1/./work_unstr_b come from |
|
@kestonsmith-noaa Just to confirm - are the changes in |
…_unstr/ww3_grid_c.inp
… in Domain Decomposition Implicit, can be used to force bit-for-bit reproducability.
|
Thank you @kestonsmith-noaa ! Testing the updates. |
|
@kestonsmith-noaa Matrix comparisons have completed on both Intel and GNU. The change in ./work_unstr_c is not expected, because a new case d is used for the new test. Do you have any idea why it changed? |
|
regression tests and matrix comparisons are done. |
|
@kestonsmith-noaa The differences come from some ww3_point.outs. Your PR added JGS_TRUNK_DIGITS= 5 in all ww3_points.out files. There are so many differences that I did not investigate them all. |
|
@kestonsmith-noaa Are these differences as you expected? |
|
Small differences are expected in ww3_ufs1.1/work_unstr_c due to the
introduction of communication before the iterative solver and of
course ww3_ufs1.1/work_unstr_d is a new test. From what I can see, the
other differences are all due to the inclusion of JGS_TRUNK_DIGITS value in
output. A simple change to w3gridmd.F90 could suppress this write
statement if TRNK is not present, and then the only effect would be
ww3_ufs1.1/work_unstr_d/ - Should I make the change to suppress
writing of JGS_TRUNK_DIGITS when TRNK is not present? Thanks, Keston
…On Sat, Nov 8, 2025 at 10:01 AM mingchen-NOAA wrote:
@kestonsmith-noaa Are these differences as you expected?
Could you run the full regression tests and matrix comparisons first? If you
agree with the results and comparison outputs, add
matrixCompFull.txt and matrixCompSummary.txt to the PR Testing section.
Then you can pass the code to me for a test. Thank you!
|
|
Hercules Intel passed |
|
UFSWM RT tests: Neither of them uses the wave model. |
|
Hercules GNU: |
Pull Request Summary
A switch is added to allow precision truncation at the end of each timestep. This can facilitate solution reproducibility when using implicit time stepping on unstructured meshes.
Description
Precision truncation at the end of PDLIB_JACOBI_GAUSS_SEIDEL_BLOCK can be activated with the switch TRNK in the event that solutions do not reproduce. Truncating the precision prevents small differences from developing in mirrored solutions, which facilitates solution reproducibility across different numbers of MPI tasks. No changes are expected in the results of the existing regtests; however, a new test, ww3_ufs1.1 grid_d, is introduced to test precision truncation.
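As a rough illustration of the idea (not the WW3 implementation), the Python sketch below rounds values to a fixed number of significant decimal digits. The helper `truncate_digits` is hypothetical, and it is only an assumption that JGS_TRUNK_DIGITS counts decimal digits in this way. The two sums stand in for the same quantity accumulated in a different order, as can happen when the number of MPI tasks changes.

```python
import math


def truncate_digits(x, digits):
    """Round x to `digits` significant decimal digits (hypothetical helper)."""
    # Zero, infinities and NaN have no meaningful significant digits to drop.
    if x == 0.0 or not math.isfinite(x):
        return x
    # Scientific notation with digits-1 decimals keeps `digits` significant digits.
    return float(f"{x:.{digits - 1}e}")


# The same three terms summed in two different orders.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a == b)                                          # raw sums compared
print(truncate_digits(a, 5) == truncate_digits(b, 5))  # truncated sums compared
```

The raw sums differ in the last bit, while the truncated values agree. Two values that straddle a rounding boundary can still round to different results, so truncation makes agreement far more likely rather than guaranteeing it.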
Issue(s) addressed
This change addresses issues raised in:
/issues/322
Commit Message
Addition of switch to activate precision truncation in PDLIB_JACOBI_GAUSS_SEIDEL_BLOCK to address reproducibility issues.
Check list
Testing
matrixCompSummary.txt