Skip to content

Conversation

@JessicaMeixner-NOAA
Copy link
Collaborator

Pull Request Summary

Updates code from using include mpif.h statements to using mpif08

Description

To avoid compile warnings and update MPI usage, use updated mpif08 and no longer use include mpi.h statements

This work was done by @NickSzapiro-NOAA w/a few minor updates from me. Thank you Nick!!

Please also include the following information:

  • Add any suggestions for a reviewer:
  • Mention any labels that should be added: enhancement
  • Are answer changes expected from this PR? No!

Issue(s) addressed

Note a follow up PR with additional updates for dev/ufs-weather-model branch is coming soon.

Commit Message

Update include mpif.h to use mpif08

Check list

Testing

  • How were these changes tested? matrix
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) yes
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)? ursa intel
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.) None!
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):
**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR3_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (15 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (19 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (18 files differ)
mww3_test_03/./work_PR2_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (17 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (14 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (17 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)
mww3_test_09/./work_MPI_ASCII                     (0 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (6 files differ)
ww3_tp2.6/./work_ST4_ASCII                     (0 files differ)
ww3_ufs1.3/./work_a                     (3 files differ)

matrixCompFull.txt
matrixCompSummary.txt
matrixDiff.txt

NickSzapiro-NOAA and others added 30 commits July 23, 2025 19:02
UFS Intel RTs pass bit-for-bit. GNU fails to compile
More fixes to develop for use_mpif08
@JessicaMeixner-NOAA
Copy link
Collaborator Author

@mickaelaccensi - Can you share more info about your environment? @NickSzapiro-NOAA found this: pmodels/mpich#6423 and is curious if it might be related?

@ukmo-kitstokes
Copy link
Collaborator

ukmo-kitstokes commented Oct 8, 2025

@ukmo-kitstokes What are these difference files ww3_tp2.10/./work_MPI (1 files differ) ww3_tp2.11/./work_MPI (2 files differ)? log or binary

The first one is mod_def.ww3 (binary) and the second two are ww3.196806.nc (binary) and ww3_ounf.out.

I'm running another set of regtests on develop so I can see which files are in the known set of non-b4b results on our machine, but if you can see that from Juan's previous PRs then let me know.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@ukmo-kitstokes apologies for not following up. From Juan's previous PRs it doesn't look like that's a known nonb4b. That being said, that the mod_def is different is strange --- how big are the diffs in ww3.196806.nc ?

In your regtest directory, look at output/ww3_ufs1.3/ww3_tp2.10//work_MPI there should be diff files from ncdumps of the netcdf files that you can look at.

@ukmo-kitstokes
Copy link
Collaborator

ukmo-kitstokes commented Oct 9, 2025

Hi Jessica, I can't seem to find anything specifically referring to ww3_tp2.10//work_MPI in the matrixDiff.txt-ww3_tp2.10 file, not sure why. I've attached the diff files that are output as I'm not exactly sure what to look for.
matrixDiff.zip

@ukmo-kitstokes
Copy link
Collaborator

I've run the regtests again on develop to double check what the known b4b differences should be on our machine, which are:


********************* non-identical cases ****************************


mww3_test_03/./work_PR3_UNO_MPI_d2 (15 files differ)
mww3_test_03/./work_PR1_MPI_d2 (12 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (15 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (16 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2 (12 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2 (15 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (12 files differ)
ww3_tp2.10/./work_MPI_OMPH (6 files differ)

This list does look a bit different from the one I posted above. Quite a few mww3 tests have different number of files differing and the tp2 list is different too. I want to make sure I've compared your branch to the correct revision of develop - I just synced our fork and did a git pull on develop to get the latest version when I did the regtests, so this could have some differences I guess?

@ukmo-kitstokes
Copy link
Collaborator

Actually, looking again at your email where you specified the commit you based your branch on, I am comparing with the correct revision of develop ('Update grib2 outputs to have complex packing (#1496)')

@mickaelaccensi
Copy link
Collaborator

I've tried with

export MPIR_CVAR_IALLGATHER_INTRA_ALGORITHM=sched_ring
export MPIR_CVAR_IALLGATHER_INTRA_ALGORITHM=sched_brucks
export MPIR_CVAR_IALLGATHER_INTRA_ALGORITHM=tsp_ring
export MPIR_CVAR_IALLGATHER_INTRA_ALGORITHM=tsp_brucks
export MPIR_CVAR_IALLGATHER_INTRA_ALGORITHM=tsp_recexch_doubling
export MPIR_CVAR_IALLGATHER_INTRA_ALGORITHM=tsp_recexch_halving

CMAKE_OPTIONS='-DCMAKE_BUILD_TYPE=Debug'

but not changes or more verbose log.

I've requested a newer version of mpi to check if it can solve the issue. In the meantime I'll try with scotch librairy compiled with mpt instead of mpi.

If you have any other thoughts ?

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@mickaelaccensi are you using -g -traceback in the compile options? I'm assuming you are. I was hoping a line number would come.

@NickSzapiro-NOAA do you have any suggestions ?

@NickSzapiro-NOAA
Copy link
Contributor

It's hard without a traceback ... but there aren't so many mpi_allgather calls.

This line could be suspicious:

call mpi_allgather(np_perProcSum(myrank)+1, 1, itype, vtxdist, 1, itype, comm, ierr)

as in the realm of temporary array copies and MPI_INTEGER .ne. INTEGER

@NickSzapiro-NOAA
Copy link
Contributor

I'm also not sure what's happening from the indexing mismatch between allocate(np_perProcSum(0:nTasks), stat=stat) and allocate(vtxdist(nTasks+1),stat=stat)

@NickSzapiro-NOAA
Copy link
Contributor

If this is it, one suggestion is to send from np_perProcSum, receive in vxdist, then increment by 1

@mickaelaccensi
Copy link
Collaborator

I'm using -g -traceback
If you have any tests or code changes, let me know and I can test it

@mickaelaccensi
Copy link
Collaborator

how do you force the -DCMAKE_BUILD_TYPE=Debug option and -O0 option when running run_cmake_test ?

@JessicaMeixner-NOAA
Copy link
Collaborator Author

To add in things to cmake build options like debug I usually just add in a line defining CMAKE_OPTIONS which is used in the cmake calls, for example here: https://github.com/NOAA-EMC/WW3/blob/develop/regtests/bin/run_cmake_test#L447C23-L447C36

You can also hard code things here: https://github.com/NOAA-EMC/WW3/blob/develop/model/src/CMakeLists.txt#L49-L92 depending on which compiler you are using.

@mickaelaccensi
Copy link
Collaborator

If this is it, one suggestion is to send from np_perProcSum, receive in vxdist, then increment by 1

can you suggest the related code that I can test ?

@NickSzapiro-NOAA
Copy link
Contributor

These are untested, but instead of call mpi_allgather(np_perProcSum(myrank)+1, 1, itype, vtxdist, 1, itype, comm, ierr)

option 1:

call mpi_allgather(np_perProcSum(myrank), 1, itype, vtxdist, 1, itype, comm, ierr)
vtxdist(:) = vtxdist(:)+1

option 2:

INTEGER :: np_toSend
...
np_toSend = np_perProcSum(myrank)+1
call mpi_allgather(np_toSend, 1, itype, vtxdist, 1, itype, comm, ierr)

@mickaelaccensi
Copy link
Collaborator

both solutions work and provide the same results as the develop branch !
so I let integrate the option you prefer in the branch and I'll rerun the full matrix

@ukmo-kitstokes
Copy link
Collaborator

@JessicaMeixner-NOAA - let me know if you want me to re-run regtests on our machine once the fix @NickSzapiro-NOAA suggested above has been put into the branch

@JessicaMeixner-NOAA
Copy link
Collaborator Author

These are untested, but instead of call mpi_allgather(np_perProcSum(myrank)+1, 1, itype, vtxdist, 1, itype, comm, ierr)

option 1:

call mpi_allgather(np_perProcSum(myrank), 1, itype, vtxdist, 1, itype, comm, ierr)
vtxdist(:) = vtxdist(:)+1

option 2:

INTEGER :: np_toSend
...
np_toSend = np_perProcSum(myrank)+1
call mpi_allgather(np_toSend, 1, itype, vtxdist, 1, itype, comm, ierr)

@NickSzapiro-NOAA - Is there a preferred solution here since @mickaelaccensi said both work.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@mickaelaccensi @ukmo-kitstokes - Apologies for the delay. I have implemented option 2. As an extra sanity check I will run one more round of regression tests over night. Assuming those will go well, we will merge this tomorrow.

@ukmo-kitstokes I don't think it's necessary that you re-run things on your end, but you are of course welcome to. Just let me know if you have any objections to merging - and we'll wait for those issues to be resolved if you do.

@ukmo-kitstokes
Copy link
Collaborator

I'm re-running the regtests now and will let you know if anything has changed.

@mingchen-NOAA
Copy link
Collaborator

I re-ran the tests with the latest update on Hercules intel. All passed.

**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR1_MPI_d2                     (14 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (17 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (19 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (16 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (17 files differ)
mww3_test_03/./work_PR2_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (12 files differ)
mww3_test_09/./work_MPI_ASCII                     (0 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (7 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (4 files differ)
ww3_tp2.6/./work_ST4_ASCII                     (0 files differ)
ww3_ufs1.3/./work_a                     (3 files differ) 

matrixCompFull.txt
matrixCompSummary.txt
matrixDiff.txt

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@mickaelaccensi @ukmo-kitstokes - We'd like to get this merged today, please let me know if you have any objections.

@mickaelaccensi
Copy link
Collaborator

go for it, I have not the regtests but no reasons it crashes now

@mingchen-NOAA
Copy link
Collaborator

I'm re-running the regtests now and will let you know if anything has changed.

@ukmo-kitstokes how are your regtests going? Please let us know if you still see any issues.

@ukmo-kitstokes
Copy link
Collaborator

For some reason I got a load of Fatal error in make errors which I didn't get when I tested the original branch a few days earlier. I'll have to investigate and re-run the regtests. Perhaps best if you go ahead though, I don't want to hold you up with merging. I'll update this feed once I've got regtests completed again.

@mingchen-NOAA mingchen-NOAA merged commit 238cd41 into NOAA-EMC:develop Oct 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update include mpif.h to use_mpif08

5 participants