Skip to content

PCU: Add nompi stub #425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 69 commits into from
Mar 12, 2025
Merged

PCU: Add nompi stub #425

merged 69 commits into from
Mar 12, 2025

Conversation

bobpaw
Copy link
Collaborator

@bobpaw bobpaw commented Apr 26, 2024

Add nompi stub

Add a stub implementation of MPI in pcu/pcu_pnompi.c.
Remove direct calls to MPI functions.

This adds a SCOREC_NO_MPI build option for users running on single-process, single-machine setups that do not want to compile with MPI. Message passing is simulated in memory.

@bobpaw
Copy link
Collaborator Author

bobpaw commented Apr 26, 2024

@cwsmith, build is failing for this PR. I think we should add PCU_Init and PCU_Finalize facades that wrap the MPI calls and do a PCU_Comm_Init() and PCU_Comm_Free() to remove the direct dependencies. There are a lot of executables that would need to be changed and this would become a large PR. Alternatively, we could define a replacement MPI_Init and MPI_Finalize in pcu_pnompi and link to those. Or, we could do a hybrid approach and encourage people to use PCU_Init in the future.

@jacobmerson
Copy link
Contributor

@flagdanger and I are working on a very large update of PCU that should make this all easier by getting rid of the singleton. Pull request for that coming within the next week.

@jacobmerson
Copy link
Contributor

@bobpaw sorting out a few things to make the pull request ready to merge, but #388 the draft pull request for replacing the singleton in PCU.

Although we replaced the interface used in the test cases, this pull request is designed to work with no modification to existing codes by using a global PCU state when the old style interface is used with PCU_Comm_Init or it can use local state.

@bobpaw
Copy link
Collaborator Author

bobpaw commented Apr 30, 2024

@jacobmerson, we will plan to use your PR for the future of core, but the sponsor wants this functionality soon, so we will use this branch until yours is finished.

@cwsmith
Copy link
Contributor

cwsmith commented Apr 30, 2024

@bobpaw Does PCU_Comm_Init() check if MPI was initialized?

@bobpaw
Copy link
Collaborator Author

bobpaw commented Apr 30, 2024

@bobpaw Does PCU_Comm_Init() check if MPI was initialized?

No, at the moment it only checks if PCU has been initialized using a global state variable. We could check with MPI_Initialized.

@onkarsahni
Copy link
Contributor

@cwsmith Note Aiden is only doing this when SCOREC_NO_MPI is defined to be true, so there is no actual MPI (e.g., see pcu/pcu_pnompi.c). For now, he is not looking into PCU_init, that will be done after the other PR #388.

@cwsmith
Copy link
Contributor

cwsmith commented Apr 30, 2024

@bobpaw OK. I'm wondering if it will make sense to make the MPI check in that call, and if it fails call MPI_Init. But... handling MPI_Finalize could be a bit more tricky; other libraries may still be running and using MPI beyond the lifetime of PCU. I'd prefer symmetry in the interface so your original suggestion may be best.

@bobpaw bobpaw marked this pull request as ready for review May 6, 2024 15:13
Added pcu_pnompi.c.
Added SCOREC_NO_MPI CMake option.
Removed direct MPI calls in parma, phasta, and PUMI.
Added incomplete reimplementations of those MPI calls.
Add stubs for MPI_Init and MPI_Finalize.
Add pcu_pnompi_types.h for MPI typedefs.
Add status message about SCOREC_NO_MPI flag.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Replace MPI_Comm_free with PCU_Comm_Free_One.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
This will avoid object collisions when linking to libraries that already use MPI.
- i.e. the intended use case for SCOREC_NO_MPI.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Changed add_nompi_msg to record receiver.
- Since there's no MPI, sender == receiver but I want to quiet the warnings.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Add return type for pcu_pmpi_free, pcu_pmpi_split.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Add linked list traversal to find a matching message.
Removed error message that did not actually indicate errors.
- If no message is found it's perfectly reasonable and in fact correct to
  return 0 and that's how pmpi knows to return false.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Run smoke tests without mpirun given SCOREC_NO_MPI.
Add SCOREC_NO_MPI to the GitHub Actions test matrix.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
I use clock_gettime, which is one of the many methods mpich uses depending on
the system.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Also update TriBITS package file.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
- Add mpi_test_depends function to replace set_test_properties. This
  function checks if SCOREC_NO_MPI is defined and assumes nonexistent
  tests were multiproc ones.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
- parma/group/parma_group.cc(runInGroups): remove PCU/MPI free that will
  be carried out by pcu::PCU::~PCU.
- test/xgc_split.cc: remove direct mpi.h inclusion
- test/pumiLoadMesh.cc: remove direct mpi.h inclusion.
- test/modelInfo.cc: remove direct #include <mpi.h>.
- phasta/phstream.cc(::getTime): use new pcu::Time.
- test/fieldReduce.cc: use correct PCUObj.Peers.
- pumi/pumi_sys.cc(pumi_sync): use PCU Barrier instead of MPI Barrier.
- pcu/PCU.h: make PCU_Comm (holdover) functions extern C and PCU_Wtime.
- pumi/pumi_ghost.cc(do_off_part_bridge): fix typo (forgot parens).
- pumi/pumi_mesh.cc: capitalize GetMPIComm.
- pumi/pumi_numbering(pumi_numbering_print): use pcu::PCU::Barrier
  instead of WORLD barrier.
- pcu/PCU_C.h: optionally include mpi or stub.
- pcu/pcu_c.cc: add split stub and include pcu_pmpi.h.
- pcu/pcu_pmpi.c: remove leftover conflict markers.
- pcu/pcu_pnompi.h: update function signatures, make extern C, and
  remove obsolete stubs.
- pcu/pcu_pnompi.c: remove globals and use pcu_mpi_t*.
- clean up formatting.
- remove unused or (void)ed argument names.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
- pcu/PCU.cc(PCU::~PCU): clean up message order if it exists (valgrind
  is happy now).

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
- pcu/pcu_pmpi.c: remove pcu_pmpi_switch, pcu_pmpi_comm which were used
  by old nompi stubs.
- (pcu_pmpi_allgather, pcu_pmpi_allreduce): break long lines and clean
  up indentation.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
bobpaw added 2 commits March 3, 2025 18:39
- pcu/PCU.h (pcu::PCU_Init): change return type to void.
- remove @return clause.
- (pcu::PCU_Finalize): change return type to void.
- remove @return clause.
- pcu/PCU.cc (pcu::PCU_Init): only call MPI_Init if not previously
  initialized.
- (pcu::PCU_Finalize): only finalize if not previously finalized.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
- pcu/PCU.h: rename PCU_Init to Init and PCU_Finalize to Finalize.
- pcu/PCU.cc: rename functions.
- pcu/PCU_C.h: add PCU_Init and PCU_Finalize to call pcu::Init and
  pcu::Finalize.
- pcu/pcu_c.cc: add PCU_Init and PCU_Finalize definitions.
- test/*.cc phasta/*.cc: replace PCU_Init with Init and PCU_Finalize
  with Finalize.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
@cwsmith
Copy link
Contributor

cwsmith commented Mar 4, 2025

Is there anything we want to bundle with this major release?

Not that I can think of.

@cwsmith
Copy link
Contributor

cwsmith commented Mar 4, 2025

/runtests

@cwsmith
Copy link
Contributor

cwsmith commented Mar 4, 2025

@bobpaw Once the pumi version is increased we should be good to go. Thanks for all the work on this.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Copy link

github-actions bot commented Mar 4, 2025

Build Log
Simmetrix Test Result: failure
Simmetrix + CGNS Test Result: skipped

@cwsmith
Copy link
Contributor

cwsmith commented Mar 4, 2025

@bobpaw The CI tests caught what I'm guessing is a missing header?

 /space/smithc11/githubSelfHostedRunner/actions-runner/_work/core/core/core_425/test/sim_countBL.cc: In function 'int main(int, char**)':
/space/smithc11/githubSelfHostedRunner/actions-runner/_work/core/core/core_425/test/sim_countBL.cc:25:3: error: 'pcu' has not been declared
   25 |   pcu::Init(&argc, &argv);
      |   ^~~
/space/smithc11/githubSelfHostedRunner/actions-runner/_work/core/core/core_425/test/sim_countBL.cc:110:3: error: 'pcu' has not been declared
  110 |   pcu::Finalize();
      |   ^~~

@bobpaw
Copy link
Collaborator Author

bobpaw commented Mar 4, 2025

I think I had to edit that file in the deltaWing example. That file is also using MPI allreduce, peers, etc. so I will need to change a few other lines.

bobpaw added 2 commits March 4, 2025 12:25
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
@cwsmith
Copy link
Contributor

cwsmith commented Mar 4, 2025

/runtests

Copy link

github-actions bot commented Mar 4, 2025

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: failure

- pcu/PCU.h: make Allgather `send` a const void*.
- pcu/PCU.cc: update instantiations.
- pcu/pcu_coll.h (pcu_allgather): make `send_data` const.
- pcu/pcu_coll.c: update pcu_gather and pcu_allgather arguments.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
Copy link
Contributor

@cwsmith cwsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whooops. I misread the CI test results. We need to fix the cgns tests.

- apf/apfCGNS.cc: replace MPI_Allgather with PCU::Allgather.
- wrap Allgatherv call in #ifndef SCOREC_NO_MPI. use DupComm and use
  std::copy in the serial case.
- use DupComm for cgp_mpi_comm. and free the comm later with
  MPI_Comm_free wrapped in #ifndef SCOREC_NO_MPI.
- mds/mdsCGNS.cc: use PCU_Comm_Dup and MPI_Comm_free later instead of
  using PCU_Get_Comm.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
@cwsmith
Copy link
Contributor

cwsmith commented Mar 7, 2025

/runtests

Copy link

github-actions bot commented Mar 7, 2025

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: failure

bobpaw added 3 commits March 11, 2025 16:47
Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
- since this is an MPI-only application, it should include MPI directly
  and not via PCU.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
- pcu/pcu_coll.c (pcu_merge_gather): fix buffer overflow on the local
  array when number of peers is odd. I was assuming that with power of 2
  peers the local array always has an even size. Now, memcpy will never
  write past local + size.
- added comment explaining it.

Signed-off-by: Aiden Woodruff <woodra@rpi.edu>
@cwsmith
Copy link
Contributor

cwsmith commented Mar 12, 2025

/runtests

Copy link

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: success

@bobpaw
Copy link
Collaborator Author

bobpaw commented Mar 12, 2025

@cwsmith would you do the honors?

@cwsmith cwsmith merged commit c41573d into develop Mar 12, 2025
8 checks passed
@cwsmith cwsmith deleted the apw/nompi branch March 12, 2025 01:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants