coll/acoll: Bcast/Barrier enhancements and bug fixes. #13222
This PR introduces enhancements to Bcast and Barrier algorithms in acoll. Additionally, some bugs are fixed and code robustness is improved.
Enhancements:
-Shared memory-based implementations of MPI_Bcast and MPI_Barrier for intra-node communication are added. With the shared memory-based acoll bcast, performance improvements are observed for message sizes up to 8 KB.
The following performance trends were observed when benchmarked on AMD EPYC 9755 128-Core Processor with 128 ranks:
-Two new MCA parameters are introduced:
1. mca_coll_acoll_disable_shmbcast to enable/disable the shared memory algorithm in bcast.
2. mca_coll_acoll_barrier_algo to select between the hierarchical and flat shared memory barrier algorithms.
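For readers unfamiliar with the idea behind a flat shared memory barrier, here is a minimal sketch. It is not the acoll implementation: it assumes a sense-reversing counter barrier, uses threads in place of MPI ranks, and keeps the barrier state in ordinary process memory rather than a shared memory segment. All names (`flat_barrier_t`, `flat_barrier_wait`, `run_flat_barrier_demo`) are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical flat barrier state. In an intra-node MPI setting this
 * would live in a segment mapped by every rank (assumption). */
typedef struct {
    atomic_int count; /* arrivals in the current round */
    atomic_int sense; /* flips once per completed round */
    int nranks;
} flat_barrier_t;

static void flat_barrier_init(flat_barrier_t *b, int nranks)
{
    atomic_init(&b->count, 0);
    atomic_init(&b->sense, 0);
    b->nranks = nranks;
}

/* Every participant keeps its own local_sense, initially 0. */
static void flat_barrier_wait(flat_barrier_t *b, int *local_sense)
{
    int my_sense = !*local_sense;
    *local_sense = my_sense;
    if (atomic_fetch_add(&b->count, 1) == b->nranks - 1) {
        /* Last arrival: reset the counter, then release the others. */
        atomic_store(&b->count, 0);
        atomic_store(&b->sense, my_sense);
    } else {
        while (atomic_load(&b->sense) != my_sense) {
            sched_yield(); /* spin politely until released */
        }
    }
}

typedef struct {
    flat_barrier_t *barrier;
    atomic_int *progress; /* one slot per thread */
    int id, nthreads, rounds, violations;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *a = p;
    int local_sense = 0;
    for (int r = 1; r <= a->rounds; r++) {
        atomic_store(&a->progress[a->id], r);
        flat_barrier_wait(a->barrier, &local_sense);
        /* After the barrier, every thread must have reached round r. */
        for (int i = 0; i < a->nthreads; i++) {
            if (atomic_load(&a->progress[i]) < r) a->violations++;
        }
    }
    return NULL;
}

/* Returns the number of ordering violations seen (0 if the barrier
 * held in every round), or -1 if allocation failed. */
int run_flat_barrier_demo(int nthreads, int rounds)
{
    assert(nthreads > 0 && rounds > 0);
    flat_barrier_t barrier;
    pthread_t *tids = malloc(sizeof(*tids) * nthreads);
    worker_arg_t *args = malloc(sizeof(*args) * nthreads);
    atomic_int *progress = malloc(sizeof(*progress) * nthreads);
    if (!tids || !args || !progress) {
        free(tids); free(args); free(progress);
        return -1;
    }
    flat_barrier_init(&barrier, nthreads);
    for (int i = 0; i < nthreads; i++) atomic_init(&progress[i], 0);
    for (int i = 0; i < nthreads; i++) {
        args[i] = (worker_arg_t){ &barrier, progress, i, nthreads, rounds, 0 };
        /* A missing participant would deadlock the rest, so give up. */
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0) abort();
    }
    int total = 0;
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].violations;
    }
    free(tids); free(args); free(progress);
    return total;
}
```

Built with `-pthread`, a call such as `run_flat_barrier_demo(4, 200)` from `main` should return 0. In a flat scheme like this, every participant synchronizes on one shared counter. A hierarchical scheme would instead synchronize within subgroups first and then across subgroup leaders.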
Others:
-A few Coverity scan warnings in acoll are fixed.
-Added checks to ensure xpmem-based and shared memory-based optimizations are disabled for non-predefined data types or accelerator buffers.
-Improved error handling in various collectives in acoll.
-Simplified logic in coll_acoll_alltoall.c for selecting communication groups.