
coll/acoll: Bcast/Barrier enhancements and bug fixes. #13222


Open
wants to merge 2 commits into
base: main


MithunMohanKadavil
Contributor

This PR introduces enhancements to Bcast and Barrier algorithms in acoll. Additionally, some bugs are fixed and code robustness is improved.

Enhancements:
-Shared memory-based implementations of MPI_Bcast and MPI_Barrier for intra-node communication are added. With the shared memory-based acoll bcast, performance improvements are observed for message sizes up to 8 KB.
The following performance trends were observed when benchmarked on an AMD EPYC 9755 128-Core Processor with 128 ranks:

image

image

-Two new MCA parameters are introduced:
1. mca_coll_acoll_disable_shmbcast to enable or disable the shared memory algorithm in bcast.
2. mca_coll_acoll_barrier_algo to select between the hierarchical and flat shared memory barrier algorithms.
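
For readers unfamiliar with the term, a "flat" shared memory barrier can be sketched as a single counter plus a sense flag that every participant touches directly, with no intermediate subgroups. The sketch below is only an illustration under stated assumptions and is not the acoll code from this PR: it uses threads in place of MPI ranks, C11 atomics in place of a mapped shared memory segment, and all names (`flat_barrier_t`, `flat_barrier_wait`, `run_flat_barrier_demo`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical flat barrier state. In a real intra-node collective this
 * would live in a shared memory segment mapped by every rank. */
typedef struct {
    atomic_int arrived; /* participants that reached the barrier this round */
    atomic_int sense;   /* flips each time a round completes */
    int nranks;
} flat_barrier_t;

static void flat_barrier_init(flat_barrier_t *b, int nranks)
{
    atomic_init(&b->arrived, 0);
    atomic_init(&b->sense, 0);
    b->nranks = nranks;
}

/* Sense-reversing wait: each participant keeps its own local_sense (start at 0). */
static void flat_barrier_wait(flat_barrier_t *b, int *local_sense)
{
    *local_sense = !*local_sense;
    if (atomic_fetch_add(&b->arrived, 1) == b->nranks - 1) {
        /* Last arrival: reset the counter first, then release everyone. */
        atomic_store(&b->arrived, 0);
        atomic_store(&b->sense, *local_sense);
    } else {
        while (atomic_load(&b->sense) != *local_sense) {
            sched_yield(); /* spin politely until the round completes */
        }
    }
}

typedef struct {
    flat_barrier_t *barrier;
    atomic_int *slots; /* one slot per simulated rank */
    int rank, nranks, rounds;
    int errors;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *a = p;
    int local_sense = 0;
    for (int r = 1; r <= a->rounds; r++) {
        atomic_store(&a->slots[a->rank], r);
        flat_barrier_wait(a->barrier, &local_sense);
        /* After the barrier, the neighbour's write for round r must be visible. */
        int peer = (a->rank + 1) % a->nranks;
        if (atomic_load(&a->slots[peer]) != r) {
            a->errors++;
        }
        /* Second barrier keeps a fast thread from overwriting a slot early. */
        flat_barrier_wait(a->barrier, &local_sense);
    }
    return NULL;
}

/* Runs the demo and returns the number of ordering violations (0 expected),
 * or -1 if memory allocation fails. */
int run_flat_barrier_demo(int nranks, int rounds)
{
    assert(nranks > 0 && rounds >= 0);
    flat_barrier_t barrier;
    flat_barrier_init(&barrier, nranks);

    pthread_t *threads = malloc((size_t)nranks * sizeof *threads);
    worker_arg_t *args = malloc((size_t)nranks * sizeof *args);
    atomic_int *slots = malloc((size_t)nranks * sizeof *slots);
    if (!threads || !args || !slots) {
        free(threads); free(args); free(slots);
        return -1;
    }
    for (int i = 0; i < nranks; i++) {
        atomic_init(&slots[i], 0);
        args[i] = (worker_arg_t){ &barrier, slots, i, nranks, rounds, 0 };
    }
    for (int i = 0; i < nranks; i++) {
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0) {
            abort(); /* a missing participant would deadlock the barrier */
        }
    }
    int errors = 0;
    for (int i = 0; i < nranks; i++) {
        pthread_join(threads[i], NULL);
        errors += args[i].errors;
    }
    free(threads); free(args); free(slots);
    return errors;
}
```

Calling `run_flat_barrier_demo(4, 200)` from a small `main` (built with `-pthread`) exercises the barrier. A hierarchical variant would instead synchronize within subgroups first and then across subgroup leaders, which reduces contention on a single counter at the cost of extra steps.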

Others:
-A few Coverity scan warnings in acoll are fixed.
-Added checks to ensure xpmem-based and shared memory-based optimizations are disabled for non-predefined data types or accelerator buffers.
-Improved error handling in various collectives in acoll.
-Simplified logic in coll_acoll_alltoall.c for selecting communication groups.
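
The fallback rule described above (optimized paths only for predefined datatypes in host memory) can be pictured as a small guard evaluated before the collective runs. The following is a hedged sketch, not code from this PR: the struct, enum, and function names are hypothetical, and the boolean fields stand in for whatever datatype and buffer queries the real implementation performs.

```c
#include <stdbool.h>

/* Hypothetical summary of the properties the guard looks at. */
typedef struct {
    bool dtype_is_predefined;   /* e.g. MPI_INT, as opposed to a derived datatype */
    bool buffer_on_accelerator; /* buffer resides in accelerator (device) memory */
    bool shm_bcast_disabled;    /* user turned the shared memory path off */
} bcast_request_t;

typedef enum {
    BCAST_PATH_BASE, /* fall back to the generic algorithm */
    BCAST_PATH_SHM   /* use the shared memory-based algorithm */
} bcast_path_t;

/* Assumed policy: take the optimized path only when every condition is safe. */
static bcast_path_t choose_bcast_path(const bcast_request_t *req)
{
    if (req->shm_bcast_disabled) {
        return BCAST_PATH_BASE;
    }
    if (!req->dtype_is_predefined || req->buffer_on_accelerator) {
        return BCAST_PATH_BASE;
    }
    return BCAST_PATH_SHM;
}
```

Keeping the decision in one predicate like this makes the fallback explicit: any request that fails a check simply takes the base algorithm rather than erroring out.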

Shared memory-based implementations of bcast and barrier are enabled
for intra-node communication.

Signed-off-by: Nithya V S <Nithya.VS@amd.com>
Code changes to remove a few Coverity scan warnings and to fall back to
base algorithms for custom datatypes.

Signed-off-by: Nithya V S <Nithya.VS@amd.com>