@yfguo
Contributor

@yfguo yfguo commented Jul 17, 2025

Pull Request Description

This is the first PR for the collective selection refactoring. This PR focuses on providing a better way to print the tree and a summary of the loaded collective selection logic.

This PR merges the collective algorithm enums for MPIR, CH4, POSIX, and OFI into one place. This allows us to print algorithm names using the string constants added in this PR, which in turn lets the existing tree printing function output meaningful information about the loaded tree.
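As an illustration of how a single enum list can also supply the printable names, here is a minimal sketch using the X-macro technique. The PR does not say which mechanism it uses, so the macro, enum, and function names below (`COLL_ALGO_LIST`, `coll_algo_t`, `coll_algo_name`, `coll_algo_from_name`) are hypothetical; only the algorithm names are taken from the example output.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical unified list. The algorithm names come from the example
 * output in this PR; the macro and enum names are illustrative only and
 * are not MPICH's actual definitions. */
#define COLL_ALGO_LIST(X)                 \
    X(MPIR_Bcast_intra_binomial)          \
    X(MPIR_Bcast_intra_tree)              \
    X(MPIR_Bcast_intra_pipelined_tree)    \
    X(MPIR_Allgather_intra_brucks)        \
    X(MPIR_Allgather_intra_ring)

/* Expand the list once into enum constants ... */
#define AS_ENUM(name) ALGO_##name,
typedef enum { COLL_ALGO_LIST(AS_ENUM) ALGO_COUNT } coll_algo_t;

/* ... and once into the matching string constants. */
#define AS_STRING(name) #name,
static const char *const coll_algo_names[] = { COLL_ALGO_LIST(AS_STRING) };

/* The two expansions cannot drift apart, but check the size anyway. */
static_assert(sizeof(coll_algo_names) / sizeof(coll_algo_names[0]) == ALGO_COUNT,
              "name table must have one entry per enum constant");

/* Enum to string, for printing a tree node. */
const char *coll_algo_name(coll_algo_t id)
{
    return ((int) id >= 0 && (int) id < ALGO_COUNT) ? coll_algo_names[id] : "unknown";
}

/* String to enum, for parsing an algorithm name from the JSON file.
 * Returns -1 when the name is not in the list. */
int coll_algo_from_name(const char *s)
{
    for (int i = 0; i < ALGO_COUNT; i++) {
        if (strcmp(s, coll_algo_names[i]) == 0)
            return i;
    }
    return -1;
}
```

With one list, the same table serves both directions: the JSON parser maps a name to an enum value, and the printer maps the enum value back to the name.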

This PR also adds a new function that prints a summary of the collective selection. The summary organizes the conditions by individual algorithm (see the example in the next comment). Both the tree and summary report formats are controlled by a new CVAR, MPIR_CVAR_COLLECTIVE_SELECTION_REPORT={none, tree, summary, all}.
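To make the four CVAR values concrete, here is a standalone sketch that maps them to report flags. MPICH has its own CVAR machinery, which this does not reproduce; the enum and function names are hypothetical, and reading the value from an environment variable of the same name is an assumption made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flags: "all" is simply tree plus summary. */
enum {
    REPORT_NONE = 0,
    REPORT_TREE = 1 << 0,
    REPORT_SUMMARY = 1 << 1,
    REPORT_ALL = REPORT_TREE | REPORT_SUMMARY
};

/* Map a CVAR value to report flags. An unset value (NULL) is treated as
 * "none", since the report is off by default. Returns -1 for a value
 * outside {none, tree, summary, all}. */
int parse_report_mode(const char *value)
{
    if (value == NULL || strcmp(value, "none") == 0)
        return REPORT_NONE;
    if (strcmp(value, "tree") == 0)
        return REPORT_TREE;
    if (strcmp(value, "summary") == 0)
        return REPORT_SUMMARY;
    if (strcmp(value, "all") == 0)
        return REPORT_ALL;
    return -1;
}

/* Assumption: the value is read from an environment variable named
 * after the CVAR. */
int report_mode_from_env(void)
{
    return parse_report_mode(getenv("MPIR_CVAR_COLLECTIVE_SELECTION_REPORT"));
}
```

A caller would then test `mode & REPORT_TREE` and `mode & REPORT_SUMMARY` independently, so `all` needs no special case.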

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@yfguo
Contributor Author

yfguo commented Jul 17, 2025

Here is an example of the output.

Loaded Collective Selection Tree from /Users/yguo/tuning/MPIR_Coll_tuning.json:
====================================
Processed Collective Selection Tree:
collective: bcast
  intra_comm
    comm_size <= 1
      avg_msg_size <= 0
        Algorithm: MPIR_Bcast_intra_binomial

      avg_msg_size <= 128
        Algorithm: MPIR_Bcast_intra_tree

      avg_msg_size <= 8192
        Algorithm: MPIR_Bcast_intra_tree

      avg_msg_size <= 32768
        Algorithm: MPIR_Bcast_intra_tree

      avg_msg_size <= 262144
        Algorithm: MPIR_Bcast_intra_pipelined_tree

      any
        Algorithm: MPIR_Bcast_intra_scatter_recursive_doubling_allgather

    comm_size <= 8
      avg_msg_size <= 0
        Algorithm: MPIR_Bcast_intra_binomial

      avg_msg_size <= 8192
        Algorithm: MPIR_Bcast_intra_tree

      avg_msg_size <= 32768
        Algorithm: MPIR_Bcast_intra_tree

      avg_msg_size <= 65536
        Algorithm: MPIR_Bcast_intra_tree

      avg_msg_size <= 131072
        Algorithm: MPIR_Bcast_intra_binomial

      any
        Algorithm: MPIR_Bcast_intra_scatter_recursive_doubling_allgather

  
......

==========================================
Summary of rules per collective algorithm:
MPIR_Allgather_intra_brucks
   >>> intra_comm >>> any >>> total_msg_size < 81920
MPIR_Allgather_intra_recursive_doubling
   >>> intra_comm >>> comm_size is power-of-two >>> total_msg_size < 524288
MPIR_Allgather_intra_ring
   >>> intra_comm >>> comm_size is power-of-two >>> any
   >>> intra_comm >>> any >>> any
MPIR_Allgather_inter_local_gather_remote_bcast
   >>> inter_comm
MPIR_Allgatherv_intra_brucks
   >>> intra_comm >>> any >>> total_msg_size < 81920
MPIR_Allgatherv_intra_recursive_doubling
   >>> intra_comm >>> comm_size is power-of-two >>> total_msg_size < 524288
MPIR_Allgatherv_intra_ring
   >>> intra_comm >>> comm_size is power-of-two >>> any
   >>> intra_comm >>> any >>> any
MPIR_Allgatherv_inter_remote_gather_local_bcast
   >>> inter_comm
MPIR_Allreduce_intra_recursive_doubling
   >>> intra_comm >>> comm_size <= 1 >>> avg_msg_size <= 16 >>> operation is not commutative
   >>> intra_comm >>> comm_size <= 1 >>> avg_msg_size <= 64 >>> operation is not commutative
   >>> intra_comm >>> comm_size <= 1 >>> avg_msg_size <= 2048 >>> operation is not commutative
   >>> intra_comm >>> comm_size <= 1 >>> avg_msg_size <= 16384 >>> operation is not commutative
   >>> intra_comm >>> comm_size <= 1 >>> any >>> built-in operators
   >>> intra_comm >>> comm_size <= 8 >>> avg_msg_size <= 32 >>> operation is not commutative
   >>> intra_comm >>> comm_size <= 8 >>> avg_msg_size <= 64 >>> operation is not commutative
......
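The per-algorithm summary above can be read as every root-to-leaf path of the tree, grouped by the algorithm at the leaf. A minimal sketch of that idea follows; the `node_t` layout and the `summarize` function are hypothetical and are not the data structures used by MPICH.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node: a condition with children, or a leaf naming an algorithm. */
typedef struct node {
    const char *label;          /* condition text, e.g. "intra_comm"; NULL on leaves */
    const char *algo;           /* algorithm name; non-NULL only on leaves */
    const struct node *child;   /* first child */
    const struct node *next;    /* next sibling */
} node_t;

#define MAX_DEPTH 16

/* Append s to the NUL-terminated buffer out, truncating if it would overflow. */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);
    if (len + 1 < cap)
        snprintf(out + len, cap - len, "%s", s);
}

/* Depth-first walk: record the conditions on the way down and emit one
 * line for every leaf whose algorithm matches. */
static void collect(const node_t *n, const char *algo, const char *path[],
                    int depth, char *out, size_t cap)
{
    for (; n != NULL; n = n->next) {
        if (n->algo != NULL) {
            if (strcmp(n->algo, algo) == 0) {
                append(out, cap, "  ");
                for (int i = 0; i < depth; i++) {
                    append(out, cap, " >>> ");
                    append(out, cap, path[i]);
                }
                append(out, cap, "\n");
            }
        } else if (depth < MAX_DEPTH) {
            path[depth] = n->label;
            collect(n->child, algo, path, depth + 1, out, cap);
        }
    }
}

/* Write the algorithm name followed by all condition paths that select it. */
void summarize(const node_t *root, const char *algo, char *out, size_t cap)
{
    const char *path[MAX_DEPTH];
    if (cap == 0)
        return;
    out[0] = '\0';
    append(out, cap, algo);
    append(out, cap, "\n");
    collect(root, algo, path, 0, out, cap);
}
```

Calling `summarize` once per algorithm yields one block per algorithm, in the same shape as the listing above.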

@hzhou
Contributor

hzhou commented Jul 17, 2025

Maybe a separate CVAR? It is too much output to fit into debug summary.

@mjwilkins18
Contributor

Commenting to record an in-person discussion: it would be great to have value ranges in the per-algorithm summary to show when each rule actually applies.
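One way to picture the requested value ranges: if sibling conditions such as `avg_msg_size <= 128` are checked in order and the first match wins (an assumption about the selection semantics, not something stated in this PR), then each rule applies only above the previous threshold. The hypothetical helper below turns an ordered threshold list into explicit ranges.

```c
#include <stdio.h>

/* Describe the effective range of the i-th rule in an ordered list of n
 * "<=" thresholds. i == n stands for the trailing "any" rule.
 * Assumes first-match-wins evaluation and strictly increasing thresholds.
 * Returns the value of snprintf. */
int format_range(const char *var, const long *upper, int n, int i,
                 char *out, size_t cap)
{
    if (n == 0)
        return snprintf(out, cap, "any");
    if (i <= 0)
        return snprintf(out, cap, "%s <= %ld", var, upper[0]);
    if (i >= n)
        return snprintf(out, cap, "%s > %ld", var, upper[n - 1]);
    return snprintf(out, cap, "%ld < %s <= %ld", upper[i - 1], var, upper[i]);
}
```

For the thresholds 0, 128, and 8192 from the bcast example, the second rule would be reported as `0 < avg_msg_size <= 128` and the trailing `any` as `avg_msg_size > 8192`.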

@yfguo yfguo force-pushed the csel-printout branch 2 times, most recently from 039e5d8 to bb91990 Compare August 7, 2025 22:17
@yfguo
Contributor Author

yfguo commented Aug 7, 2025

test: mpich/ch4/ofi

@yfguo
Contributor Author

yfguo commented Aug 8, 2025

test: mpich/ch4/ofi

@yfguo yfguo force-pushed the csel-printout branch 2 times, most recently from bc08abc to 28f6b0c Compare August 8, 2025 19:33
@yfguo
Contributor Author

yfguo commented Aug 8, 2025

test: mpich/ch4/ofi

@yfguo
Contributor Author

yfguo commented Aug 9, 2025

test:mpich/ch4/ofi

@yfguo yfguo marked this pull request as ready for review August 9, 2025 04:12
@yfguo yfguo requested a review from hzhou August 9, 2025 04:12
Contributor

@hzhou hzhou left a comment

Please add commit messages

@yfguo yfguo requested a review from hzhou August 11, 2025 22:10
@yfguo yfguo force-pushed the csel-printout branch 2 times, most recently from 554aefb to bc7abb8 Compare August 12, 2025 01:49
@yfguo
Contributor Author

yfguo commented Aug 12, 2025

From offline discussion: I need to split the optimization changes into a separate PR and keep only the printout changes here.

@yfguo
Contributor Author

yfguo commented Aug 16, 2025

PR updated to contain only the changes for printing the collective selection.

yfguo added 4 commits August 25, 2025 06:47
Create a CSEL constants array for the string names of collectives
and comm hierarchies. These string values are used when parsing
the JSON file and when printing CSEL tree nodes.

Separate the implementation details of the CSEL tree printing function
for ease of maintenance.

Consolidate the POSIX coll algorithm enum definition under MPII. The
JSON parsing no longer needs separate functions for them.

Consolidate the CH4 coll algorithm enum definition under MPII. The
JSON parsing no longer needs separate functions for them.

Consolidate the OFI coll algorithm enum definition under MPII. The
JSON parsing no longer needs separate functions for them.

yfguo added 2 commits August 25, 2025 06:47
MPIR_CVAR_COLLECTIVE_SELECTION_REPORT controls how MPICH shows the
collective selection logic during init. It is turned off by default.
The user can choose to print the CSEL in tree format or summary
format (later commit).