libnbc: Fix int overflow when handling count parameters #9621

Merged
merged 2 commits into from
Nov 10, 2021

Conversation

jjhursey
Member

@jjhursey jjhursey commented Nov 3, 2021

  • In a reduce_scatter operation, if the count array adds up to a
    value greater than INT_MAX, the total count overflows an int and
    becomes negative. This leads to an invalid buffer being passed
    around, often resulting in a segfault.
  • The fix is to preserve the true count as a size_t at all
    levels in the schedule (hence the change to the protocol
    structures).
    • Instead of changing the count parameter of ompi_op_reduce, we
      iterate over the buffer in chunks of at most INT_MAX elements,
      reducing each in turn.
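
To illustrate the overflow described above, here is a minimal sketch in C. It is not taken from the Open MPI sources: `total_count` and `fits_in_int` are hypothetical helpers, and the code assumes only that the per-rank counts are non-negative `int` values. It shows why the sum has to be accumulated in a `size_t` instead of an `int`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not an Open MPI function): sum the per-rank
 * counts in a size_t so the total cannot wrap the way an int would. */
static size_t total_count(const int *recvcounts, int nranks)
{
    size_t total = 0;
    for (int i = 0; i < nranks; ++i) {
        assert(recvcounts[i] >= 0);      /* counts are never negative */
        total += (size_t)recvcounts[i];
    }
    return total;
}

/* Hypothetical helper: true when the total still fits in an int, i.e.
 * when storing it in an int field would not lose information. */
static int fits_in_int(size_t total)
{
    return total <= (size_t)INT_MAX;
}
```

With four ranks that each contribute `INT_MAX / 2` elements, the total is roughly twice `INT_MAX`, so `fits_in_int` reports that an `int` field can no longer represent it.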

 * In a reduce_scatter operation, if the count array adds up to a
   value greater than INT_MAX, the total count overflows an int and
   becomes negative. This leads to an invalid buffer being passed
   around, often resulting in a segfault.
 * The fix is to preserve the true count as a `size_t` at all
   levels in the schedule (hence the change to the protocol
   structures).
   - Instead of changing the count parameter of `ompi_op_reduce`, we
     iterate over the buffer in chunks of at most INT_MAX elements,
     reducing each in turn.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 6b8e368)
 * If the `count` is greater than `INT_MAX` then we call the
   operation in chunks that fit into an `int`.
 * This moves the functionality out of libnbc and into the common
   reduction operation so that all collectives may pass larger
   counts than `INT_MAX` into the internal reduction operation.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 6075048)
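
The chunking approach described in this commit can be sketched roughly as follows. This is an illustration only, not the code that was merged: `sum_int_count` is a hypothetical stand-in for a reduction kernel that accepts an `int` count, and `reduce_chunked` is a hypothetical wrapper. The `chunk_max` parameter is an assumption added so the loop can be exercised with small buffers; in the scenario described here it would be `INT_MAX`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a reduction kernel that only accepts an
 * int count: element-wise target[i] += source[i]. */
static void sum_int_count(const double *source, double *target, int count)
{
    for (int i = 0; i < count; ++i) {
        target[i] += source[i];
    }
}

/* Hypothetical wrapper: reduce `count` elements (a size_t) by calling
 * the int-count kernel on consecutive chunks of at most `chunk_max`
 * elements. Returns the number of kernel calls that were made. */
static size_t reduce_chunked(const double *source, double *target,
                             size_t count, int chunk_max)
{
    size_t done = 0;
    size_t calls = 0;

    assert(chunk_max > 0);
    while (done < count) {
        size_t remaining = count - done;
        /* Each chunk is small enough to be passed as an int. */
        int chunk = remaining > (size_t)chunk_max ? chunk_max
                                                  : (int)remaining;
        sum_int_count(source + done, target + done, chunk);
        done += (size_t)chunk;
        ++calls;
    }
    return calls;
}
```

Because the offset `done` is kept as a `size_t`, the pointer arithmetic stays correct past `INT_MAX` elements, while every individual kernel call still receives a count that fits into an `int`.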
@jjhursey
Member Author

jjhursey commented Nov 5, 2021

This looks like it was accepted and merged in all of the release branches. RMs: Any reason not to merge it into v5?

@jjhursey
Member Author

jjhursey commented Nov 9, 2021

@awlauria @gpaulsen Ping to merge

@awlauria awlauria merged commit ffd6f2b into open-mpi:v5.0.x Nov 10, 2021
@jjhursey jjhursey deleted the v50-libnbc-fix-overflow branch November 10, 2021 16:17