libnbc: Fix int overflow when handling count parameters #9616

Merged
merged 2 commits into from
Nov 3, 2021

Conversation

jjhursey
Member

@jjhursey jjhursey commented Nov 2, 2021

  • In a reduce_scatter operation, if the count array adds up to a
    value greater than INT_MAX, the total overflows an int and the count
    passed around is negative. This leads to an invalid buffer being
    passed around, often resulting in a segv crash.
  • The fix is to preserve the true count size as a size_t at all
    levels in the schedule (hence the change to the protocol
    structures).
    • Instead of changing the count parameter of ompi_op_reduce, we
      iterate over the buffer in chunks of at most INT_MAX elements,
      reducing each in turn.
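
To illustrate the first two points, here is a minimal sketch of accumulating per-rank counts in a `size_t` rather than an `int`. The helper name `total_count` and its parameters are hypothetical and are not taken from the libnbc sources; the sketch only shows why the wider type keeps the total valid.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from libnbc): sum the per-rank counts of a
 * reduce_scatter into a size_t. Each individual count fits in an int,
 * but their sum may exceed INT_MAX, so the accumulator must be wider. */
static size_t total_count(const int *recvcounts, int nranks)
{
    size_t total = 0;
    for (int i = 0; i < nranks; ++i) {
        assert(recvcounts[i] >= 0);      /* counts are never negative */
        total += (size_t) recvcounts[i]; /* widen before adding */
    }
    return total;
}
```

With counts of `INT_MAX` and `1`, the returned total is `(size_t) INT_MAX + 1`, a value that an `int` accumulator could not represent.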

 * In a reduce_scatter operation, if the count array adds up to a
   value greater than INT_MAX, the total overflows an `int` and the
   count passed around is negative. This leads to an invalid buffer
   being passed around, often resulting in a segv crash.
 * The fix is to preserve the true count size as a `size_t` at all
   levels in the schedule (hence the change to the protocol
   structures).
   - Instead of changing the count parameter of `ompi_op_reduce`, we
     iterate over the buffer in chunks of at most INT_MAX elements,
     reducing each in turn.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
 * If the `count` is greater than `INT_MAX` then we call the
   operation in chunks that fit into an `int`.
 * This moves the functionality out of libnbc and into the common
   reduction operation so that all collectives may pass larger
   counts than `INT_MAX` into the internal reduction operation.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
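
The chunking described in the second commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the code merged here: `sum_kernel` is a stand-in for a reduction whose count parameter is an `int`, and `reduce_in_chunks` with its `max_chunk` parameter is a hypothetical wrapper. In the scenario described above `max_chunk` would be `INT_MAX`; it is a parameter here only so that the loop can be exercised on small buffers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for a reduction kernel that only accepts an int count
 * (hypothetical, integer sum only). */
static void sum_kernel(const int *src, int *dst, int count)
{
    for (int i = 0; i < count; ++i) {
        dst[i] += src[i];
    }
}

/* Hypothetical wrapper: reduce `count` elements by calling the int-count
 * kernel on consecutive chunks of at most `max_chunk` elements.
 * Returns the number of kernel calls made. */
static int reduce_in_chunks(const int *src, int *dst, size_t count,
                            int max_chunk)
{
    assert(max_chunk > 0);
    int calls = 0;
    size_t done = 0;
    while (done < count) {
        size_t left = count - done;
        /* Each chunk is guaranteed to fit in an int. */
        int chunk = left > (size_t) max_chunk ? max_chunk : (int) left;
        sum_kernel(src + done, dst + done, chunk);
        done += (size_t) chunk;
        ++calls;
    }
    return calls;
}
```

Because the element offset `done` is tracked as a `size_t`, only the per-call count is narrowed to `int`, and it is bounded by `max_chunk` before the cast.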
@jjhursey jjhursey force-pushed the libnbc-fix-overflow branch from 972e9c4 to 6075048 Compare November 2, 2021 20:12
@jjhursey jjhursey merged commit 7d1711f into open-mpi:master Nov 3, 2021
@jjhursey jjhursey deleted the libnbc-fix-overflow branch November 3, 2021 14:33
@jjhursey
Member Author

jjhursey commented Nov 3, 2021

Cherry picked both patches to the release branches. If the RMs for a release don't want the second change we can drop it on those branches - I would prefer to keep both patches on all release branches:
