Large data counts support for MPI Communication #1765
base: main
Conversation
Thank you for the PR!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1765      +/-   ##
==========================================
- Coverage   92.26%   92.25%    -0.01%
==========================================
  Files          84       84
  Lines       12447    12463       +16
==========================================
+ Hits        11484    11498       +14
- Misses        963      965        +2
I have encountered the following problem:
results in the following error:
With 2 ** 10 in the last entry of shape there is no problem, so it seems to be related to large counts.
Benchmark results - Sponsored by perun: Grafana Dashboard
Could the problem be that, for all communication involving MPI operations like MPI.SUM etc., such an operation is not well-defined on the MPI Vector construction chosen for the buffers?
Have you found some bug? I don't think it should be an issue, as the vector datatype just points to where the data is, where it needs to go, and in what order. As long as both send and recv buffers are well-defined by the datatype, there should not be an issue with MPI operations.
The example with Allreduce I posted above caused an error for me. |
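For concreteness, here is a hedged sketch of the pattern under discussion, assuming mpi4py and NumPy; it is not the reporter's original snippet (which is not preserved here), and all sizes and names are illustrative.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

n = 2**22                      # stand-in size; the real case exceeds 2**31 - 1 elements
block = 2**11
send = np.ones(n, dtype=np.float64)
recv = np.empty_like(send)

# Describe each contiguous buffer as a single element of a derived vector
# datatype, mirroring what this PR does for very large counts.
vec = MPI.DOUBLE.Create_vector(n // block, block, block)
vec.Commit()

# The open question above: is a predefined reduction such as MPI.SUM handled
# consistently when the buffers are specified via this derived datatype
# instead of a plain count of MPI.DOUBLE?
comm.Allreduce([send, 1, vec], [recv, 1, vec], op=MPI.SUM)

vec.Free()
```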
Due Diligence
Description
Some MPI implementations are limited to sending at most 2^31-1 elements at once. As far as I have tested, this also applies to OpenMPI 4.1 and 5.0, because large-count support has not been added to mpi4py (at least in my tests it failed).
This small change uses the trick described here to pack contiguous data into an MPI Vector, extending the limit on the number of elements that can be sent.
This applies to contiguous data only; non-contiguous data is already packed into recursive vector datatypes, which reduces the need to apply this trick there.
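Below is a minimal sketch of that packing trick, assuming mpi4py and NumPy. The helper name, the block size, and the even-divisibility shortcut are illustrative assumptions, not the PR's actual implementation.

```python
from mpi4py import MPI
import numpy as np

INT_MAX = 2**31 - 1  # largest count many MPI implementations accept

def as_large_count_buffer(array, mpi_dtype):
    """Return an mpi4py buffer spec [buf, count, datatype] with a small count.

    `array` must be contiguous. For sizes above INT_MAX the elements are
    grouped into fixed-size blocks via a vector datatype, so sending one
    element of that datatype covers the whole buffer.
    """
    n = array.size
    if n <= INT_MAX:
        return [array, n, mpi_dtype]
    block = 2**20                              # hypothetical block size
    num_blocks, remainder = divmod(n, block)
    assert remainder == 0, "sketch only handles evenly divisible sizes"
    # With stride == blocklength the vector layout is contiguous, so it
    # describes exactly the same bytes as a count of n basic elements.
    vec = mpi_dtype.Create_vector(num_blocks, block, block)
    vec.Commit()
    return [array, 1, vec]

comm = MPI.COMM_WORLD
data = np.ones(2**24, dtype=np.float64)        # small demo; the real case is > 2**31 - 1 elements
comm.Bcast(as_large_count_buffer(data, MPI.DOUBLE), root=0)
```

In real code the committed datatype would also need to be freed after the call, and any remainder elements handled, for example with a second, smaller transfer.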
Issue/s resolved: #
Changes proposed:
Type of change
Does this change modify the behaviour of other functions? If so, which?
Yes, probably a lot of them.