-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement: Avoid unnecessary gathering of distributed operand #1216
Enhancement: Avoid unnecessary gathering of distributed operand #1216
Conversation
Thank you for the PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @samadpls. However, your contribution is contrary to the intended behavior of the function and does not address the issue. Perhaps it wasn't written clearly and concisely enough, so I rephrase it again:
What we want for allclose
/isclose
: If we have a distributed array and a non-distributed array as inputs, slice the undistributed array such that we can compare it with the corresponding local part of the distributed array on the process.
Hope this helps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a small step into the right direction 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samadpls here's what we're trying to achieve with this fix.
First of all, all Heat operations are implemented to operate on memory-distributed arrays (DNDarrays) . In practice:
- operations are performed on many MPI processes, could be many CPUs or many GPUs, many nodes of a supercomputer;
- if a DNDarray is distributed, each MPI process stores only a slice of that DNDarray (along the
DNDarray.split
dimension) - users can choose whether to distribute which DNDarrays or not (
DNDarray.split = None
means each process has a copy of the entire DNDarray in memory).
In this PR, you are addressing the case in which logical operations all
, allclose
etc. compare a distributed DNDarray to a non-distributed one.
Example with small arrays for better understanding:
import numpy as np
import heat as ht
x = ht.arange(30, split=0).reshape(-1, 2)
y = ht.arange(30).reshape(-1, 2)
print(ht.all(x == y))
This will return True
no matter how many MPI processes you run it on.
Try and print out the local arrays (DNDarray.larray
) to get a better understanding what the underlying data on each process really are (run this code on more than one process, see here):
rank = x.comm.rank
print(f"On rank {rank}: local x data: {x.larray}")
print(f"On rank {rank}: local y data: {y.larray}")
In the current implementation, when a distributed x
is compared to non-distributed y
, x
gets "gathered" so it is no longer distributed. But "gathering" all the slices of a distributed array onto each process is communication- and memory-intensive and, in this case, unnecessary. Each process can simply compare the local slice of x (x.larray
) to the corresponding slice of y
and, at the end, only communicate whether the slices on each process match or not.
Feel free to ask if anything is unclear.
I apologize for the delay in responding. I've been on vacation and traveling for the past 22 days, and I just landed few hours ago. I'll definitely get back to work and address the PR comments. Thanks for your understanding! |
No problem at all @samadpls, thanks for your time! |
Signed-off-by: samadpls <abdulsamadsid1@gmail.com>
for more information, see https://pre-commit.ci
Signed-off-by: samadpls <abdulsamadsid1@gmail.com>
Thank you for the PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @samadpls , two more small changes and we're good to go.
if x.split is not None and y.split is None and y.ndim > 0: | ||
t2 = factories.array(y.larray, device=x.device, split=x.split) | ||
x, t2 = sanitation.sanitize_distribution(x, t2, target=x) | ||
return x, t2 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're missing the same check for y distributed, x non distributed I think. Otherwise we're good to go.
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
Signed-off-by: samadpls <abdulsamadsid1@gmail.com>
…thub.com/samadpls/heat into features/1064-optimize-logical-functions
for more information, see https://pre-commit.ci
Thank you for the PR! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We made it, thanks a lot @samadpls !
Thank you so much @ClaudiaComito, I couldn't have done it without your assistance. |
My pleasure. The CI is failing, but this has nothing to do with your PR, we've been working on it in #1313 and #1314 . There's nothing you can do about it, #1314 needs to be merged before everything else. We're about to start the Christmas break so things will be grinding to a halt until early January. Happy holidays! 🎄 |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Thank you for the PR! |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin release/1.3.x
git worktree add -d .worktree/backport-1216-to-release/1.3.x origin/release/1.3.x
cd .worktree/backport-1216-to-release/1.3.x
git switch --create backport-1216-to-release/1.3.x
git cherry-pick -x 9d093b2b1b971320b6dbfa8d32af450cf414e22b 7b7a19a16069af11e54e6eabb94ed41ec68eb1e2 f8133118af2e4898f25cd7846034acc750484301 bdf62df2ef726fa9e97c5d54846ba2aee4b0a1bb bde9d0e5b34243c75c64ed0840cf053963e01f07 0e8060188f6c8c258badb691c272671f153f3574 b5bf608e0b34167d93e924ae35cd1b97ed8e4507 12c67ecb93c76ae457ae544845a354db4d8430a7 0ebe1ced2fb22bbbe6ab14999cc7f46f44fabf52 e5dc1213e878c0a1927365be8eff30979ff1d1ff e8d45d4a01d22c90a45c1a0089bbf0cad789b4b1 ae3f969828f1b9514ddbbe82dbca8ee37f3e6e54 43e77d81d8b41c488237566f1d340aff60cf5aaa 38f4314a9b721720453b00cf3927ab1384fae8a5 aa555d5a6747f5c9e6d97fda12c0e68f2443b835 52cb055e3dab9518a5c84a91a36070b028c461a7 d0bf3f3e5899688f28cbfbb96bddbba5a2c420ef 60fb1cddc6a414a8d9382b1c54d6366758b52bb6 3d1fe15cd2539b36beb4610bc08296843f765ee7 |
Due Diligence
main
for new features, latest release branch (e.g.release/1.3.x
) for bug fixesDescription
Updated
__sanitize_close_input
logic to resolve issue #1064 regarding unnecessary gathering of distributed operands.Issue/s resolved: #1064
Changes proposed:
__sanitize_close_input
function to avoid unnecessary gathering of distributed operands.Type of change
New feature (non-breaking change which adds functionality)
Memory requirements
Memory requirements for this change have not been profiled at this time.
Performance
Performance metrics for this change have not been measured at this time.
Does this change modify the behaviour of other functions? If so, which?
yes