Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: ht.array constructor respects implicit torch device when copy is set to false #1363

Conversation

JuanPedroGHM
Copy link
Member

@JuanPedroGHM JuanPedroGHM commented Feb 10, 2024

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • documentation updated where needed

Description

ht.array constructor deleted original object torch device when the copy flag is set to False.

Issue/s resolved: #1321

Changes proposed:

  • Earlier check in ht.array factory for the original object's device.
  • Added a test for ht.array using torch tensor, on different devices (when changing the env variable), and with copy set to false.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Does this change modify the behaviour of other functions? If so, which?

yes?

@JuanPedroGHM JuanPedroGHM added bug Something isn't working PR talk labels Feb 10, 2024
@JuanPedroGHM JuanPedroGHM self-assigned this Feb 10, 2024
@ghost
Copy link

ghost commented Feb 10, 2024

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

Copy link
Contributor

Thank you for the PR!

Copy link

codecov bot commented Feb 10, 2024

Codecov Report

Attention: Patch coverage is 89.47368% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 92.05%. Comparing base (7acf5be) to head (7fcf288).

Files Patch % Lines
heat/core/factories.py 89.47% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1363      +/-   ##
==========================================
- Coverage   92.06%   92.05%   -0.01%     
==========================================
  Files          80       80              
  Lines       11942    11950       +8     
==========================================
+ Hits        10994    11001       +7     
- Misses        948      949       +1     
Flag Coverage Δ
unit 92.05% <89.47%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mtar
Copy link
Collaborator

mtar commented Feb 15, 2024

Thank you for tackling this issue @JuanPedroGHM. I created a table of the allowed parameter combinations for clarity:

copy split is_split allowed? ; expected behaviour
False Int Int/None 🔴 ; exception
False None Int/None 🟢 ; reuse memory
True Int Int 🔴 ; exception
True None Int 🟡 : ; copy local memory (implemented?)
True Int/None None 🟢 ; copy memory
None Int Int 🔴 ; exception
None Int None 🟢 ; reuse memory as much as possible before the distributing and copying
None None Int 🟡 ; reuse memory if possible, local copy otherwise (implemented?)
None None None 🟢 ; reuse memory if possible, copy otherwise

What do you think?

@JuanPedroGHM
Copy link
Member Author

JuanPedroGHM commented Feb 22, 2024

So I did a bit of a brute force grid search and found all this instances that to me are incorrect behavior from heat. Let me know what you think:

Current inconsistent behavior (Summary):

  • When creating from a {list, numpy, torch}, if is_split is set, and split is not, the split property of the new array is set to that split, but the array is not actually split.
    • This also happens for heat, when the original array is not split.
    • Expected result: the is_split argument should be ignored, as they are not split originally. New DNDArray should have the same split indicated as the split arg
  • When both split and is_split are set, it always throws an exception. Could be changed to ignore in some cases, for list, numpy, torch arrays for example, as they are not split by default.
  • When the original obj is a heat DNDArray that has already been split, the behavior is very inconsistent.
    • The split property of the new array is incorrect a lot of times, even if the split is done right.
    • When the original array is split, and the argument split of ht.array is set, the new array is split a second time. Probably not what we want.
  • When the og object is a heat dndarray, the split/is_split exclusivity will not throw and error as long as all the properties from the old and new array match (exits on the first if clause).
  • No proper test with multiple devices yet, Horeka and Haicore are unavailable.

Attached is also a full table with the data.

bug_results.md

Copy link
Contributor

Thank you for the PR!

Copy link
Contributor

Thank you for the PR!

@JuanPedroGHM
Copy link
Member Author

JuanPedroGHM commented Feb 28, 2024

Changes:

  • Raised exceptions when using the ht.array method would resplit the existing and already split DNDArray. If the original DNDArray is not split (obj.split is None), using ht.array will return a split object. This is to ensure compatibility with the rest of the heat code.
  • The check if the arguments split and is_split was moved higher to ensure it cannot be circumvented with the right choice of arguments.
  • Added additional tests that check those combinations of arguments.

New table:
bug_results.md

Copy link
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito requested review from mtar and mrfh92 and removed request for ClaudiaComito February 29, 2024 16:45
Copy link
Collaborator

@mtar mtar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JuanPedroGHM, I made some suggestions for better understanding.

heat/core/factories.py Outdated Show resolved Hide resolved
heat/core/factories.py Outdated Show resolved Hide resolved
heat/core/factories.py Outdated Show resolved Hide resolved
heat/core/factories.py Outdated Show resolved Hide resolved
JuanPedroGHM and others added 2 commits March 4, 2024 13:21
Copy link
Contributor

github-actions bot commented Mar 4, 2024

Thank you for the PR!

Copy link
Contributor

github-actions bot commented Mar 4, 2024

Thank you for the PR!

Copy link
Contributor

github-actions bot commented Mar 4, 2024

Thank you for the PR!

Copy link
Contributor

github-actions bot commented Apr 2, 2024

Thank you for the PR!

Copy link
Contributor

github-actions bot commented Apr 8, 2024

Thank you for the PR!

Copy link
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito added this to the 1.4.0 milestone Apr 15, 2024
mrfh92
mrfh92 previously approved these changes Apr 15, 2024
Copy link
Collaborator

@mrfh92 mrfh92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view this looks fine; in particular, I guess that the additional error messages will be very helpful (in particular since I have already made a mistake regarding split and is_split myself).
Thanks 👍

Copy link
Contributor

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @JuanPedroGHM .

The CI matrix must be updated here before we can merge.

Also, just to be safe, can you test that the tutorials still behave as expected after these changes? Internals would be a good one to check (also feel free to add content there, I don't think we've addressed the copy attribute at all).

Thanks again!

Copy link
Contributor

Thank you for the PR!

@JuanPedroGHM
Copy link
Member Author

JuanPedroGHM commented Apr 16, 2024

Thanks a lot @JuanPedroGHM .

The CI matrix must be updated here before we can merge.

Also, just to be safe, can you test that the tutorials still behave as expected after these changes? Internals would be a good one to check (also feel free to add content there, I don't think we've addressed the copy attribute at all).

Thanks again!

Thanks for the suggestion @ClaudiaComito ! I ran the tutorial (on my laptop, so single gpu and without the datasets) and they ran perfectly. I did find one error in the internals tutorial with an undefined variable "target_map", which I corrected with the last commit.

I also added a small example to the interoperability section in tutorial 2 about using the copy keyword.

Copy link
Contributor

Thank you for the PR!

Copy link
Contributor

Thank you for the PR!

Copy link
Contributor

Thank you for the PR!

Copy link
Contributor

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, next try...

Copy link
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito merged commit 86fe8a6 into main Apr 17, 2024
52 of 54 checks passed
Copy link
Contributor

Backport failed for release/1.3.x, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release/1.3.x
git worktree add -d .worktree/backport-1363-to-release/1.3.x origin/release/1.3.x
cd .worktree/backport-1363-to-release/1.3.x
git switch --create backport-1363-to-release/1.3.x
git cherry-pick -x b1203e745f5045537794c6581728c78444d29776 f1b568b323680ecf84b312a0355ea6061d7398bc 5525bb6f5d1b415672bc3a9ec96bac933c18ce5e b1ef6f95547493133f8f650a42b5ed6fe12097fb 3a97d5f9dae0457c1bc0a32670d3ed6349bf20e0 6f878faf1f8565ca5ad745a38c90891cf8d7caf7 c5052008fc70093c6cd98282ade4da3052c092d2 dffa4fad0141a90adaa8c9858d307d7e99c84a58 a84ba76fae2ee107ba71e898f73b7e1babbe4043

@ClaudiaComito ClaudiaComito deleted the bugs/1321-_Bug_factories_array_illegal_parameter_combinations_copy_&_split branch April 17, 2024 11:30
@ClaudiaComito ClaudiaComito restored the bugs/1321-_Bug_factories_array_illegal_parameter_combinations_copy_&_split branch April 17, 2024 11:31
@mtar mtar deleted the bugs/1321-_Bug_factories_array_illegal_parameter_combinations_copy_&_split branch October 21, 2024 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: factories.array illegal parameter combinations copy & split
4 participants