Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test Suite #365

Merged
merged 29 commits into from
Nov 20, 2019
Merged

Test Suite #365

merged 29 commits into from
Nov 20, 2019

Conversation

TheSlimvReal
Copy link
Contributor

@TheSlimvReal TheSlimvReal commented Aug 29, 2019

Description

Target issue #323

This PR introduces a test suite for improved unit testing of our functions.
Most test cases are quite trivial or don't involve the splitting of tensors, because for different number of nodes the result looks differently. Furthermore many edge cases are left untested.
This test suits helps in creating better tests by reducing boilerplate code for comparing an heat array with the expected numpy result.
For even more simplicity, heat functions that should perform equivalent to numpy functions can be tested in one single line on random numbers or a predefined array.

The general idea is to compare a (split) heat tensor with a numpy tensor without having to worry about differing split axis or number of nodes.

Feel free to tell me which functionality is missing that would reduce your time writing unit tests.

If this PR gets merged the next step would be to apply this test suite to all existing functions.

To get an idea how unit tests with this test suite might look lake check out the test_basic_test.py file or have a look at the documentation examples.

Type of change

  • New feature (non-breaking change which adds functionality)

@codecov
Copy link

codecov bot commented Sep 4, 2019

Codecov Report

Merging #365 into master will increase coverage by 0.01%.
The diff coverage is 99.04%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #365      +/-   ##
=========================================
+ Coverage   97.98%     98%   +0.01%     
=========================================
  Files          53      55       +2     
  Lines       10937   11042     +105     
=========================================
+ Hits        10717   10822     +105     
  Misses        220     220
Impacted Files Coverage Δ
heat/core/tests/test_manipulations.py 100% <ø> (ø) ⬆️
heat/core/tests/test_suites/test_basic_test.py 100% <100%> (ø)
heat/core/tests/test_suites/basic_test.py 98.52% <98.52%> (ø)
heat/core/communication.py 96.15% <0%> (+0.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d6c849...84ad972. Read the comment docs.

Copy link
Member

@Markus-Goetz Markus-Goetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of changes are needed here. Might have missed some still

heat/core/manipulations.py Outdated Show resolved Hide resolved
heat/core/manipulations.py Outdated Show resolved Hide resolved
heat/core/operations.py Outdated Show resolved Hide resolved
heat/core/statistics.py Outdated Show resolved Hide resolved
heat/core/tests/test_manipulations.py Outdated Show resolved Hide resolved

class BasicTest(TestCase):

__comm = MPICommunication()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem like a good idea. The comm might be dependent on the input variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the comm should rather be passed via the assert functions? For the array_equal function I understand the point but for the func_equal, the input values don't have a comm.

equal_res = np.array(compare_func(local_numpy, numpy_array[slices]))
self.comm.Allreduce(MPI.IN_PLACE, equal_res, MPI.LAND)
self.assertTrue(equal_res, 'Local tensors do not match the corresponding numpy slices.')
self.assertEqual(local_numpy.dtype, numpy_array.dtype,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not need to necessarily equal but potential float_eq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should this just be left out and just rely on the equality check of the tensor's values

heat/core/tests/test_suites/basic_test.py Outdated Show resolved Hide resolved
heat/core/tests/test_suites/basic_test.py Show resolved Hide resolved
heat/core/tests/test_suites/basic_test.py Show resolved Hide resolved
Copy link
Member

@coquelin77 coquelin77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test unique should have shape, dtype, and splits tested. It should also be tested with doubles and longs in the input array

this looks great. thank you for adding this!

heat/core/manipulations.py Outdated Show resolved Hide resolved
heat/core/tests/test_exponential.py Outdated Show resolved Hide resolved
heat/core/tests/test_manipulations.py Outdated Show resolved Hide resolved
heat/core/tests/test_manipulations.py Outdated Show resolved Hide resolved
heat/core/tests/test_suites/basic_test.py Outdated Show resolved Hide resolved
Check if the heat_array is equivalent to the numpy_array. Therefore first the split heat_array is compared to
the corresponding numpy_array slice locally and second the heat_array is combined and fully compared with the
numpy_array.
Note if the heat array is split it also needs to be balanced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a secondary note should be added here that the standard numpy dtype is float64. this is different from torch/heat where the default if float32. essentially, the numpy arrays used in the function should be created with the proper dtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a general rule or can there be changes or single functions that differ in their behaviour? If the user wants to be certain, he can always pass args where the dtype is explicitly defined. I just don't want to make any general assumptions about the behaviour of the functions which might not be correct and result in unexpected results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the dtype can change after a function. my concern is that the types will not match up because of the difference in what is the standard types of the packages.
i think that the thing to do would be to create the numpy array with the dtype of the heat array, and if no dtype is available then it should be created with dtype=float32. that way they have the same starting point. and i believe that numpy will only increase the precision if it is necessary. i.e. type changes which occur in the course of the function will remain at the same precision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I don't get the problem your are describing. This function only compares two arrays provided by the user in terms of values and type. Under which circumstances would the fact, that numpy creates 64 bit datatypes by default while heat/torch creates 32 bit datatypes, affect this process?

heat/core/tests/test_suites/basic_test.py Outdated Show resolved Hide resolved
heat/core/tests/test_suites/basic_test.py Outdated Show resolved Hide resolved
raise TypeError('The input tensors type must be one of [tuple, list, ' +
'numpy.ndarray, torch.tensor] but is {}'.format(type(tensor)))

np_res = numpy_func(np_array, **numpy_args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype (float64 vs float32)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user is always able to define this using the numpy_args. I want to keep the function general and therefore not make too many assumptions about what the user actually wants.

@TheSlimvReal
Copy link
Contributor Author

Most of the requested changes got fixed/discussed. Would be great if you could have a new look.

@coquelin77
Copy link
Member

bump, black error

@TheSlimvReal
Copy link
Contributor Author

TheSlimvReal commented Nov 14, 2019

Black got fixed

@TheSlimvReal TheSlimvReal merged commit e637fc7 into master Nov 20, 2019
@TheSlimvReal TheSlimvReal deleted the feature/323_test_suite branch November 20, 2019 09:38
@mtar mtar mentioned this pull request Apr 28, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants