-
Notifications
You must be signed in to change notification settings - Fork 8
add test for dataclusters #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
5bf4556
e3fe16f
fdab59c
82888bb
f63abd5
b959a09
de2e964
b25e9a4
17582f0
3638253
ca8a169
2105474
cdaa3f2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -68,6 +68,21 @@ def __init__(self, x, y, res): | |
def __iter__(self): | ||
return self | ||
|
||
def __eq__(self, other): | ||
# this function makes sure two DataClusters object is equal. Namely equal here means | ||
# all the attribute of two objects are equal. | ||
return ( | ||
np.array_equal(self.x, other.x) | ||
and np.array_equal(self.y, other.y) | ||
and np.array_equal(self.data_order, other.data_order) | ||
and np.array_equal(self.clusters, other.clusters) | ||
and self.res == other.res | ||
and self.current_idx == other.current_idx | ||
and self.lastcluster_idx == other.lastcluster_idx | ||
and self.lastpoint_idx == other.lastpoint_idx | ||
and self.status == other.status | ||
) | ||
|
||
def clear(self): | ||
"""Clear all members, including user data.""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we need a better docstring. When we touch code, we like to improve it. |
||
self.x = np.array([]) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
import numpy as np | ||
import pytest | ||
|
||
from diffpy.srmise.dataclusters import DataClusters | ||
|
||
|
||
@pytest.mark.parametrize( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since we are only doing one test here, we don't need the mark.parameterize, you can just load the values directly into the test below. |
||
"inputs, expected", | ||
[ | ||
( | ||
{ | ||
"input_x": np.array([1, 2, 3]), | ||
"input_y": np.array([3, 2, 1]), | ||
"input_res": 4, | ||
}, | ||
{ | ||
"x": np.array([]), | ||
"y": np.array([]), | ||
"data_order": np.array([], dtype=np.int32), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are we setting these to int32? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because in the original code it set the attribute into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, I kind of know why you did that, but I am also trying to emphasize a point that tests test behavior, not code, so we don't design tests to make code pass, we design tests to ensure specific behaviors, then we write code to make sure the tests pass. Then we are sure that the code has the right behavior. So understand why that is int32 and ask the question, why and is that the best thing? We may not find the answer and then we can default to mimicking the original code, but we should at least go through that process. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both Why int32? I don't remember, but I can make an educated guess. During original dev I used both 32 bit and 64 bit Python at various times and this might be a (very weak) gesture to compatibility between versions when serializing. Alternately, sometimes I was getting different results on the same input and, IIRC, found that small numerical differences between the 32 bit and 64 bit versions occasional caused different in branching during the recursive steps of peak extraction. So I may have specified int32 rather than int as a very early part of those debugging efforts, but never removed it. I don't think changing this should have an impact -- if I knew about some subtle behavior I almost certainly would have left a comment. My opinion is that it would be better to complete the conversion to Python3 and test on some full PDFs before seeing what happens if this is changed. How that impacts what test should be written now, I'll leave to Simon. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, please remove. Per @Ainamacar this is technical debt and it generally better to remove it when we find it, than "trying to remember to remove it later if we need to"..... |
||
"clusters": np.array([[]], dtype=np.int32), | ||
"res": 0, | ||
"current_idx": 0, | ||
"lastcluster_idx": None, | ||
"lastpoint_idx": None, | ||
"status": 0, | ||
}, | ||
), | ||
( | ||
sbillinge marked this conversation as resolved.
Show resolved
Hide resolved
|
||
{ | ||
"input_x": np.array([1]), | ||
"input_y": np.array([3]), | ||
"input_res": 4, | ||
}, | ||
{ | ||
"x": np.array([]), | ||
"y": np.array([]), | ||
"data_order": np.array([], dtype=np.int32), | ||
"clusters": np.array([[]], dtype=np.int32), | ||
"res": 0, | ||
"current_idx": 0, | ||
"lastcluster_idx": None, | ||
"lastpoint_idx": None, | ||
"status": 0, | ||
}, | ||
), | ||
], | ||
) | ||
def test_clear(inputs, expected): | ||
# Initialize DataClusters with input parameters | ||
actual = DataClusters(x=inputs["input_x"], y=inputs["input_y"], res=inputs["input_res"]) | ||
|
||
# Perform the clear operation | ||
actual.clear() | ||
|
||
# Assert each expected attribute against its actual value after clearing | ||
for attr, expected_value in expected.items(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is no longer needed. Just instantiate an expected with the outputs and then assert equal. For readability, somthing like
If the clear operates in place and you can't assign to a new instance, then make the last line There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Initially I tried this approach but then I realized that we can't instantiate the expected object given those expected parameters. The reason why it cannot is because there is a function in the class called There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The only way that makes your structure works is to refactor the source code here to change from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is it possible to set a zero resolution in clear but not when instantiating? This is the question. This is probably an issue with the code itself. Actually, this is one reason we love tests, because we discover things like this.... Remember, tests test "behavior", not code. What behavior do we want for the clear method? Is clear used anywhere in the code? Try and figure out what it is used for, then make a suggestion for your preferred fix. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is used in the initilization (init)of the object. Basically, it guarantees that it could create a clean object in the instantiating step. The code reference is here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did not expect to see a flurry of activity on SrMise and thought I should chime in where it would be helpful. A Considered purely as a clustering method the trivial clustering is reasonable default behavior and the only change necessary is in
If you make that change, the docstring for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks Luke (@Ainamacar ) it is really great to have your input. I hope all is well with you! We are trying to refactor all our diffpy packages to a standard format this summer to make maintenance easier moving forward. It is also great training for new people like Steven (@stevenhua0320 ). Any input you can provide is really appreciated! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you, Simon! I will email you a brief personal update. In any case, I'd be happy to consult on this effort as I'm able. I'm glad to see the whole diffpy effort is going strong! |
||
assert ( | ||
np.array_equal(getattr(actual, attr), expected_value) | ||
if isinstance(expected_value, np.ndarray) | ||
else getattr(actual, attr) == expected_value | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove comment