Skip to content

add test for dataclusters #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 10, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 25 additions & 3 deletions src/diffpy/srmise/dataclusters.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,12 +68,34 @@ def __init__(self, x, y, res):
def __iter__(self):
return self

def __eq__(self, other):
return (
np.array_equal(self.x, other.x)
and np.array_equal(self.y, other.y)
and np.array_equal(self.data_order, other.data_order)
and np.array_equal(self.clusters, other.clusters)
and self.res == other.res
and self.current_idx == other.current_idx
and self.lastcluster_idx == other.lastcluster_idx
and self.lastpoint_idx == other.lastpoint_idx
and self.status == other.status
)

def clear(self):
"""Clear all members, including user data."""
"""
Clear all data and reset the cluster object to a transient initial state.

The purpose of this method is to provide a clean state before creating new clustering operations.
The object is updated in-place and no new instance is returned.

Returns
-------
None
"""
self.x = np.array([])
self.y = np.array([])
self.data_order = np.array([], dtype=np.int32)
self.clusters = np.array([[]], dtype=np.int32)
self.data_order = np.array([])
self.clusters = np.array([[]])
self.res = 0
self.current_idx = 0
self.lastcluster_idx = None
Expand Down
43 changes: 43 additions & 0 deletions src/diffpy/srmise/tests/test_dataclusters.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
import numpy as np
import pytest

from diffpy.srmise.dataclusters import DataClusters


@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are only doing one test here, we don't need the mark.parameterize, you can just load the values directly into the test below.

"inputs, expected",
[
(
{
"input_x": np.array([1, 2, 3]),
"input_y": np.array([3, 2, 1]),
"input_res": 4,
},
{
"x": np.array([]),
"y": np.array([]),
"data_order": np.array([]),
"clusters": np.array([[]]),
"res": 0,
"current_idx": 0,
"lastcluster_idx": None,
"lastpoint_idx": None,
"status": 0,
},
),
],
)
def test_clear(inputs, expected):
# Initialize DataClusters with input parameters
actual = DataClusters(x=inputs["input_x"], y=inputs["input_y"], res=inputs["input_res"])

# Perform the clear operation
actual.clear()

# Assert each expected attribute against its actual value after clearing
for attr, expected_value in expected.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer needed. Just instantiate an expected with the outputs and then assert equal. For readability, somthing like

input_cluster = DataClusters(x=inputs["x"]....)
expected = DataClusters(x=outputs["x"].....)
actual = input_cluster.clear()
assert expected == actual

If the clear operates in place and you can't assign to a new instance, then make the last line assert expected == actual.clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I tried this approach but then I realized that we can't instantiate the expected object given those expected parameters. The reason why it cannot is because there is a function in the class called setdata that does not allow instantiating a object with 0 resolution as what clear does. here is part of the code that setdatadefines:
if res <= 0: raise ValueError("Resolution res must be greater than 0.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way that makes your structure works is to refactor the source code here to change from <= to <, but I'm not sure if this would change the behavior of the function as Luke originally plans...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it possible to set a zero resolution in clear but not when instantiating? This is the question. This is probably an issue with the code itself. Actually, this is one reason we love tests, because we discover things like this....

Remember, tests test "behavior", not code. What behavior do we want for the clear method?

Is clear used anywhere in the code? Try and figure out what it is used for, then make a suggestion for your preferred fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in the initilization (init)of the object. Basically, it guarantees that it could create a clean object in the instantiating step. The code reference is here
self.clear()
self.setdata(x, y, res)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not expect to see a flurry of activity on SrMise and thought I should chime in where it would be helpful. A res of 0 should produce the trivial clustering -- each point is its own cluster. I most likely required res > 0 because the point of the class is to find non-trivial clusterings and because in an early stage of development this class was more tightly coupled to the physical resolution of the PDF, which certainly has a positive value. Setting setting res = 0 when clearing was a transient stand in for an uninitialized state.

Considered purely as a clustering method the trivial clustering is reasonable default behavior and the only change necessary is in setdata(), along the lines of

        if res < 0:
            raise ValueError("Resolution res must be non-negative.")

If you make that change, the docstring for setvars() in pdfpeakextraction.py should probably also be updated to state
"cres: The clustering resolution, must be >= 0".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Luke (@Ainamacar ) it is really great to have your input. I hope all is well with you!

We are trying to refactor all our diffpy packages to a standard format this summer to make maintenance easier moving forward. It is also great training for new people like Steven (@stevenhua0320 ). Any input you can provide is really appreciated!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Simon! I will email you a brief personal update.

In any case, I'd be happy to consult on this effort as I'm able. I'm glad to see the whole diffpy effort is going strong!

assert (
np.array_equal(getattr(actual, attr), expected_value)
if isinstance(expected_value, np.ndarray)
else getattr(actual, attr) == expected_value
)
Loading