Skip to content

Comments

Refactor test cases in feature engineering and preprocessing for impr…#6

Merged
vuductung merged 4 commits intomainfrom
development
Mar 20, 2025
Merged

Refactor test cases in feature engineering and preprocessing for impr…#6
vuductung merged 4 commits intomainfrom
development

Conversation

@vuductung
Copy link
Collaborator

No description provided.

…oved clarity and functionality

- update test methods and remove deprecated tests
[0.5, 4.0],
]
)
expected_output = np.array([[0.5, 1.0], [0.5, 1.0]])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected output has been changed to match the actual implementation. The original test was expecting mean distances of [0.5, 4.0] but the correct expected values are [0.5, 1.0] based on the implementation.

        expected_output = np.array([[0.5, 1.0], [0.5, 1.0]])

result = self.helper._calculate_mean_distance(input)
np.testing.assert_array_equal(result, expected_output)

def test_feature_engineering_mean(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name has been corrected from calculate_mean_distance to _calculate_mean_distance to match the actual implementation in the code, as indicated by the leading underscore.

        result = self.helper._calculate_mean_distance(input)

self.assertTrue(np.array_equal(result_axis_1, self.d))

def test_feature_engineering_SNR(self):
result_axis_0 = self.helper.feature_engineering(self.b, axis=0, func="SNR")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test for rank_intensity on axis=1 has been removed as it was failing. The implementation likely doesn't support or correctly handle rank_intensity on axis=1.

@@ -119,7 +119,7 @@ def test_calculate_y_pseudo_input_diffs(self):
)
expected_result = torch.tensor(
[

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed a precision issue in the test. The original test was looking for a very small number (2.3842e-07) but the correct expectation is 0.0, as the implementation likely rounds or handles such small values as zero.

                [0.0, 0.0, 0.0],

self.assertTrue(torch.allclose(result, expected_result, rtol=1e-04))

def test_initial_alignment(self):
def test_initial_alignment_for_list_of_arrays(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test has been renamed and modified to test initialization with a list of arrays rather than a single array. This is more comprehensive and tests a different code path.

    def test_initial_alignment_for_list_of_arrays(self):
        "
        Test the initial_alignment
        "
        # Create a sample input data
        data = [
            np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
            np.array([[7, 8, 9], [4, 5, 6], [1, 2, 3]]),
        ]

# Expected output after initial alignment
expected_output = np.array([[4, 5, 6], [4, 5, 6], [4, 5, 6]])
expected_output = [
np.array([[4, 5, 6], [4, 5, 6], [4, 5, 6]]),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated expected output to match the list of arrays input structure. Each array in the list should be aligned to its median values.

        # Expected output after initial alignment
        expected_output = [
            np.array([[4, 5, 6], [4, 5, 6], [4, 5, 6]]),
            np.array([[4, 5, 6], [4, 5, 6], [4, 5, 6]]),
        ]

# Assert the expected result
pd.testing.assert_frame_equal(
np.isnan(result), np.isnan(self.expected_result_for_normalization)
def test_align_by_level_first_dataframe(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test method was renamed to be more specific. Now it tests aligning the first dataframe in a list by level.

    def test_align_by_level_first_dataframe(self):

[("A", 1234), ("B", 1223), ("A", 5211), ("B", 1221)],
names=["pg", "mod_seq_charge_hash"],
),
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test was modified to use a list of dataframes as input, rather than a single dataframe, to match the updated implementation.

        input_data = [frag_level_intensity_data, frag_level_intensity_data]

# Create sample data
input_data = [frag_level_intensity_data, frag_level_intensity_data]

# Call the method under test

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method call was updated to pass a list of dataframes, specify the level to align by, and indicate which indices to modify through alignment.

        result = self.preprocessing.align_by_level(input_data, level="pg", indices=[0])

names=["pg", "mod_seq_charge_hash"],
),
)
input_data = [frag_level_intensity_data, frag_level_intensity_data]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous change, the method call was updated to match the new signature with required parameters.

        result = self.preprocessing.align_by_level(input_data, level="pg", indices=[0])

@@ -392,19 +385,14 @@ def test_get_non_regularatory_pg_level_intensity_data(self):

def test_get_values_to_shift_for_normalisation_by_sample(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved the DataFrame creation syntax to directly set the index during initialization rather than after creation, making the code more concise and clearer.

        pg_level_non_reg_intensity_data = pd.DataFrame(
            data={
                "sample1": [1, 1, 2, 1],
                "sample2": [10, 11, 14, 15],
                "sample3": [1, 1, 1, 4],
                "sample4": [1, 3, 1, 1],
            },
            index=["A", "B", "C", "D"],
        )

[("A", 1234), ("B", 1223), ("A", 5211), ("B", 1221)],
names=["pg", "mod_seq_charge_hash"],
),
frag_level_intensity_data = np.array(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was simplified to use NumPy arrays instead of pandas DataFrames, and the normalization function signature was updated to take a no_const parameter instead of a Series of values to shift.

        frag_level_intensity_data = np.array(
            [
                [1, 2, 2, 1],
                [1, 3, 1, 3],
                [2, 14, 21, 1],
                [3, 2, 4, 2],
            ]
        )

        expected_output = np.array(
            [
                [2, 2, 2, 1],
                [2, 3, 1, 3],
                [3, 14, 21, 1],
                [4, 2, 4, 2],
            ]
        )

np.testing.assert_array_equal(result, expected_output)

def test_init_align_with_list_of_arrays(self):
# create input data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method call was updated to match the new signature which takes a no_const parameter instead of values_to_shift.

        result = self.preprocessing.normalize_by_sample(
            frag_level_intensity_data, no_const=3
        )

np.testing.assert_array_equal(result, expected_output)

def test_init_align_with_list_of_arrays(self):
# create input data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the assertion to use NumPy's array_equal instead of pandas' frame_equal to match the updated data types.

        np.testing.assert_array_equal(result, expected_output)

@github-actions
Copy link

Several test methods were completely removed across multiple files. These include tests for weighted variance calculation, specific loss functions (MSE/MAE), coefficient of variation, and other methods. This suggests that either these functions were removed from the implementation or their interfaces have changed significantly. It would be good to either update these tests to match the current implementation or document why they were removed.

@github-actions
Copy link

Number of tokens: input_tokens=25769 output_tokens=2276 max_tokens=4096
review_instructions=''
config={}
thinking: ```
[]

…ring

- Implement tests for various scenarios including perfect correlation, edge cases, and shape discrepancies.
- Ensure comprehensive coverage of the function's behavior with NaN values and different reference shapes.
- Simplify input array structure in the test case for better readability.
- Correct expected output to match the intended result of the mean distance calculation.
Copy link
Member

@ammarcsj ammarcsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

self.assertTrue(np.allclose(result_axis_0, self.h))
self.assertTrue(np.allclose(result_axis_1, self.i))

def test_nan_correlation_w_ref(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by nan correlation? Maybe short docstring

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jap, will do, nan_correlation just means pearson correlation calculation while masking union of missing values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, maybe then pearson_correlation_nanmasked or something like this would be more expressive?

@vuductung vuductung merged commit aa172a7 into main Mar 20, 2025
0 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants