Correctly sort the columns after transforming/reverse_transforming in the `HyperTransformer` #410

fealho · 2022-02-24T03:41:53Z

Resolve #405.

codecov-commenter · 2022-02-25T16:05:08Z

Codecov Report

Merging #410 (6551c87) into master (1c483c4) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #410   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           12        12           
  Lines          956      1016   +60     
=========================================
+ Hits           956      1016   +60

Impacted Files	Coverage Δ
rdt/hyper_transformer.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1c483c4...6551c87.

amontanez24 · 2022-02-25T20:42:57Z

rdt/hyper_transformer.py

+                if output_column.startswith(input_column):
+                    if match_len < len(input_column):
+                        best_i = i
+                        match_len = len(input_column)


We already have a method, `get_final_output_columns`, that returns the output columns derived from a given input column. Instead of comparing name prefixes, we can use it to check whether an output column belongs to an input column.

That method doesn't handle combined columns (e.g. `day#month#year.value`). None of our transformers need that for now, so I'm not sure whether we care about it.

I don't think we need to worry about that for now.
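To make the two matching strategies in this thread concrete, here is a minimal standalone sketch. It is not RDT code: `best_prefix_match`, `match_by_lookup`, and the `final_outputs_by_input` dictionary are hypothetical names, and the dictionary only stands in for whatever `get_final_output_columns` would return for each input column.

```python
def best_prefix_match(output_column, input_columns):
    """Return the longest input column name that prefixes output_column, or None."""
    best, match_len = None, 0
    for input_column in input_columns:
        if output_column.startswith(input_column) and match_len < len(input_column):
            best, match_len = input_column, len(input_column)
    return best


def match_by_lookup(output_column, final_outputs_by_input):
    """Return the input column whose recorded outputs contain output_column, or None."""
    for input_column, outputs in final_outputs_by_input.items():
        if output_column in outputs:
            return input_column
    return None


# Hypothetical columns: 'age' is itself a prefix of 'age_group'.
input_columns = ['age', 'age_group']
final_outputs_by_input = {
    'age': ['age.value', 'age.is_null'],
    'age_group': ['age_group.value'],
}

# Both strategies attribute 'age_group.value' to 'age_group'.
print(best_prefix_match('age_group.value', input_columns))
print(match_by_lookup('age_group.value', final_outputs_by_input))
```

The prefix version needs the longest-match bookkeeping because `'age_group.value'` also starts with `'age'`. The lookup version does not depend on naming at all, which is presumably the appeal of reusing the existing method.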

amontanez24

Left one comment

fealho · 2022-02-28T19:07:46Z

rdt/hyper_transformer.py

@@ -288,7 +288,7 @@ def get_final_output_columns(self, field):
            else:
                final_outputs.append(output)

-        return final_outputs
+        return sorted(final_outputs, reverse=True)


This method returns values like `[column.is_null, column.value]`, but we expect the columns the other way around (i.e. `[column.value, column.is_null]`), so I'm sorting them in reverse order.
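As a quick illustration of why the reverse sort gives the expected order for this pair of names (plain Python, using only the example names from the comment above):

```python
final_outputs = ['column.is_null', 'column.value']

# Strings compare lexicographically, so 'column.value' sorts after
# 'column.is_null'; reverse=True therefore puts the value column first.
ordered = sorted(final_outputs, reverse=True)
print(ordered)  # ['column.value', 'column.is_null']
```

Note that this relies on `value` sorting after `is_null` alphabetically. An output suffix that sorts differently would not necessarily land in the intended position.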

fealho · 2022-02-28T19:09:30Z

rdt/hyper_transformer.py

@@ -370,10 +370,6 @@ def _fit_field_transformer(self, data, field, transformer):
                if self._field_in_data(output_field, data):
                    self._fit_field_transformer(data, output_field, next_transformer)

-            else:
-                if output_name not in self._output_columns:
-                    self._output_columns.append(output_name)


Deleting this because `_output_columns` needs to be reset when the columns are sorted. It also isn't used before `transform`/`reverse_transform`, so setting it later in the fit doesn't affect anything.
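One way to picture the change is to rebuild the output column list in a single pass once fitting is done, instead of appending to it during the recursive fit. The sketch below is an assumption about the general approach, not the code in this PR: `rebuild_output_columns` and the `final_outputs` mapping are hypothetical, and the mapping stands in for `get_final_output_columns`.

```python
def rebuild_output_columns(input_columns, get_final_outputs):
    """Build the output column list from scratch, following input column order."""
    output_columns = []
    for field in input_columns:
        for output in get_final_outputs(field):
            if output not in output_columns:
                output_columns.append(output)
    return output_columns


# Hypothetical fitted state: 'b' was fitted before 'a', but the data has 'a' first.
final_outputs = {'b': ['b.value', 'b.is_null'], 'a': ['a.value']}

# Columns with no recorded outputs pass through under their own name.
columns = rebuild_output_columns(
    ['a', 'b', 'c'],
    lambda field: final_outputs.get(field, [field]),
)
print(columns)  # ['a.value', 'b.value', 'b.is_null', 'c']
```

Because the list is derived from the input column order rather than from the order in which transformers were fitted, the result does not depend on fitting order.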

fealho · 2022-02-28T19:10:09Z

tests/integration/test_hyper_transformer.py

@@ -47,44 +47,6 @@ def _reverse_transform(self, data):
        return data.astype('datetime64')


-class DummyTransformerMultiColumn(BaseTransformer):


Deleting these tests because we are dropping support for multi-column transformers.

fealho · 2022-02-28T19:11:44Z

tests/unit/test_hyper_transformer.py

@@ -14,6 +14,52 @@

 class TestHyperTransformer(TestCase):

+    def test__add_field_to_set_string(self):


These tests are needed to get back to 100% coverage. I guess only the multi-column tests exercised the tuple case of this method, so deleting them decreased the coverage.
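For readers unfamiliar with the string and tuple cases mentioned here, the following is a hedged sketch of what such a helper might look like. `add_field_to_set` is a hypothetical standalone function written for illustration; it is not copied from `rdt/hyper_transformer.py`.

```python
def add_field_to_set(field, field_set):
    """Add one column name, or every name in a tuple of column names, to field_set."""
    if isinstance(field, tuple):
        # Tuple case: each member is a separate column name.
        field_set.update(field)
    else:
        # String case: add the name whole; set.update would split it into characters.
        field_set.add(field)


fields = set()
add_field_to_set('a', fields)
add_field_to_set(('b', 'c'), fields)
print(sorted(fields))  # ['a', 'b', 'c']
```

Each branch needs its own test to count as covered, which is consistent with coverage dropping once the multi-column tests that passed tuples were removed.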

fealho · 2022-02-28T19:12:22Z

tests/unit/test_hyper_transformer.py

@@ -363,7 +408,6 @@ def test__fit_field_transformer(self, get_transformer_instance_mock):
            'a.out1': ['2', '4', '6'],
            'a.out2': [1, 2, 3]
        })
-        assert ht._output_columns == ['a.out1.value', 'a.out2']


`_fit_field_transformer` doesn't set `_output_columns` anymore.

amontanez24

LGTM!

fealho added 7 commits February 23, 2022 19:39

Correct order

0c8d722

Update logic

cac0490

Revert changes

7c7aabe

Sort output columns

45f2aa6

Fix lint

9dd93c6

Add docstrings

b4310d0

Add test case

4f3bce4

fealho marked this pull request as ready for review February 25, 2022 16:07

fealho requested a review from a team as a code owner February 25, 2022 16:07

fealho requested review from katxiao, amontanez24 and a team and removed request for a team February 25, 2022 16:07

amontanez24 reviewed Feb 25, 2022

amontanez24 requested changes Feb 25, 2022

fealho added 4 commits February 28, 2022 09:41

Drop support for multiple column transformer

c773e02

Remove support for multi-column transformers

074d41f

Add add_field_to_set tests

3b315d6

Fix lint

6551c87

fealho commented Feb 28, 2022

fealho requested a review from amontanez24 February 28, 2022 19:13

amontanez24 approved these changes Feb 28, 2022

fealho requested review from pvk-developer and removed request for katxiao and pvk-developer February 28, 2022 21:07

fealho requested a review from katxiao March 1, 2022 02:59

katxiao approved these changes Mar 1, 2022

fealho merged commit 6c4d9d6 into master Mar 1, 2022

fealho deleted the issue-405-transform-order branch March 1, 2022 17:27

amontanez24 added and removed the feature request label Mar 4, 2022
