
Correctly sort the columns after transforming/reverse_transforming in the HyperTransformer #410

Merged
fealho merged 11 commits into master from issue-405-transform-order
Mar 1, 2022

Conversation

@fealho (Member) commented Feb 24, 2022

Resolve #405.

@codecov-commenter commented Feb 25, 2022

Codecov Report

Merging #410 (6551c87) into master (1c483c4) will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##            master      #410   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           12        12           
  Lines          956      1016   +60     
=========================================
+ Hits           956      1016   +60     
Impacted Files            | Coverage Δ
rdt/hyper_transformer.py  | 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1c483c4...6551c87.

@fealho fealho marked this pull request as ready for review February 25, 2022 16:07
@fealho fealho requested a review from a team as a code owner February 25, 2022 16:07
@fealho fealho requested review from katxiao, amontanez24 and a team and removed request for a team February 25, 2022 16:07
Comment on lines 394 to 397
if output_column.startswith(input_column):
    if match_len < len(input_column):
        best_i = i
        match_len = len(input_column)
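The quoted lines assign each output column to the input column whose name is the longest prefix of it. Below is a minimal, standalone sketch of that longest-prefix idea used to order outputs by input order. The function name sort_output_columns, the plain-list arguments, and appending unmatched outputs at the end are assumptions for illustration, not RDT's actual implementation.

```python
def sort_output_columns(input_columns, output_columns):
    """Return output_columns reordered to follow input_columns.

    Hypothetical helper: each output is assigned to the input whose name is
    the longest prefix of the output's name, mirroring the startswith/match_len
    check in the reviewed code. Outputs keep their relative order within each
    input; outputs matching no input are appended at the end (an assumption
    made for this sketch).
    """
    buckets = {input_column: [] for input_column in input_columns}
    unmatched = []
    for output_column in output_columns:
        best_input = None
        match_len = 0
        for input_column in input_columns:
            # Keep the longest matching prefix, so 'ab.value' goes to 'ab', not 'a'.
            if output_column.startswith(input_column) and match_len < len(input_column):
                best_input = input_column
                match_len = len(input_column)
        if best_input is None:
            unmatched.append(output_column)
        else:
            buckets[best_input].append(output_column)

    # Dicts preserve insertion order, so this walks the inputs in their original order.
    ordered = [out for input_column in buckets for out in buckets[input_column]]
    return ordered + unmatched


print(sort_output_columns(['b', 'a', 'ab'], ['a.value', 'ab.value', 'b.value', 'b.is_null']))
# ['b.value', 'b.is_null', 'a.value', 'ab.value']
```

Because matching is purely by name, an input named a would also claim abc.value if no input named abc existed, and a combined column such as day#month#year.value is placed under day, the input its name starts with.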
Contributor

We have a method called get_final_output_columns that returns the output columns derived from a specified input column. Instead of checking whether the names match, we can use that method to decide whether an output column belongs to an input column.
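A minimal sketch of the suggested alternative, assuming the final outputs of each input column are already available as a dictionary (a hypothetical stand-in for calling get_final_output_columns on each field). The function name sort_by_derived_outputs is also illustrative. Ordering then needs no string comparison, so an output can only be placed under an input that actually produced it.

```python
def sort_by_derived_outputs(input_columns, derived_outputs):
    """Return output columns ordered by input column, using an explicit mapping.

    derived_outputs maps each input column to the output columns derived from
    it, standing in (hypothetically) for get_final_output_columns(field).
    """
    ordered = []
    for input_column in input_columns:
        for output_column in derived_outputs.get(input_column, []):
            # Skip duplicates in case two inputs list the same output.
            if output_column not in ordered:
                ordered.append(output_column)
    return ordered


derived = {'a': ['a.value'], 'abc': ['abc.value', 'abc.is_null']}
print(sort_by_derived_outputs(['abc', 'a'], derived))
# ['abc.value', 'abc.is_null', 'a.value']
```

One design consequence of this sketch: an output derived from several inputs must be listed under one of them in the mapping, or it is dropped from the result.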

Member Author

That method doesn't handle combined columns (e.g., day#month#year.value). We don't have any transformers that need it right now, so I'm not sure whether we care about it.

Contributor

I don't think we need to worry about that for now

@amontanez24 (Contributor) left a comment

Left one comment

@@ -288,7 +288,7 @@ def get_final_output_columns(self, field):
else:
final_outputs.append(output)

-return final_outputs
+return sorted(final_outputs, reverse=True)
Member Author

This method returns values like [column.is_null, column.value], but since we expect the columns in the opposite order (i.e., [column.value, column.is_null]), I'm reversing the order.
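The reversal relies on plain lexicographic ordering: 'column.value' sorts after 'column.is_null' because 'v' > 'i', so sorted(..., reverse=True) puts it first. A small standalone check follows; the helper name order_final_outputs is hypothetical, and the second call shows that with other suffixes the result is simply reverse-alphabetical, which may or may not be the intended order.

```python
def order_final_outputs(final_outputs):
    """Mirror the changed return statement: reverse lexicographic order."""
    return sorted(final_outputs, reverse=True)


# 'column.value' > 'column.is_null' as strings, so it comes first after reversing.
print(order_final_outputs(['column.is_null', 'column.value']))
# ['column.value', 'column.is_null']

# With other suffixes the order is reverse-alphabetical, e.g. out2 before out1.
print(order_final_outputs(['a.out1', 'a.out2']))
# ['a.out2', 'a.out1']
```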

@@ -370,10 +370,6 @@ def _fit_field_transformer(self, data, field, transformer):
if self._field_in_data(output_field, data):
self._fit_field_transformer(data, output_field, next_transformer)

else:
if output_name not in self._output_columns:
self._output_columns.append(output_name)
Member Author

Deleting this because we need to reset _output_columns when sorting them. It also isn't used before transform/reverse_transform, so populating it later in fit doesn't affect anything.
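Resetting the list at a single point after fitting, rather than appending to it during the recursive _fit_field_transformer calls, means a repeated fit cannot accumulate names from an earlier fit. A minimal sketch of that bookkeeping, assuming a hypothetical OutputColumnTracker class and a final_outputs_per_field argument, neither of which is part of RDT:

```python
class OutputColumnTracker:
    """Hypothetical stand-in for the HyperTransformer's _output_columns bookkeeping."""

    def __init__(self):
        self._output_columns = []

    def fit(self, final_outputs_per_field):
        # Reset first so nothing from a previous fit survives.
        self._output_columns = []
        for outputs in final_outputs_per_field:
            for name in outputs:
                # Same duplicate guard as the deleted lines.
                if name not in self._output_columns:
                    self._output_columns.append(name)
        return self


tracker = OutputColumnTracker()
tracker.fit([['a.value'], ['b.value', 'b.is_null']])
tracker.fit([['c.value']])
print(tracker._output_columns)  # ['c.value']
```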

@@ -47,44 +47,6 @@ def _reverse_transform(self, data):
return data.astype('datetime64')


class DummyTransformerMultiColumn(BaseTransformer):
Member Author

Deleting these tests because we are dropping support for multi-column transformers.

@@ -14,6 +14,52 @@

class TestHyperTransformer(TestCase):

def test__add_field_to_set_string(self):
Member Author

I needed to add these tests to keep 100% coverage. I guess only the multi-column tests exercised the tuple case of this method, so deleting those tests decreased the coverage.

@@ -363,7 +408,6 @@ def test__fit_field_transformer(self, get_transformer_instance_mock):
'a.out1': ['2', '4', '6'],
'a.out2': [1, 2, 3]
})
assert ht._output_columns == ['a.out1.value', 'a.out2']
Member Author

_fit_field_transformer doesn't set _output_columns anymore.

@amontanez24 (Contributor) left a comment

LGTM!

@fealho fealho requested review from pvk-developer and removed request for katxiao and pvk-developer February 28, 2022 21:07
@fealho fealho requested a review from katxiao March 1, 2022 02:59
@fealho fealho merged commit 6c4d9d6 into master Mar 1, 2022
@fealho fealho deleted the issue-405-transform-order branch March 1, 2022 17:27
@amontanez24 amontanez24 added feature request Request for a new feature and removed feature request Request for a new feature labels Mar 4, 2022
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HyperTransformer should sort columns after transform and reverse_transform
4 participants