Correctly sort the columns after transforming/reverse_transforming in the HyperTransformer
#410
Codecov Report
@@            Coverage Diff            @@
##            master     #410    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           12        12
  Lines          956      1016    +60
=========================================
+ Hits           956      1016    +60
rdt/hyper_transformer.py
Outdated
    if output_column.startswith(input_column):
        if match_len < len(input_column):
            best_i = i
            match_len = len(input_column)
We have a method called get_final_output_columns that returns the output columns derived from a specified input column. Instead of checking whether the names match, we can use it to determine whether an output column belongs to an input column.
That method doesn't handle combined columns (e.g. day#month#year.value). We don't really have any transformers that need it for now, so I'm not sure if we care about it?
I don't think we need to worry about that for now
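To illustrate the suggestion above, here is a minimal sketch of ordering output columns from an explicit input-to-output mapping instead of name-prefix matching. It is not the code in this PR: order_output_columns and outputs_by_input are hypothetical names, with the dict standing in for whatever get_final_output_columns would return for each input column. Combined columns are not handled, in line with the discussion above.

```python
def order_output_columns(input_columns, output_columns, outputs_by_input):
    """Order output columns so they follow the order of the input columns.

    outputs_by_input is a hypothetical dict standing in for the result of
    calling get_final_output_columns on each input column.
    """
    ordered = []
    for input_column in input_columns:
        for output_column in outputs_by_input.get(input_column, []):
            if output_column in output_columns and output_column not in ordered:
                ordered.append(output_column)

    # Columns that no input column accounts for are kept at the end.
    ordered.extend(c for c in output_columns if c not in ordered)
    return ordered


input_columns = ['a', 'ab']
outputs_by_input = {'a': ['a.value', 'a.is_null'], 'ab': ['ab.value']}
output_columns = ['ab.value', 'a.is_null', 'a.value']

result = order_output_columns(input_columns, output_columns, outputs_by_input)
print(result)
```

Note that 'ab.value'.startswith('a') is True, so a plain prefix check needs the longest-match rule from the snippet under review to attribute 'ab.value' to 'ab'; a mapping sidesteps that ambiguity.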
Left one comment
@@ -288,7 +288,7 @@ def get_final_output_columns(self, field):
             else:
                 final_outputs.append(output)

-        return final_outputs
+        return sorted(final_outputs, reverse=True)
This method returns values like [column.is_null, column.value], but since we expect the column order to be the other way around (i.e. [column.value, column.is_null]), I'm reversing the order.
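A quick standalone check of what the reverse sort does to the example above, using only Python's built-in sorted (the list literal is illustrative, not taken from the test suite):

```python
final_outputs = ['column.is_null', 'column.value']

# Reverse lexicographic order puts '.value' ahead of '.is_null'
# because 'v' sorts after 'i'.
reordered = sorted(final_outputs, reverse=True)
print(reordered)  # ['column.value', 'column.is_null']
```

This works here because "value" happens to sort after "is_null" alphabetically; output suffixes with a different alphabetical relationship would presumably need an explicit ordering.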
@@ -370,10 +370,6 @@ def _fit_field_transformer(self, data, field, transformer):
                 if self._field_in_data(output_field, data):
                     self._fit_field_transformer(data, output_field, next_transformer)

-            else:
-                if output_name not in self._output_columns:
-                    self._output_columns.append(output_name)
Deleting this because we need to reset the _output_columns when sorting them (this also doesn't get used before the transform/reverse_transform, so moving it later in the fit doesn't affect anything).
@@ -47,44 +47,6 @@ def _reverse_transform(self, data):
         return data.astype('datetime64')


-class DummyTransformerMultiColumn(BaseTransformer):
Deleting these tests because we are dropping support for multi-column transformers.
@@ -14,6 +14,52 @@


 class TestHyperTransformer(TestCase):

+    def test__add_field_to_set_string(self):
Need to add these tests to get 100% coverage (I guess only the multi-column tests used the tuple case of this method, so deleting the tests decreased the coverage.)
@@ -363,7 +408,6 @@ def test__fit_field_transformer(self, get_transformer_instance_mock):
             'a.out1': ['2', '4', '6'],
             'a.out2': [1, 2, 3]
         })
-        assert ht._output_columns == ['a.out1.value', 'a.out2']
_fit_field_transformer doesn't set _output_columns anymore.
LGTM!
Resolve #405.