-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve user warnings and logic for update_transformers and update_transformers_by_sdtype #695
Improve user warnings and logic for update_transformers and update_transformers_by_sdtype #695
Conversation
d9bdfbf
to
2147a08
Compare
rdt/hyper_transformer.py
Outdated
@@ -335,6 +375,8 @@ def update_transformers_by_sdtype( | |||
self._warn_update_transformers_by_sdtype(transformer, transformer_name) | |||
|
|||
transformer_instance = transformer | |||
self._multi_column_fields = self._create_multi_column_fields() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should just remove this from the init and move it into the detect method. Then we don't need to keep calling it
2147a08
to
e8d956c
Compare
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## multi-column-transformer #695 +/- ##
============================================================
Coverage ? 100.00%
============================================================
Files ? 17
Lines ? 1884
Branches ? 0
============================================================
Hits ? 1884
Misses ? 0
Partials ? 0 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, just had a couple of comments
rdt/hyper_transformer.py
Outdated
@@ -205,7 +211,19 @@ def _validate_config(config): | |||
|
|||
sdtypes = config['sdtypes'] | |||
transformers = config['transformers'] | |||
if set(sdtypes.keys()) != set(transformers.keys()): | |||
|
|||
sdtype_keys = sdtypes.keys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't have a dict with repeated keys, so I don't think the is_sdtype_keys_unique
check if necessary.
|
||
def test_update_transformers_single_to_multi_column(self): | ||
"""Test ``update_transformers`` to go from single to mutli column transformer.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra line
tests/unit/test_hyper_transformer.py
Outdated
'B': None, | ||
'C': FloatFormatter(), | ||
} | ||
ht.field_transformers == expected_field_transformers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert
tests/unit/test_hyper_transformer.py
Outdated
('A', 'B'): None, | ||
'C': None, | ||
} | ||
ht.field_transformers == expected_field_transformers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert
ce0e2a2
to
054d9eb
Compare
d8ba6e1
to
daf2cbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good! I do think we should move the logic to make the HyperTransformer
work with a multi-column transformer to the other PR though. It makes more sense with that issue
rdt/hyper_transformer.py
Outdated
def _remove_column_in_multi_column_fields(self, column): | ||
"""Remove a column that is part of a multi-column field. | ||
|
||
Update the tuple to remove the column from it and modify the ``multi_column_fields`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: you can just say
Remove the column from the tuple and ...
new_tuple = tuple(item for item in old_tuple if item != column) | ||
|
||
if len(new_tuple) == 1: | ||
new_tuple, = new_tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the point of this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to not have a single element tuple in the field_transformer
dict. Without this line I have a test failing with the error:
AssertionError: assert {('column2',): 'transformer'} == {'column2': 'transformer'}
The right is the expected dict
rdt/hyper_transformer.py
Outdated
if isinstance(field, tuple): | ||
columns_to_sdtypes = self._get_columns_to_sdtypes(field) | ||
transformer.fit(data, columns_to_sdtypes) | ||
else: | ||
transformer.fit(data, field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be moved to the base PR
daf2cbe
to
4ba050d
Compare
75e87c2
to
9494190
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…ansformers_by_sdtype (#695) * modify update_transformer * modify validate_config * modify config printing * add _generate_column_in_tuple + update transformer * add flatten_column_list * add _update_column_in_tuple method * update_transformers_by_sdtype * integration test * unit tests * rename * docstring * remove _create_multi_column_fields in the __init__ * fix rebasing * use a multi column transformer in integration test * assert + extra line * rebase + test end to end with hypertransformer * add _get_columns_to_sdtype * rebase * remove notebook * undo remove unit test * 'Remove the column from the tuple and' docstring
…ansformers_by_sdtype (#695) * modify update_transformer * modify validate_config * modify config printing * add _generate_column_in_tuple + update transformer * add flatten_column_list * add _update_column_in_tuple method * update_transformers_by_sdtype * integration test * unit tests * rename * docstring * remove _create_multi_column_fields in the __init__ * fix rebasing * use a multi column transformer in integration test * assert + extra line * rebase + test end to end with hypertransformer * add _get_columns_to_sdtype * rebase * remove notebook * undo remove unit test * 'Remove the column from the tuple and' docstring
…ansformers_by_sdtype (#695) * modify update_transformer * modify validate_config * modify config printing * add _generate_column_in_tuple + update transformer * add flatten_column_list * add _update_column_in_tuple method * update_transformers_by_sdtype * integration test * unit tests * rename * docstring * remove _create_multi_column_fields in the __init__ * fix rebasing * use a multi column transformer in integration test * assert + extra line * rebase + test end to end with hypertransformer * add _get_columns_to_sdtype * rebase * remove notebook * undo remove unit test * 'Remove the column from the tuple and' docstring
…ansformers_by_sdtype (#695) * modify update_transformer * modify validate_config * modify config printing * add _generate_column_in_tuple + update transformer * add flatten_column_list * add _update_column_in_tuple method * update_transformers_by_sdtype * integration test * unit tests * rename * docstring * remove _create_multi_column_fields in the __init__ * fix rebasing * use a multi column transformer in integration test * assert + extra line * rebase + test end to end with hypertransformer * add _get_columns_to_sdtype * rebase * remove notebook * undo remove unit test * 'Remove the column from the tuple and' docstring
…ansformers_by_sdtype (#695) * modify update_transformer * modify validate_config * modify config printing * add _generate_column_in_tuple + update transformer * add flatten_column_list * add _update_column_in_tuple method * update_transformers_by_sdtype * integration test * unit tests * rename * docstring * remove _create_multi_column_fields in the __init__ * fix rebasing * use a multi column transformer in integration test * assert + extra line * rebase + test end to end with hypertransformer * add _get_columns_to_sdtype * rebase * remove notebook * undo remove unit test * 'Remove the column from the tuple and' docstring
Resolve #685
I’m not using the
BaseMultiColumnTransformer
because it’s crashing on some validation. It’s mainly because this transformer does not have asupported_sdtype
. So to check what I’m doing, I’m using the option to assign toNone
the transformers. Let me know what we should do about thisAlso, about the
update_transformers_by_sdtype
I made the change according to what is written in the issue; there is one case I’m not sure about. For me, when we update the sdtype we should not touch all the columns that have this sdtype but that are used in aMulti Column transformer
. In this PR I do it, the columns are removed to the tuple and are being assigned to the new transformer.