Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve user warnings and logic for update_transformers and update_transformers_by_sdtype #695

Merged
merged 21 commits into from
Sep 13, 2023

Conversation

R-Palazzo
Copy link
Contributor

Resolve #685
I’m not using the BaseMultiColumnTransformer because it’s crashing on some validation. It’s mainly because this transformer does not have a supported_sdtype. So to check what I’m doing, I’m using the option to assign to None the transformers. Let me know what we should do about this

Also, about the update_transformers_by_sdtype I made the change according to what is written in the issue; there is one case I’m not sure about. For me, when we update the sdtype we should not touch all the columns that have this sdtype but that are used in a Multi Column transformer. In this PR I do it, the columns are removed to the tuple and are being assigned to the new transformer.

@R-Palazzo R-Palazzo requested a review from a team as a code owner August 24, 2023 14:50
@R-Palazzo R-Palazzo removed the request for review from a team August 24, 2023 14:50
@R-Palazzo R-Palazzo force-pushed the issue-685-update-transformers branch from d9bdfbf to 2147a08 Compare August 24, 2023 14:52
tests/integration/test_hyper_transformer.py Outdated Show resolved Hide resolved
rdt/hyper_transformer.py Outdated Show resolved Hide resolved
rdt/hyper_transformer.py Outdated Show resolved Hide resolved
rdt/hyper_transformer.py Outdated Show resolved Hide resolved
@@ -335,6 +375,8 @@ def update_transformers_by_sdtype(
self._warn_update_transformers_by_sdtype(transformer, transformer_name)

transformer_instance = transformer
self._multi_column_fields = self._create_multi_column_fields()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should just remove this from the init and move it into the detect method. Then we don't need to keep calling it

@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2023

Codecov Report

❗ No coverage uploaded for pull request base (multi-column-transformer@89d871d). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                     Coverage Diff                      @@
##             multi-column-transformer      #695   +/-   ##
============================================================
  Coverage                            ?   100.00%           
============================================================
  Files                               ?        17           
  Lines                               ?      1884           
  Branches                            ?         0           
============================================================
  Hits                                ?      1884           
  Misses                              ?         0           
  Partials                            ?         0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@fealho fealho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, just had a couple of comments

rdt/hyper_transformer.py Show resolved Hide resolved
@@ -205,7 +211,19 @@ def _validate_config(config):

sdtypes = config['sdtypes']
transformers = config['transformers']
if set(sdtypes.keys()) != set(transformers.keys()):

sdtype_keys = sdtypes.keys()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't have a dict with repeated keys, so I don't think the is_sdtype_keys_unique check if necessary.


def test_update_transformers_single_to_multi_column(self):
"""Test ``update_transformers`` to go from single to mutli column transformer."""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra line

'B': None,
'C': FloatFormatter(),
}
ht.field_transformers == expected_field_transformers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert

tests/unit/test_hyper_transformer.py Show resolved Hide resolved
('A', 'B'): None,
'C': None,
}
ht.field_transformers == expected_field_transformers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert

@R-Palazzo R-Palazzo force-pushed the issue-683-support-multi-column-transformers branch from ce0e2a2 to 054d9eb Compare September 7, 2023 07:42
@R-Palazzo R-Palazzo force-pushed the issue-685-update-transformers branch 3 times, most recently from d8ba6e1 to daf2cbe Compare September 8, 2023 08:00
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good! I do think we should move the logic to make the HyperTransformer work with a multi-column transformer to the other PR though. It makes more sense with that issue

rdt/hyper_transformer.py Show resolved Hide resolved
def _remove_column_in_multi_column_fields(self, column):
"""Remove a column that is part of a multi-column field.

Update the tuple to remove the column from it and modify the ``multi_column_fields``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: you can just say

Remove the column from the tuple and ...

new_tuple = tuple(item for item in old_tuple if item != column)

if len(new_tuple) == 1:
new_tuple, = new_tuple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the point of this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to not have a single element tuple in the field_transformer dict. Without this line I have a test failing with the error:
AssertionError: assert {('column2',): 'transformer'} == {'column2': 'transformer'}
The right is the expected dict

Comment on lines 612 to 617
if isinstance(field, tuple):
columns_to_sdtypes = self._get_columns_to_sdtypes(field)
transformer.fit(data, columns_to_sdtypes)
else:
transformer.fit(data, field)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be moved to the base PR

Base automatically changed from issue-683-support-multi-column-transformers to multi-column-transformer September 12, 2023 15:55
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@R-Palazzo R-Palazzo merged commit 0cb92e8 into multi-column-transformer Sep 13, 2023
46 checks passed
@R-Palazzo R-Palazzo deleted the issue-685-update-transformers branch September 13, 2023 16:02
R-Palazzo added a commit that referenced this pull request Sep 14, 2023
…ansformers_by_sdtype (#695)

* modify update_transformer

* modify validate_config

* modify config printing

* add _generate_column_in_tuple + update transformer

* add flatten_column_list

* add _update_column_in_tuple method

* update_transformers_by_sdtype

* integration test

* unit tests

* rename

* docstring

* remove _create_multi_column_fields in the __init__

* fix rebasing

* use a multi column transformer in integration test

* assert + extra line

* rebase + test end to end with hypertransformer

* add _get_columns_to_sdtype

* rebase

* remove notebook

* undo remove unit test

* 'Remove the column from the tuple and' docstring
R-Palazzo added a commit that referenced this pull request Sep 20, 2023
…ansformers_by_sdtype (#695)

* modify update_transformer

* modify validate_config

* modify config printing

* add _generate_column_in_tuple + update transformer

* add flatten_column_list

* add _update_column_in_tuple method

* update_transformers_by_sdtype

* integration test

* unit tests

* rename

* docstring

* remove _create_multi_column_fields in the __init__

* fix rebasing

* use a multi column transformer in integration test

* assert + extra line

* rebase + test end to end with hypertransformer

* add _get_columns_to_sdtype

* rebase

* remove notebook

* undo remove unit test

* 'Remove the column from the tuple and' docstring
R-Palazzo added a commit that referenced this pull request Oct 27, 2023
…ansformers_by_sdtype (#695)

* modify update_transformer

* modify validate_config

* modify config printing

* add _generate_column_in_tuple + update transformer

* add flatten_column_list

* add _update_column_in_tuple method

* update_transformers_by_sdtype

* integration test

* unit tests

* rename

* docstring

* remove _create_multi_column_fields in the __init__

* fix rebasing

* use a multi column transformer in integration test

* assert + extra line

* rebase + test end to end with hypertransformer

* add _get_columns_to_sdtype

* rebase

* remove notebook

* undo remove unit test

* 'Remove the column from the tuple and' docstring
R-Palazzo added a commit that referenced this pull request Oct 31, 2023
…ansformers_by_sdtype (#695)

* modify update_transformer

* modify validate_config

* modify config printing

* add _generate_column_in_tuple + update transformer

* add flatten_column_list

* add _update_column_in_tuple method

* update_transformers_by_sdtype

* integration test

* unit tests

* rename

* docstring

* remove _create_multi_column_fields in the __init__

* fix rebasing

* use a multi column transformer in integration test

* assert + extra line

* rebase + test end to end with hypertransformer

* add _get_columns_to_sdtype

* rebase

* remove notebook

* undo remove unit test

* 'Remove the column from the tuple and' docstring
R-Palazzo added a commit that referenced this pull request Oct 31, 2023
…ansformers_by_sdtype (#695)

* modify update_transformer

* modify validate_config

* modify config printing

* add _generate_column_in_tuple + update transformer

* add flatten_column_list

* add _update_column_in_tuple method

* update_transformers_by_sdtype

* integration test

* unit tests

* rename

* docstring

* remove _create_multi_column_fields in the __init__

* fix rebasing

* use a multi column transformer in integration test

* assert + extra line

* rebase + test end to end with hypertransformer

* add _get_columns_to_sdtype

* rebase

* remove notebook

* undo remove unit test

* 'Remove the column from the tuple and' docstring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants