Multi column transformers crash when assigned to single column #734

Closed
amontanez24 opened this issue Oct 27, 2023 · 0 comments · Fixed by #710
Labels
bug Something isn't working

@amontanez24
Contributor

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • RDT version: main branch
  • Python version: Any
  • Operating System: Any

Error Description

Both the RandomLocationGenerator and the RegionalAnonymizer crash when assigned to a single column, failing with the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-a7c40dec790d> in <cell line: 29>()
     27 })
     28 
---> 29 ht.fit(data)

4 frames
/usr/local/lib/python3.10/dist-packages/rdt/hyper_transformer.py in fit(self, data)
    707                 field = column
    708 
--> 709             data = self._fit_field_transformer(data, field, self.field_transformers[field])
    710 
    711         self._validate_all_fields_fitted()

/usr/local/lib/python3.10/dist-packages/rdt/hyper_transformer.py in _fit_field_transformer(self, data, field, transformer)
    635                 transformer.fit(data, columns_to_sdtypes)
    636             else:
--> 637                 transformer.fit(data, field)
    638 
    639             self._transformers_sequence.append(transformer)

/usr/local/lib/python3.10/dist-packages/rdt/transformers/base.py in wrapper(self, *args, **kwargs)
     53         method_name = function.__name__
     54         with set_random_states(self.random_states, method_name, self.set_random_state):
---> 55             return function(self, *args, **kwargs)
     56 
     57     return wrapper

/usr/local/lib/python3.10/dist-packages/rdt/transformers/base.py in fit(self, data, columns_to_sdtypes)
    567                 Dictionary mapping each column to its sdtype.
    568         """
--> 569         self._validate_columns_to_sdtypes(data, columns_to_sdtypes)
    570         self.columns_to_sdtypes = columns_to_sdtypes
    571         self._store_columns(list(self.columns_to_sdtypes.keys()), data)

/usr/local/lib/python3.10/dist-packages/rdt/transformers/base.py in _validate_columns_to_sdtypes(self, data, columns_to_sdtypes)
    543     def _validate_columns_to_sdtypes(self, data, columns_to_sdtypes):
    544         """Check that all the columns in ``columns_to_sdtypes`` are present in the data."""
--> 545         missing = set(columns_to_sdtypes.keys()) - set(data.columns)
    546         if missing:
    547             missing_to_print = ', '.join(missing)

AttributeError: 'str' object has no attribute 'keys'
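The last two frames point at the root cause: in the branch at line 637, HyperTransformer._fit_field_transformer calls transformer.fit(data, field) with field being the plain column name string, and the multi-column fit then hands that string to _validate_columns_to_sdtypes as columns_to_sdtypes, where .keys() fails. The self-contained sketch below reproduces that failure mode without RDT and shows one possible guard, assuming the fix coerces a lone column name into a one-entry {column: sdtype} mapping before validation. The helper names validate_columns_to_sdtypes and to_columns_to_sdtypes are hypothetical stand-ins (the first mirrors the check in the last frame), and this is not necessarily how #710 resolved the bug.

```python
def validate_columns_to_sdtypes(data_columns, columns_to_sdtypes):
    """Raise if any key of ``columns_to_sdtypes`` is not among ``data_columns``."""
    # Same set difference as the failing frame; calling .keys() on a str
    # is what raises the AttributeError shown in the traceback.
    missing = set(columns_to_sdtypes.keys()) - set(data_columns)
    if missing:
        raise ValueError(f"Columns not present in the data: {', '.join(sorted(missing))}")


def to_columns_to_sdtypes(field, sdtypes):
    """Hypothetical helper: coerce a field into a {column: sdtype} mapping."""
    if isinstance(field, str):
        # A single column assigned to a multi-column transformer
        columns = (field,)
    else:
        # A tuple of column names, as used for multi-column assignments
        columns = tuple(field)
    return {column: sdtypes[column] for column in columns}


if __name__ == '__main__':
    data_columns = ['country of departure', 'city of departure']
    sdtypes = {'country of departure': 'country_code', 'city of departure': 'city'}

    # Passing the bare column name reproduces the reported error
    try:
        validate_columns_to_sdtypes(data_columns, 'country of departure')
    except AttributeError as error:
        print(error)

    # Wrapping it in a one-entry mapping passes validation
    mapping = to_columns_to_sdtypes('country of departure', sdtypes)
    validate_columns_to_sdtypes(data_columns, mapping)
    print(mapping)
```

Coercing at the call site keeps the multi-column fit signature unchanged; an alternative would be to accept a string inside fit itself, but either choice is an assumption here.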

Steps to reproduce

To reproduce, assign either of these transformers to a single column of this dataset:

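from rdt import HyperTransformer
from rdt.transformers import RandomLocationGenerator, RegionalAnonymizer

# `data` is a pandas DataFrame loaded from the dataset mentioned above (not attached to this issue)
ht = HyperTransformer()
ht.detect_initial_config(data)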

ht.update_sdtypes(column_name_to_sdtype={
    'country of departure': 'country_code',
    'region of departure': 'administrative_unit',
    'region code of departure': 'state_abbr',
    'city of departure': 'city',
    'postal code of departure': 'postcode',
    'street address of departure': 'street_address',
    'secondary address of departure': 'secondary_address',
    'country of arrival': 'country_code',
    'region of arrival': 'administrative_unit',
    'region code of arrival': 'state_abbr',
    'city of arrival': 'city',
    'postal code of arrival': 'postcode',
    'street address of arrival': 'street_address',
    'secondary address of arrival': 'secondary_address'
})

# Assign only a single column ('country of departure') to the RandomLocationGenerator
ht.update_transformers(column_name_to_transformer={
    ('country of departure'): RandomLocationGenerator(locales=['en_US', 'es_ES', 'en_GB'], missing_value_generation='random')
})

# Fitting raises AttributeError: 'str' object has no attribute 'keys'
ht.fit(data)
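Note that the key ('country of departure') in the repro is a plain string, not a tuple: in Python, parentheses without a trailing comma only group an expression. That is consistent with the traceback taking the single-column branch and passing the bare string to fit. Whether a one-element tuple key would sidestep the crash is not established by this issue; the snippet below only checks the Python behavior.

```python
# Parentheses alone only group an expression; they do not build a tuple
key = ('country of departure')
assert isinstance(key, str)
assert key == 'country of departure'

# A trailing comma is what makes a one-element tuple
tuple_key = ('country of departure',)
assert isinstance(tuple_key, tuple)
assert len(tuple_key) == 1
```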