Improve updating field_data_types in HyperTransformer #400

Closed
npatki opened this issue Feb 17, 2022 · 1 comment

npatki commented Feb 17, 2022

Expected behavior

  • Rename field_data_types to sdtypes in the update methods
  • In the update method, accept a dictionary through a parameter named column_name_to_sdtype
  • Always tell the user that the transformers may change based on the updated sdtypes

# updating the sdtype of a single column
>>> ht.update_sdtypes(column_name_to_sdtype={
    'colA': 'categorical'
})
Info: The transformers for these columns may change based on the new sdtypes.
Use 'get_config()' to verify the transformers.

# updating the sdtypes of multiple columns
>>> ht.update_sdtypes(column_name_to_sdtype={
    'colA': 'categorical',
    'colB': 'categorical',
    'colC': 'categorical',
})
Info: The transformers for these columns may change based on the new sdtypes.
Use 'get_config()' to verify the transformers.
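
The expected behavior above could be prototyped roughly as follows. This is a minimal sketch, not RDT's implementation: the `SketchHyperTransformer` class, its internal dictionaries, the return shape of `get_config`, and the placeholder names in `DEFAULT_TRANSFORMERS` are assumptions made for illustration. Only `update_sdtypes`, `column_name_to_sdtype`, `get_config` and the message text come from the proposal.

```python
# Hypothetical placeholder names; the real default transformers are not specified here.
DEFAULT_TRANSFORMERS = {
    'numerical': 'default_numerical_transformer',
    'categorical': 'default_categorical_transformer',
}

INFO_MESSAGE = (
    'Info: The transformers for these columns may change based on the new sdtypes.\n'
    "Use 'get_config()' to verify the transformers."
)


class SketchHyperTransformer:
    """Toy stand-in for HyperTransformer (assumption, not the real RDT class)."""

    def __init__(self):
        self.sdtypes = {}
        self.transformers = {}

    def update_sdtypes(self, column_name_to_sdtype):
        """Set the sdtype of each listed column and reset its transformer."""
        for column, sdtype in column_name_to_sdtype.items():
            self.sdtypes[column] = sdtype
            # Assumed behavior: fall back to a default transformer for the new sdtype.
            self.transformers[column] = DEFAULT_TRANSFORMERS.get(sdtype)

        # The message is printed once per call, whether one or many columns changed.
        print(INFO_MESSAGE)

    def get_config(self):
        """Return a copy of the current configuration (assumed shape)."""
        return {
            'sdtypes': dict(self.sdtypes),
            'transformers': dict(self.transformers),
        }


ht = SketchHyperTransformer()
ht.update_sdtypes(column_name_to_sdtype={'colA': 'categorical', 'colB': 'numerical'})
config = ht.get_config()
```

Printing the message once per call, rather than once per column, keeps the output identical for the single-column and multi-column examples.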

Edge cases

Scenario 1: User tries to get or update the sdtypes without first calling detect_initial_config (see #399)

>>> ht = HyperTransformer()
>>> ht.get_sdtypes()
{}
Tip: Use the `detect_initial_config` method to pre-populate all the sdtypes and transformers from your dataset.

# this will still work but will print out a tip
>>> ht.update_sdtypes(column_name_to_sdtype={
    'colA': 'categorical'
})
Tip: Use the `detect_initial_config` method to pre-populate all the sdtypes and transformers from your dataset.
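
Scenario 1 could be handled with a small internal check, sketched below. The `SketchHyperTransformer` class, the `_config_detected` flag and the `_print_tip_if_needed` helper are hypothetical names; the sketch assumes that a real `detect_initial_config` would set the flag, and it does not implement detection itself.

```python
TIP = (
    'Tip: Use the `detect_initial_config` method to pre-populate all the '
    'sdtypes and transformers from your dataset.'
)


class SketchHyperTransformer:
    """Toy stand-in for HyperTransformer (assumption, not the real RDT class)."""

    def __init__(self):
        self.sdtypes = {}
        # Hypothetical flag; a real detect_initial_config would set it to True.
        self._config_detected = False

    def _print_tip_if_needed(self):
        """Print the advisory tip when no config has been detected yet."""
        if not self._config_detected:
            print(TIP)

    def get_sdtypes(self):
        """Return a copy of the sdtypes, with a tip if nothing was detected."""
        self._print_tip_if_needed()
        return dict(self.sdtypes)

    def update_sdtypes(self, column_name_to_sdtype):
        """Apply the update; the tip is advisory and never blocks the change."""
        self.sdtypes.update(column_name_to_sdtype)
        self._print_tip_if_needed()


ht = SketchHyperTransformer()
empty = ht.get_sdtypes()
ht.update_sdtypes(column_name_to_sdtype={'colA': 'categorical'})
```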

Scenario 2: User updates the sdtypes after already calling fit or fit_transform on the data.

>>> ht.fit(data)
>>> ht.update_sdtypes(column_name_to_sdtype={
    'colA': 'numerical'
})
Info: The transformers for these columns may change based on the new sdtypes.
Use 'get_config()' to verify the transformers.
Warning: For this change to take effect, please refit your data using 'fit' or 'fit_transform'.
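
One way to produce the extra warning in Scenario 2 is to remember whether `fit` has been called. The sketch below is an assumption-laden illustration: the `_fitted` flag and the use of Python's `warnings.warn` are choices made here, not details taken from the proposal, and the actual fitting logic is omitted.

```python
import warnings

INFO_MESSAGE = (
    'Info: The transformers for these columns may change based on the new sdtypes.\n'
    "Use 'get_config()' to verify the transformers."
)
REFIT_WARNING = (
    "For this change to take effect, please refit your data using "
    "'fit' or 'fit_transform'."
)


class SketchHyperTransformer:
    """Toy stand-in for HyperTransformer (assumption, not the real RDT class)."""

    def __init__(self):
        self.sdtypes = {}
        self._fitted = False  # hypothetical flag

    def fit(self, data):
        """Record that fitting happened; real fitting logic is omitted."""
        self._fitted = True

    def update_sdtypes(self, column_name_to_sdtype):
        """Apply the update and warn if the transformer was already fitted."""
        self.sdtypes.update(column_name_to_sdtype)
        print(INFO_MESSAGE)
        if self._fitted:
            warnings.warn(REFIT_WARNING)


ht = SketchHyperTransformer()
ht.fit(data=None)
ht.update_sdtypes(column_name_to_sdtype={'colA': 'numerical'})
```

Using `warnings.warn` rather than `print` lets callers filter or capture the warning, though a plain printed line would match the example output just as well.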

Scenario 3: User tries to set an sdtype that is not available in the open source version

>>> ht.update_sdtypes(column_name_to_sdtype={
    'colA': 'credit_card'
})
Error: Unsupported sdtypes ('credit_card'). To use sdtypes with specific semantic meanings,
please contact the SDV team to update to rdt_plus. Otherwise, use 'pii' to anonymize the column.
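
The check behind Scenario 3 might look like the sketch below. The contents of `SUPPORTED_SDTYPES`, the `validate_sdtypes` helper and the choice of `ValueError` are assumptions for illustration; the proposal only specifies the error text.

```python
# Assumed set of sdtypes available in the open source version (illustrative only).
SUPPORTED_SDTYPES = {'numerical', 'categorical', 'boolean', 'datetime', 'pii'}


def validate_sdtypes(column_name_to_sdtype):
    """Raise an error naming every unsupported sdtype in the request."""
    requested = set(column_name_to_sdtype.values())
    unsupported = sorted(requested - SUPPORTED_SDTYPES)
    if unsupported:
        listed = ', '.join(f"'{sdtype}'" for sdtype in unsupported)
        raise ValueError(
            f'Unsupported sdtypes ({listed}). To use sdtypes with specific '
            'semantic meanings, please contact the SDV team to update to '
            "rdt_plus. Otherwise, use 'pii' to anonymize the column."
        )


try:
    validate_sdtypes({'colA': 'credit_card'})
    error_message = None
except ValueError as error:
    error_message = str(error)
```

Collecting all unsupported sdtypes before raising means the user sees every problem in one message instead of fixing them one at a time.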

@npatki added the feature request label (Request for a new feature) on Feb 17, 2022
@amontanez24 added this to the 1.0.0 milestone on Feb 23, 2022
@npatki changed the title from "Improve getting and updating field_data_types in HyperTransformer" to "Improve updating field_data_types in HyperTransformer" on Mar 2, 2022
@amontanez24

For scenario 3, if some of the sdtypes provided are valid but one is invalid, should the valid ones be set? Or should the whole method crash?
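
One possible answer, which is not a decision recorded in this thread, is an all-or-nothing update: validate every requested sdtype first and apply the changes only if all of them pass. The sketch below illustrates that option; `SUPPORTED_SDTYPES` and `apply_sdtypes` are hypothetical names, and the supported set is an assumption.

```python
# Assumed set of supported sdtypes (illustrative only).
SUPPORTED_SDTYPES = {'numerical', 'categorical', 'boolean', 'datetime', 'pii'}


def apply_sdtypes(current_sdtypes, column_name_to_sdtype):
    """Return a new mapping with the updates applied, or raise without changes."""
    invalid = {
        column: sdtype
        for column, sdtype in column_name_to_sdtype.items()
        if sdtype not in SUPPORTED_SDTYPES
    }
    if invalid:
        # Fail before touching any state, so valid entries are not applied either.
        raise ValueError(f'Unsupported sdtypes for columns: {invalid}')

    updated = dict(current_sdtypes)
    updated.update(column_name_to_sdtype)
    return updated


current = {'colA': 'numerical'}
try:
    current = apply_sdtypes(current, {'colA': 'categorical', 'colB': 'credit_card'})
except ValueError:
    pass  # current is left exactly as it was before the call
```

The alternative, applying the valid entries and reporting the invalid ones, would need a clear message about which columns were actually changed.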
