User validation for update_transformers #475

Closed
npatki opened this issue Apr 6, 2022 · 0 comments

Labels
feature request Request for a new feature

npatki commented Apr 6, 2022

Problem Description

When a user calls update_transformers, we should validate their input and throw the appropriate warnings or errors.

Expected behavior

Perform the following 4 validations when using this function.

1. User attempts to use update_transformers after using fit

Throw a warning, same as #466

ht = HyperTransformer()
ht.detect_initial_config(data)
ht.fit(data)

ht.update_transformers(column_name_to_transformer={
  'column_A': FrequencyEncoder()
})
Warning: For this change to take effect, please refit your data using 'fit' or 'fit_transform'.
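
A minimal sketch of how this check could work, assuming the HyperTransformer tracks whether fit has been called. The class `SketchHyperTransformer` and its `_fitted` flag are hypothetical stand-ins for illustration, not the actual rdt implementation.

```python
import warnings


class SketchHyperTransformer:
    """Hypothetical stand-in for illustration; not the rdt HyperTransformer."""

    def __init__(self):
        self.field_transformers = {}
        self._fitted = False  # assumed internal flag, set once fit has run

    def fit(self, data):
        # The real fitting logic is omitted; only the state change matters here.
        self._fitted = True

    def update_transformers(self, column_name_to_transformer):
        # Warn (rather than fail) when the object has already been fitted,
        # because the update only takes effect after a refit.
        if self._fitted:
            warnings.warn(
                "For this change to take effect, please refit your data "
                "using 'fit' or 'fit_transform'."
            )
        self.field_transformers.update(column_name_to_transformer)
```

The update is still applied after the warning, so the user only needs to call fit again.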

2. Invalid column names

The column names must be in the config.

ht.update_transformers(column_name_to_transformer={
  'unknown_column_1': FrequencyEncoder(),
  'unknown_column_2': LabelEncoder()
})
Error: Invalid column names: ['unknown_column_1', 'unknown_column_2']. These columns do not exist in the config. Use 'set_config' to write and set your entire config at once.
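
One possible shape for this check, as a sketch only. The helper name `validate_column_names`, the `config_columns` argument, and the choice of `ValueError` are assumptions; the issue only says an error should be raised.

```python
def validate_column_names(column_name_to_transformer, config_columns):
    """Raise if any requested column is missing from the config (hypothetical helper)."""
    # Collect every unknown column so the user sees all problems at once.
    unknown = [
        name for name in column_name_to_transformer
        if name not in config_columns
    ]
    if unknown:
        # ValueError is an assumed exception type.
        raise ValueError(
            f"Invalid column names: {unknown}. These columns do not exist in "
            "the config. Use 'set_config' to write and set your entire config "
            "at once."
        )
```

Collecting all unknown names before raising matches the proposed message, which lists every invalid column rather than only the first.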

3. Invalid transformers

The user must provide either a transformer object or None.

ht.update_transformers(column_name_to_transformer={
  'column_A': "FrequencyEncoder()",
  'column_B': 4.5
})
Error: Invalid transformers for columns: ['column_A', 'column_B']. Please assign an rdt transformer object to each column name.
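
A sketch of the type check, assuming every valid transformer inherits from a common base class. The `BaseTransformer` and `FrequencyEncoder` classes below are local stand-ins so the example runs on its own, and `TypeError` is an assumed exception type.

```python
class BaseTransformer:
    """Local stand-in for a transformer base class (illustration only)."""


class FrequencyEncoder(BaseTransformer):
    """Local stand-in for a concrete transformer (illustration only)."""


def validate_transformers(column_name_to_transformer):
    """Raise if a value is neither a transformer instance nor None (hypothetical helper)."""
    invalid = [
        column
        for column, transformer in column_name_to_transformer.items()
        # None is allowed, per the rule stated above.
        if transformer is not None and not isinstance(transformer, BaseTransformer)
    ]
    if invalid:
        raise TypeError(
            f"Invalid transformers for columns: {invalid}. Please assign an "
            "rdt transformer object to each column name."
        )
```

Note that a string such as "FrequencyEncoder()" fails the check even though it looks like a constructor call, which is the case the example above is meant to catch.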

4. Transformers are not compatible with the sdtypes

Check the sdtypes and throw a warning in case of incompatibility.

ht.update_transformers(column_name_to_transformer={
  'datetime_column': FrequencyEncoder(),
  'numerical_column': BinaryEncoder()
})
Warning: Some transformers you've assigned are not compatible with the sdtypes. Use 'update_sdtypes' to update: ['datetime_column', 'numerical_column']
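
A sketch of the compatibility check, assuming each transformer class declares which sdtypes it supports. The stand-in classes, their `SUPPORTED_SDTYPES` values, and the helper name `warn_incompatible_sdtypes` are assumptions made for illustration and may not match the real transformers.

```python
import warnings


class FrequencyEncoder:
    """Local stand-in; the supported sdtypes listed here are assumed."""
    SUPPORTED_SDTYPES = ['categorical', 'boolean']


class BinaryEncoder:
    """Local stand-in; the supported sdtypes listed here are assumed."""
    SUPPORTED_SDTYPES = ['boolean']


def warn_incompatible_sdtypes(column_name_to_transformer, sdtypes):
    """Warn about columns whose sdtype the assigned transformer does not support."""
    incompatible = [
        column
        for column, transformer in column_name_to_transformer.items()
        if transformer is not None
        and sdtypes[column] not in transformer.SUPPORTED_SDTYPES
    ]
    if incompatible:
        warnings.warn(
            "Some transformers you've assigned are not compatible with the "
            f"sdtypes. Use 'update_sdtypes' to update: {incompatible}"
        )
    return incompatible
```

This check warns instead of raising, so the update can still go through and the user can correct the sdtypes afterwards.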