Input validation & error handling in HyperTransformer #408

Closed
npatki opened this issue Feb 18, 2022 · 0 comments
Labels
feature request Request for a new feature


npatki commented Feb 18, 2022

Expected behavior

We should check input and provide friendly tips, warnings and errors to guide the user into correct HyperTransformer usage.

Scenario 1: User calls fit or fit_transform directly without first setting a config, for example with the new detect_initial_config method (see #399)

>>> ht = HyperTransformer()
>>> ht.fit(data)
Error: No config detected. Set the config using `set_config` or pre-populate it automatically from your data
using `detect_initial_config` prior to fitting your data.
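
A minimal sketch of how the Scenario 1 check could work, assuming the config is stored as a dict that stays empty until it is set. `SketchTransformer` and `ConfigNotSetError` are hypothetical names used for illustration only; they are not the actual HyperTransformer classes.

```python
class ConfigNotSetError(Exception):
    """Hypothetical error raised when fit is called before a config exists."""


class SketchTransformer:
    """Illustrative stand-in, not the real HyperTransformer."""

    def __init__(self):
        # Assumption: the config is empty until the user sets or detects it.
        self._config = {}
        self._fitted = False

    def set_config(self, config):
        self._config = dict(config)

    def fit(self, data):
        # Fail early with a friendly message instead of a confusing
        # downstream error.
        if not self._config:
            raise ConfigNotSetError(
                "No config detected. Set the config using 'set_config' or "
                "pre-populate it automatically from your data using "
                "'detect_initial_config' prior to fitting your data."
            )
        self._fitted = True
```

Running the guard before any other work in fit means fit_transform gets the same behavior for free if it delegates to fit.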

Scenario 2: User tries to transform a dataset that does not have the same column names or sdtypes as the fit data.

>>> ht.transform(different_data)
Error: The data you are trying to transform has different columns than the original data. Column names and their sdtypes
must be the same. Use the method 'get_config()' to see the expected values. 

Note: This should trigger any time that the set of columns is different. If the set is the same -- but the columns are just in a different order -- then we should proceed with the transformation.

Scenario 3: User tries to fit a dataset that does not have the same columns as the config

>>> ht.fit(different_data)
Error: The data you are trying to fit has different columns than the config. Column names and their sdtypes
must be the same. Use the method 'get_config()' to see the expected values.

Note: This should trigger any time that the set of columns is different. If the set is the same -- but the columns are just in a different order -- then we should proceed with the fit.
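
The column check described in Scenarios 2 and 3 could be sketched as a set comparison, which ignores column order by construction. `check_columns` is a hypothetical helper, not part of the library, and this sketch compares column names only; a real check would also need to compare sdtypes.

```python
def check_columns(expected, actual, action="transform"):
    """Raise if the two column collections differ as sets; order is ignored.

    `expected` would come from the config (or the fit data) and `actual`
    from the data passed to fit or transform. Both are assumed to be
    iterables of column names.
    """
    if set(expected) != set(actual):
        raise ValueError(
            f"The data you are trying to {action} has different columns than "
            "the original data. Column names and their sdtypes must be the "
            "same. Use the method 'get_config()' to see the expected values."
        )


# Same set in a different order: no error, so the call can proceed.
check_columns(["age", "name"], ["name", "age"])
```

A missing or extra column changes the set, so either case raises.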
