HyperTransformer should sort columns after transform and reverse_transform #405

Closed
npatki opened this issue Feb 17, 2022 · 0 comments · Fixed by #410
Labels
feature request Request for a new feature

npatki commented Feb 17, 2022

Problem Description

If the output columns are not in the same order as the input columns, users might think the transform has made a mistake.

Expected behavior

If the original data has a column order A, B, C, ... then:

  1. The transformed data should keep the original column order: every output column derived from A comes before those derived from B, which come before those derived from C, and so on.
    e.g. A.value, B.value, B.is_null, C.value, C.is_null, ...
  2. The reverse-transformed data should have the same column order as the original.
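
To make the two rules concrete, here is a minimal sketch of the ordering logic on plain lists of column names. It is not the fix that landed in the library. The helper names `order_transformed_columns` and `restore_original_order` are hypothetical, and the sketch assumes output columns are named `<input column>.<suffix>` as in the example above. The transformed column names in the demo are illustrative, not actual library output.

```python
def order_transformed_columns(original_columns, transformed_columns):
    """Order transformed column names by the position of their source column."""
    original = list(original_columns)
    position = {name: index for index, name in enumerate(original)}

    def source_position(column):
        # A column that passed through unchanged keeps its own name.
        if column in position:
            return position[column]
        # Otherwise strip suffixes from the right ("B.is_null" -> "B")
        # until an original column name is found.
        prefix = column
        while '.' in prefix:
            prefix = prefix.rsplit('.', 1)[0]
            if prefix in position:
                return position[prefix]
        # Columns with no known source go to the end.
        return len(original)

    # sorted() is stable, so columns from the same source keep their relative order.
    return sorted(transformed_columns, key=source_position)


def restore_original_order(original_columns, reversed_columns):
    """Order reverse-transformed column names to match the original data."""
    present = set(reversed_columns)
    known = set(original_columns)
    ordered = [name for name in original_columns if name in present]
    extras = [name for name in reversed_columns if name not in known]
    return ordered + extras


# Illustrative column names only (assumed, not copied from real output).
original = ['last_login', 'email_optin', 'credit_card', 'age', 'dollars_spent']
transformed = [
    'last_login.value', 'last_login.is_null', 'email_optin.value',
    'credit_card.value', 'dollars_spent.value', 'age.value',
]
sorted_transformed = order_transformed_columns(original, transformed)
sorted_reversed = restore_original_order(
    original,
    ['last_login', 'email_optin', 'credit_card', 'dollars_spent', 'age'],
)
```

With pandas, the resulting list can be used to select columns, for example `transformed_df[order_transformed_columns(df.columns, transformed_df.columns)]`, where `transformed_df` stands for the output of the transform.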

Additional context

The code below reproduces the issue:

import numpy as np
import pandas as pd

login_dates = ['2021-06-26', '2021-02-10', 'NAT', '2020-09-26', '2020-12-22']

df = pd.DataFrame(data={
    'last_login': [np.datetime64(i) for i in login_dates],
    'email_optin': [False, False, False, True, np.nan],
    'credit_card': ['VISA', 'VISA', 'AMEX', np.nan, 'DISCOVER'],
    'age': [29, 18, 21, 45, 32],
    'dollars_spent': [99.99, np.nan, 2.50, 25.00, 19.99]
})

from rdt import HyperTransformer

from rdt.transformers import datetime, boolean, categorical, numerical

field_transformers = {
    'last_login': datetime.DatetimeTransformer(null_column=True),
    'email_optin': boolean.BooleanTransformer(null_column=False, nan=0),
    'credit_card': categorical.CategoricalTransformer(fuzzy=True),
    # 'age': numerical.NumericalTransformer(null_column=False),
    'dollars_spent': numerical.NumericalTransformer(null_column=False, nan=0)
}

ht = HyperTransformer(field_transformers=field_transformers)

ht.fit_transform(df)

Notice that age.value is now the last column in the transformed output, even though age is the fourth column in the original data.
