Support dropping a column through a transformer #393

Closed
pvk-developer opened this issue Feb 15, 2022 · 0 comments · Fixed by #396
pvk-developer commented Feb 15, 2022

Problem Description

Currently, our transformers are expected to always return a transformed column. In some cases, however, we would like to learn something from the column, drop it, and then repopulate it with values during reverse_transform.
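
To make the three phases concrete, here is a minimal sketch on a bare DataFrame, with no transformer class involved. The placeholder values and the variable names (learned_length, restored) are assumptions made for illustration only; they are not part of the proposal.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['John', 'Doe'],
    'donation': [20, 30],
})

# 1. Learn something from the column (here, only its length).
learned_length = len(data['name'])

# 2. Drop the column from the transformed output.
transformed = data.drop(columns=['name'])

# 3. Repopulate the column during the reverse step with newly generated values.
#    Placeholder strings stand in for whatever generator would be used.
restored = transformed.copy()
restored['name'] = [f'placeholder_{i}' for i in range(learned_length)]
```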

Expected behavior

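from faker import Faker
from rdt.transformers import BaseTransformer


class PIITransformer(BaseTransformer):
    """Learn the column length, drop the column, and regenerate it on reverse."""

    OUTPUT_TYPES = {
        'pii': 'categorical'
    }

    def __init__(self):
        self.length = None
        self.faker = Faker()

    def _fit(self, columns_data):
        # Learn only the number of rows; the values themselves are not stored.
        self.length = len(columns_data)

    def _transform(self, columns_data):
        return None  # Drop the column

    def _reverse_transform(self, columns_data):
        # Generate the values
        return [self.faker.name() for _ in range(self.length)]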

This should lead to the following outcome:

import pandas as pd

data = pd.DataFrame({
    'name': ['John', 'Doe', 'John Doe', 'John Doe John'],
    'donation': [20, 30, 40, 50]
})

pii_transformer = PIITransformer()
transformed = pii_transformer.fit_transform(data, 'name')
transformed

   donation
0        20
1        30
2        40
3        50

The result contains only the donation column (the name column has been dropped).

When calling reverse_transform:

data = pii_transformer.reverse_transform(transformed)
data

   donation              name
0        20    Gregory Rivera
1        30  Melissa Peterson
2        40   Mrs. Lori Pitts
3        50       Susan Payne

A name column has been returned with random values; the data now contains both the name and donation columns.
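
One way the base class could support this is to treat a None return from _transform as "drop the column" and to re-create the column from _reverse_transform afterwards. The sketch below is self-contained and does not use the library's real base class: DroppableBaseTransformer and PlaceholderTransformer are hypothetical names, and deterministic placeholder strings replace Faker so the result is reproducible.

```python
import pandas as pd


class DroppableBaseTransformer:
    """Hypothetical stand-in for a base transformer (not the library's class)."""

    def fit(self, data, column):
        self._column = column
        self._fit(data[column])

    def transform(self, data):
        result = data.copy()
        transformed = self._transform(result[self._column])
        if transformed is None:
            # A None return is interpreted as a request to drop the column.
            return result.drop(columns=[self._column])
        result[self._column] = transformed
        return result

    def fit_transform(self, data, column):
        self.fit(data, column)
        return self.transform(data)

    def reverse_transform(self, data):
        result = data.copy()
        # The column may be absent because transform dropped it.
        columns_data = (
            result[self._column] if self._column in result.columns else None
        )
        result[self._column] = self._reverse_transform(columns_data)
        return result


class PlaceholderTransformer(DroppableBaseTransformer):
    """Hypothetical transformer that drops a column and regenerates it."""

    def _fit(self, columns_data):
        self.length = len(columns_data)

    def _transform(self, columns_data):
        return None  # Drop the column

    def _reverse_transform(self, columns_data):
        return [f'person_{i}' for i in range(self.length)]


data = pd.DataFrame({
    'name': ['John', 'Doe', 'John Doe', 'John Doe John'],
    'donation': [20, 30, 40, 50],
})
transformer = PlaceholderTransformer()
transformed = transformer.fit_transform(data, 'name')
restored = transformer.reverse_transform(transformed)
```

In this sketch the regenerated column is appended after the remaining columns, which matches the column order shown in the expected output above. Because the length is learned at fit time, the reverse step assumes the data passed to it has the same number of rows as the data used for fitting.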

pvk-developer added the "feature request" and "pending review" labels on Feb 15, 2022
pvk-developer changed the title from "Support dropping a column" to "Support dropping a column through a transformer" on Feb 15, 2022
amontanez24 modified the milestone: 0.5.4 on Feb 16, 2022
amontanez24 added this to the 0.6.4 milestone on Mar 4, 2022