Add create_anonymized_columns method to anonymize data from scratch #546

Closed
npatki opened this issue Aug 30, 2022 · 0 comments · Fixed by #553
Labels
feature request Request for a new feature
npatki commented Aug 30, 2022

Problem Description

Some transformers, such as AnonymizedFaker and RegexGenerator, drop their columns on the forward transform. It isn't possible to reverse transform only these columns, because the transformed data contains nothing for them to take as input.

To allow for this case, we should add a new method: create_anonymized_columns

Expected behavior

This method should belong to the HyperTransformer class and should be used only after fitting the data.

Parameters:

  • (required) num_rows: An integer > 0 that describes the number of rows to anonymize
  • (required) column_names: The list of column names to anonymize

ht.fit(data)
ht.create_anonymized_columns(num_rows=10, column_names=['credit_card', 'user_id'])
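
To illustrate the shape of the proposed call, here is a minimal sketch in plain Python. It is not RDT's implementation: `ToyAnonymizer` and `ToyHyperTransformer` are hypothetical stand-ins, the random-string generation is only a placeholder for what AnonymizedFaker or RegexGenerator would produce, and returning a dict of lists is an assumption, since the issue does not state the return type.

```python
import random
import string


class ToyAnonymizer:
    """Hypothetical stand-in for a transformer that drops its column on transform."""

    def __init__(self, length=8, seed=None):
        self.length = length
        self._rng = random.Random(seed)

    def generate(self, num_rows):
        # No input data is needed: the values are created from scratch.
        alphabet = string.ascii_uppercase + string.digits
        return [
            ''.join(self._rng.choice(alphabet) for _ in range(self.length))
            for _ in range(num_rows)
        ]


class ToyHyperTransformer:
    """Hypothetical stand-in that only mirrors the proposed method signature."""

    def __init__(self, anonymizers):
        self._anonymizers = anonymizers  # column name -> ToyAnonymizer

    def fit(self, data):
        # A real fit would learn from the data; this toy only records the columns.
        self._columns = list(data)

    def create_anonymized_columns(self, num_rows, column_names):
        # Assumed return shape: a dict mapping each column name to a list of values.
        return {
            name: self._anonymizers[name].generate(num_rows)
            for name in column_names
        }


ht = ToyHyperTransformer({
    'credit_card': ToyAnonymizer(length=16, seed=0),
    'user_id': ToyAnonymizer(length=6, seed=1),
})
ht.fit({'credit_card': ['x'], 'user_id': ['y'], 'age': [30]})
columns = ht.create_anonymized_columns(num_rows=10, column_names=['credit_card', 'user_id'])
```

The point of the sketch is that generation depends only on the fitted transformer and `num_rows`, so no transformed data has to be passed in.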

Error Cases

HyperTransformer has not been fit yet. The HyperTransformer must be fit in order to use this method.

ht.create_anonymized_columns(num_rows=10, column_names=['credit_card', 'user_id'])
The HyperTransformer is not ready to use. Please fit your data first using 'fit' or 'fit_transform'.

Required parameters: num_rows and column_names. Throw an error if these aren't present or if there are extra, unknown parameters.

ht.create_anonymized_columns(num_rows=10)
Error: Missing required parameter 'column_names'

ht.create_anonymized_columns(num_rows=10, column_names=['user_id'], test_parameter=4)
Error: Unknown parameter 'test_parameter'

Parameter num_rows must be an integer > 0.

ht.create_anonymized_columns(num_rows=-6, column_names=['user_id'])
Error: Parameter 'num_rows' must be an integer greater than 0.

Parameter column_names must list valid columns that are in the config, and each of them must be assigned to AnonymizedFaker or RegexGenerator.

ht.create_anonymized_columns(num_rows=10, column_names=['invalid_name', 'credit_card'])
Error: Unknown column name 'invalid_name'. Use 'get_config()' to see a list of valid column names.

ht.create_anonymized_columns(num_rows=10, column_names=['age', 'credit_card'])
Error: Column 'age' cannot be anonymized. All columns must be assigned to 'AnonymizedFaker' or 'RegexGenerator'.
Use 'get_config()' to see the current transformer assignments.
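
The error cases above could be checked in roughly this order. This is a sketch under stated assumptions, not the library's code: `validate_request` is a hypothetical helper, `config` is assumed to be a plain dict mapping column names to transformer class names, `ValueError` is an assumed exception type, and the `Error:` prefix shown in the examples is treated as display formatting rather than part of the message.

```python
ANONYMIZING_TRANSFORMERS = ('AnonymizedFaker', 'RegexGenerator')


def validate_request(fitted, config, **kwargs):
    """Hypothetical helper: raise ValueError with the messages from the issue.

    `config` maps column name -> transformer class name (an assumed shape).
    """
    if not fitted:
        raise ValueError(
            "The HyperTransformer is not ready to use. "
            "Please fit your data first using 'fit' or 'fit_transform'."
        )

    # Required and unknown parameters are checked before any values.
    required = ('num_rows', 'column_names')
    for name in required:
        if name not in kwargs:
            raise ValueError(f"Missing required parameter '{name}'")
    for name in kwargs:
        if name not in required:
            raise ValueError(f"Unknown parameter '{name}'")

    num_rows = kwargs['num_rows']
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(num_rows, bool) or not isinstance(num_rows, int) or num_rows <= 0:
        raise ValueError("Parameter 'num_rows' must be an integer greater than 0.")

    for column in kwargs['column_names']:
        if column not in config:
            raise ValueError(
                f"Unknown column name '{column}'. "
                "Use 'get_config()' to see a list of valid column names."
            )
        if config[column] not in ANONYMIZING_TRANSFORMERS:
            raise ValueError(
                f"Column '{column}' cannot be anonymized. All columns must be "
                "assigned to 'AnonymizedFaker' or 'RegexGenerator'. "
                "Use 'get_config()' to see the current transformer assignments."
            )
```

Checking column names one at a time means the first offending column is the one reported, which matches the examples above where a single column is named in each message.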