Create IDGenerator transformer #680

Merged: 7 commits, Aug 14, 2023

R-Palazzo
Contributor

Resolve #675

@R-Palazzo requested a review from a team as a code owner August 8, 2023 18:51
@R-Palazzo removed the request for review from a team August 8, 2023 18:51
@@ -67,7 +67,9 @@ def _validate_helper(validator_function, args, steps):

 def _is_valid_transformer(transformer_name):
     """Determine if transformer should be tested or not."""
-    invalid_names = ['IdentityTransformer', 'Dummy', 'OrderedLabelEncoder', 'CustomLabelEncoder']
+    invalid_names = [
Contributor Author

I don't know if we want this. I think if yes, we need an appropriate dataset with ID columns.

Contributor

How does the AnonymizedFaker get tested? Wouldn't that have similar data?

Contributor Author

It isn't tested here because it has INPUT_SDTYPE='pii'. Following the discussion in the EngMeeting about not supporting the id sdtype in RDT, should I set INPUT_SDTYPE='pii' on the IDGenerator rather than INPUT_SDTYPE='id' (as mentioned in the issue), @amontanez24?

Contributor

I think we said text, actually. Right, @npatki?

codecov-commenter commented Aug 8, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (74f20ac) 100.00% compared to head (aa1b931) 100.00%.

❗ Current head aa1b931 differs from pull request most recent head f3638cc. Consider uploading reports for the commit f3638cc to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #680   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           17        17           
  Lines         1660      1684   +24     
=========================================
+ Hits          1660      1684   +24     
Files Changed Coverage Δ
rdt/transformers/__init__.py 100.00% <100.00%> (ø)
rdt/transformers/pii/anonymizer.py 100.00% <100.00%> (ø)
rdt/transformers/text.py 100.00% <100.00%> (ø)


rdt/transformers/id.py (outdated, resolved)
Contributor

@frances-h left a comment

We also need to set the IS_GENERATOR=True attribute, similar to how we do with AnonymizedFaker.
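For readers following the thread, here is a minimal, self-contained sketch of what such class attributes might look like on a generator-style transformer. It does not import RDT and is not the code merged in this PR: `SketchBaseTransformer`, `SketchIDGenerator`, the `generate` method, and the `prefix`, `starting_value`, and `suffix` parameters are all hypothetical names chosen for illustration, and `INPUT_SDTYPE = 'text'` simply follows the suggestion made earlier in this conversation.

```python
class SketchBaseTransformer:
    """Hypothetical stand-in for a transformer base class (not RDT's)."""

    # Assumption: by default a transformer learns from the data it is given.
    IS_GENERATOR = False
    INPUT_SDTYPE = None


class SketchIDGenerator(SketchBaseTransformer):
    """Hypothetical generator that emits sequential string IDs."""

    # Marks the class as one that creates values rather than transforming them.
    IS_GENERATOR = True
    # Assumption: 'text', as suggested in this thread, instead of 'id' or 'pii'.
    INPUT_SDTYPE = 'text'

    def __init__(self, prefix='', starting_value=0, suffix=''):
        self.prefix = prefix
        self.starting_value = starting_value
        self.suffix = suffix
        self._counter = 0  # number of IDs produced so far

    def generate(self, num_rows):
        """Return the next ``num_rows`` IDs, continuing from earlier calls."""
        start = self.starting_value + self._counter
        ids = [f'{self.prefix}{i}{self.suffix}' for i in range(start, start + num_rows)]
        self._counter += num_rows
        return ids


generator = SketchIDGenerator(prefix='ID_', starting_value=100)
first_batch = generator.generate(3)
second_batch = generator.generate(2)
```

In this sketch the counter persists between calls, so consecutive batches never repeat an ID. Code that inspects `IS_GENERATOR` could use the flag to skip checks that only make sense for transformers that learn from input data.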

rdt/transformers/id.py (outdated, resolved)
@R-Palazzo
Contributor Author

Thanks for your review, @frances-h. I addressed the comments in 33d4deb. I'm just not sure which INPUT_SDTYPE we should set for this transformer (id or pii).

Contributor

@frances-h left a comment

One last question, but otherwise looking good!

@@ -132,7 +132,9 @@ def get_supported_sdtypes(cls):
             list:
                 Accepted input sdtypes of the transformer.
         """
-        unsupported_sdtypes = {'numerical', 'datetime', 'categorical', 'boolean', 'text', None}
+        unsupported_sdtypes = {
+            'numerical', 'datetime', 'categorical', 'boolean', 'text', None, 'id'
Contributor

Do we need to add id here? My understanding from the discussion was that RDT doesn't support the id sdtype.

Contributor Author

Yes, good catch, thank you. I forgot about it; done in f3638cc.
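The hunk above only shows the exclusion set, not how it is used. As a rough sketch, and assuming the set is subtracted from the sdtypes that candidate transformers declare, the filtering could look like the following. The function name `filter_supported_sdtypes` and the sample sdtype list are hypothetical and not taken from RDT.

```python
def filter_supported_sdtypes(declared_sdtypes, unsupported_sdtypes):
    """Return the declared sdtypes that are not in the exclusion set.

    Hypothetical helper: sorts with ``key=str`` so that ``None`` can be
    compared with strings if it ever survives the filtering.
    """
    remaining = set(declared_sdtypes) - set(unsupported_sdtypes)
    return sorted(remaining, key=str)


# Exclusion set as it appears on the removed line of the hunk.
unsupported = {'numerical', 'datetime', 'categorical', 'boolean', 'text', None}

# Hypothetical sdtypes declared by a handful of transformers.
declared = ['pii', 'text', 'numerical', None, 'email']

supported = filter_supported_sdtypes(declared, unsupported)
```

Under this assumption, listing a value in the exclusion set only matters if some transformer actually declares it, which is the point behind the question about adding 'id'.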

Contributor

@frances-h left a comment

Thanks for addressing!

Contributor

@amontanez24 left a comment

LGTM!

@R-Palazzo merged commit 07d095d into master Aug 14, 2023
46 checks passed
@R-Palazzo deleted the issue-675-idgenerator-transformer branch August 14, 2023 08:35
Development

Successfully merging this pull request may close these issues.

Create IDGenerator transformer
4 participants