The OrderedLabelEncoder should not accept duplicate categories #673

Closed
npatki opened this issue Jul 18, 2023 · 0 comments · Fixed by #718
Labels
bug Something isn't working
npatki commented Jul 18, 2023

Environment Details

  • RDT version: 1.6.0 (latest)
  • Python version: 3.10
  • Operating System: Linux (Google Colab)

Error Description

The OrderedLabelEncoder accepts an 'order' parameter that lists the categories in their intended order. The implicit requirement is that every possible category appears exactly once in this parameter.

Duplicate category labels don't make sense there, so if I accidentally provide them I expect the instantiation to fail. In reality, it succeeds without any error.

Steps to reproduce

from rdt.transformers.categorical import OrderedLabelEncoder, UniformEncoder
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
import pandas as pd

metadata = SingleTableMetadata.load_from_dict({
    'columns': {
        'a': { 'sdtype': 'categorical' },
        'b': { 'sdtype': 'categorical' }
    }
})

data = pd.DataFrame(data={
    'a': ['A', 'B', 'C']*2,
    'b': [1, 0.5, None]*2
})

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.auto_assign_transformers(data)

synthesizer.update_transformers({
    'a': UniformEncoder(),
    'b': OrderedLabelEncoder(order=[1, 0.5, None, 1]) # provide duplicate categories
})

synthesizer.fit(data)
synthesizer.sample(10)

Expected

I expect instantiating the OrderedLabelEncoder to raise an error with a descriptive message, such as:

TransformerInputError: The OrderedLabelEncoder has duplicate categories in the 'order' parameter. Please drop the duplicates to proceed.
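
For illustration, here is a minimal sketch of the kind of check that could produce this error. It is not RDT's actual implementation: the function name `check_order_has_no_duplicates` is hypothetical, and `TransformerInputError` is defined locally as a stand-in for the error type named above. It also assumes that categories are hashable and that None and NaN both count as the same "missing" category.

```python
import math


class TransformerInputError(Exception):
    """Local stand-in for the error type named in this issue."""


def _is_missing(value):
    # Assumption: None and float NaN both represent a missing category.
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_order_has_no_duplicates(order):
    """Raise TransformerInputError if any category appears more than once."""
    seen = set()
    seen_missing = False
    for category in order:
        if _is_missing(category):
            # NaN never compares equal to itself, so track it with a flag.
            is_duplicate = seen_missing
            seen_missing = True
        else:
            is_duplicate = category in seen
            seen.add(category)
        if is_duplicate:
            raise TransformerInputError(
                "The OrderedLabelEncoder has duplicate categories in the "
                "'order' parameter. Please drop the duplicates to proceed."
            )


# The order from the reproduction steps is rejected.
try:
    check_order_has_no_duplicates([1, 0.5, None, 1])
except TransformerInputError as error:
    print(error)

# A valid order passes silently.
check_order_has_no_duplicates([1, 0.5, None])
```

Running a check like this in the constructor, rather than at fit time, would surface the mistake at the point where the 'order' parameter is supplied.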