GreaterThan constraint between Date columns raises TypeError #421

Closed
MLjungg opened this issue Apr 30, 2021 · 2 comments
Labels: bug (Something isn't working)

MLjungg commented Apr 30, 2021

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 0.9.1
  • Python version: 3.8
  • Operating System: OSX

Error Description

I'm trying to define a GreaterThan constraint between date columns with CopulaGAN, using handling_strategy="transform". This raises the following error:

"TypeError: ufunc 'exp' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''"

Steps to reproduce

The error can be reproduced with the following code:

from sdv.tabular import CopulaGAN
from sdv.constraints import GreaterThan
from sdv.demo import load_tabular_demo

data = load_tabular_demo('student_placements')
data.head()

date_constraint = GreaterThan(
    low='start_date',
    high='end_date',
    handling_strategy='transform'
)

constraints = [
    date_constraint,
]

model = CopulaGAN(epochs=50, constraints=constraints)
model.fit(data)
model.sample(100)

Reason for error

The problem occurs because np.exp() is computed on a datetime64[ns] column while the column is being reverse-transformed:

    def reverse_transform(self, table_data):
        """Reverse transform the table data.

        The transformation is reversed by computing an exponential of the given
        value, converting it to the original dtype, subtracting 1 and finally
        clipping the value to 0 on the low end to ensure the value is positive.

        Finally, the obtained value is added to the ``low`` column to get the final
        ``high`` value.

        Args:
            table_data (pandas.DataFrame):
                Table data.

        Returns:
            pandas.DataFrame:
                Transformed data.
        """
        table_data = table_data.copy()
        diff = (np.exp(table_data[self._high]).round() - 1).clip(0)
        low_column = table_data[self._low]

        if pd.api.types.is_datetime64_ns_dtype(low_column):
            diff = pd.to_timedelta(diff)

        table_data[self._high] = (low_column + diff).astype(self._dtype)

        return table_data
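
To make the failure mode concrete, here is a minimal sketch that does not use SDV at all. It first shows that np.exp() rejects datetime64[ns] input with a TypeError, then shows a log/exp round trip that works when the difference between the two date columns is kept numeric. Working in seconds via .dt.total_seconds() is an assumption made for this illustration; it is not taken from the SDV source and is not necessarily how the bug was fixed.

```python
import numpy as np
import pandas as pd

# 1) np.exp has no loop for datetime64 input, so it raises TypeError.
dates = np.array(['2021-04-30'], dtype='datetime64[ns]')
try:
    np.exp(dates)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# 2) Illustrative round trip on a numeric difference (assumption: seconds).
low = pd.Series(pd.to_datetime(['2020-01-01', '2020-02-01']))
high = pd.Series(pd.to_datetime(['2020-01-10', '2020-03-01']))

# Forward step: turn the timedelta into a float before taking the log.
diff_seconds = (high - low).dt.total_seconds()
transformed = np.log(diff_seconds + 1)

# Reverse step: exp is applied to floats only, then converted back to a timedelta.
recovered_seconds = (np.exp(transformed).round() - 1).clip(0)
recovered_high = low + pd.to_timedelta(recovered_seconds, unit='s')
```

The point of the sketch is only that np.exp() must receive a numeric column; whatever stores datetimes in the high column before the reverse step will trigger the error above.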

MLjungg added the bug and pending review labels on Apr 30, 2021

npatki (Contributor) commented May 19, 2021

Hi! Thanks for catching this. I'm able to reproduce it, and it is indeed a bug. We'll update this issue when we have a fix.

In the meantime, I've had success using the reject_sampling strategy.

date_constraint = GreaterThan(
    low='start_date',
    high='end_date',
    handling_strategy='reject_sampling'
)
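
For readers unfamiliar with the idea, the following is a rough conceptual sketch of reject sampling written with pandas only. It is not SDV's implementation: sample_valid_rows, noisy_sampler, and max_tries are hypothetical names invented for this illustration, and a non-strict comparison (high >= low) is assumed.

```python
import numpy as np
import pandas as pd

def sample_valid_rows(sampler, num_rows, low, high, max_tries=100):
    """Hypothetical helper: draw batches and keep only rows where high >= low."""
    kept_batches = []
    kept_count = 0
    for _ in range(max_tries):
        batch = sampler(num_rows)
        valid = batch[batch[high] >= batch[low]]  # reject violating rows
        kept_batches.append(valid)
        kept_count += len(valid)
        if kept_count >= num_rows:
            break
    if kept_count < num_rows:
        raise ValueError('Could not sample enough valid rows')
    return pd.concat(kept_batches, ignore_index=True).head(num_rows)

rng = np.random.default_rng(0)

def noisy_sampler(n):
    """Stand-in for a model's sampler; some rows end before they start."""
    start = pd.Timestamp('2020-01-01') + pd.to_timedelta(rng.integers(0, 365, n), unit='D')
    offset = pd.to_timedelta(rng.integers(-30, 180, n), unit='D')
    return pd.DataFrame({'start_date': start, 'end_date': start + offset})

rows = sample_valid_rows(noisy_sampler, 100, 'start_date', 'end_date')
```

Because invalid rows are simply discarded, this approach never needs to transform the date columns, which is why it sidesteps the np.exp() error.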

npatki (Contributor) commented Jul 7, 2021

Resolved by #476
