Rounding scheme cannot be turned off sometimes #1040

Closed
npatki opened this issue Sep 28, 2022 · 3 comments
Labels
bug (Something isn't working), data:single-table (Related to tabular datasets), resolution:WAI (The software is working as intended)

Comments

@npatki
Contributor

npatki commented Sep 28, 2022

Environment Details

  • SDV version: 0.17.0
  • Python version: 3.7
  • Operating System: Linux

Error Description

The learn_rounding_scheme parameter is available for all single table models. By default it is True.

When I specify learn_rounding_scheme=False, the rounding scheme is still enforced in some cases. In particular, it seems to happen when the original data has more than 14 digits.

Steps to reproduce

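from sdv.tabular import GaussianCopula
import pandas as pd

# A single column holding the same 15-decimal value in every row
test_data = pd.DataFrame(data={
    'column': [0.123456789012345]*10
})

# Ask the model not to learn a rounding scheme
model = GaussianCopula(
    learn_rounding_scheme=False)

# Fit on data rounded to 4 decimal places, then sample
model.fit(test_data.round(4))
model.sample(num_rows=5)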

All outputs have 4 digits even though I explicitly turned the rounding scheme off. Note that this is probably related to #1039
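
One way to check whether sampled values were rounded is to count the digits after the decimal point. The sketch below is only an illustration: `decimal_places` and `max_decimal_places` are hypothetical helpers, not part of SDV, and they rely on Python's shortest round-trip `repr` for floats.

```python
import math
from decimal import Decimal


def decimal_places(value):
    """Count digits after the decimal point in the shortest repr of a float."""
    value = float(value)
    if not math.isfinite(value):
        # NaN and infinity have no meaningful digit count
        return 0
    # normalize() strips trailing zeros, so 2.0 counts as 0 decimal places
    exponent = Decimal(repr(value)).normalize().as_tuple().exponent
    return max(0, -exponent)


def max_decimal_places(values):
    """Largest number of decimal places found in an iterable of numbers."""
    return max(decimal_places(v) for v in values)


# Hypothetical values standing in for a sampled column
print(max_decimal_places([0.1235, 1.1235, 2.0]))
```

Applied to the sampled column (for example, to the list returned by the column's `tolist()`), a result of 4 would suggest the output matches the 4-digit rounding of the training data.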

npatki added the bug and data:single-table labels on Sep 28, 2022
@sharisiri

Hi there, I'm trying to run the FAST_ML preset in the tabular model tutorial and I'm getting the same error:

__init__() got an unexpected keyword argument 'learn_rounding_scheme'

Is it possible to pass in the enforce_rounding_scheme=False flag somewhere when using the FAST_ML preset?

Btw, huge fan of the project! Thanks for all your efforts.

@npatki
Contributor Author

npatki commented Nov 17, 2022

Hi @sharisiri, great to hear!

FAST_ML is designed to be a lightweight model with no additional arguments (they are all preset). If you'd like to change the flags to customize your model, I'd recommend using the GaussianCopula model instead.

Please note that the parameter is called learn_rounding_scheme (not enforce_rounding_scheme).

from sdv.tabular import GaussianCopula

model = GaussianCopula(learn_rounding_scheme=False)
model.fit(data)
synthetic_data = model.sample(num_rows=100)

@npatki
Contributor Author

npatki commented Dec 1, 2022

Closing this off. The initial example only had a single, constant value. In this case, no rounding scheme is used -- the synthetic data just recreates the same value.

If you try the same example with multiple values, then you can see that the rounding scheme can be turned off.

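from sdv.tabular import GaussianCopula
import pandas as pd

# Two distinct values this time, so the column is no longer constant
test_data = pd.DataFrame(data={
    'column': [0.123456789012345]*5 + [1.123456789012345]*5
})

# Ask the model not to learn a rounding scheme
model = GaussianCopula(
    learn_rounding_scheme=False)

# Fit on data rounded to 4 decimal places, then sample
model.fit(test_data.round(4))
model.sample(num_rows=5)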
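
To tell the two cases apart before fitting, it can help to check whether a column is constant, since a constant column is reproduced as-is and says nothing about rounding. The following is a minimal sketch in plain Python; `is_constant` is a hypothetical helper and not something SDV provides.

```python
def is_constant(values):
    """Return True when all non-missing values in the column are identical."""
    # NaN is the only float that is not equal to itself, so `v == v` drops it
    distinct = {v for v in values if v == v}
    return len(distinct) <= 1


# Hypothetical columns mirroring the two examples in this thread
original_example = [round(0.123456789012345, 4)] * 10
revised_example = ([round(0.123456789012345, 4)] * 5
                   + [round(1.123456789012345, 4)] * 5)

print(is_constant(original_example))
print(is_constant(revised_example))
```

With pandas, `test_data['column'].nunique()` gives the same information: a result of 1 means the column is constant.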

npatki closed this as completed on Dec 1, 2022
npatki added the resolution:WAI label on Dec 1, 2022

2 participants