Incorrectly enforced rounding on numerical/float data columns #1039
Comments
Hi @dionman thanks so much for filing this and providing the data/metadata. Confirmed that I can replicate. The issue appears to be in how we're learning & enforcing rounding. Only the wrappers in From some of my own experiments, I've found the following:
Workaround
If it doesn't affect quality too much, I'd suggest just rounding your data to 14 decimal places before training the model.
    rounded_data = data.round(14)
    model.fit(rounded_data)
Fixes
The RDT library should be fixed, though we actually plan to stop using RDT for rounding in future SDV releases. We should keep this issue open until we verify a fix. Separately, I'm not sure why
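For anyone applying the workaround, here is a minimal sketch of the rounding step on its own, assuming the training data is a pandas DataFrame. The helper name `round_float_columns` and the toy values are made up for illustration and are not part of SDV, RDT or the attached dataset. It only touches float columns, so integer label columns are left alone.

```python
import pandas as pd


def round_float_columns(data: pd.DataFrame, decimals: int = 14) -> pd.DataFrame:
    """Return a copy of `data` with float columns rounded to `decimals` places."""
    rounded = data.copy()
    # Pick out float columns only; integer and other columns stay untouched.
    float_cols = [
        col for col in rounded.columns
        if pd.api.types.is_float_dtype(rounded[col])
    ]
    rounded[float_cols] = rounded[float_cols].round(decimals)
    return rounded


# Toy data (hypothetical values, not taken from halfmoon.zip).
data = pd.DataFrame({
    "x": [0.1234567890123456, -1.5],
    "y": [2.0000000000000004, 0.25],
    "label": [0, 1],
})

rounded_data = round_float_columns(data)
# model.fit(rounded_data)  # `model` is whichever synthesizer you are training
```

Rounding to 14 places only removes the last couple of digits of float precision, so for most datasets it should not change the values in any meaningful way.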
Let's keep this one open until we fix & verify the underlying bug all the way throughout the SDV.
Hi everyone, great news! This issue has now been resolved in the new SDV 1.0 (Beta!) release. For more information and to get started, see the SDV 1.0 demos.
see sdv-dev/SDV#1039, fixed in sdv-1.0beta.
Error Description
I’m fitting CT-GAN and TVAE on the attached halfmoon dataset (using the default parameters). When sampling from the model I get the float variables discretised to the closest integer. Have you observed behaviour like this in any other settings? It seems to me it’s probably due to some transformation. (I’m using the fit_sample() function as defined in the SDGym repo to fit the model and sample from it.) On the other hand, I can get sane output if I instead use fit() and sample() as implemented in ctgan.synthesizers.ctgan.CTGANSynthesizer().
halfmoon.zip
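A quick way to check whether a sampled column shows the symptom described above is to test whether all of its values are whole numbers. This is only a diagnostic sketch: the function name `looks_discretised` and the example values are hypothetical and do not come from the attached dataset.

```python
import math


def looks_discretised(values, tol=0.0):
    """Return True if every finite value is a whole number (within `tol`)."""
    # Ignore NaN and infinities; they say nothing about rounding.
    finite = [float(v) for v in values if math.isfinite(float(v))]
    # An empty (or all-NaN) column is reported as not discretised.
    return bool(finite) and all(abs(v - round(v)) <= tol for v in finite)


# Made-up values standing in for a real float column and a sampled one.
real_x = [0.8157, -0.4301, 1.9772]
sampled_x = [1.0, 0.0, 2.0]

print(looks_discretised(real_x))     # False
print(looks_discretised(sampled_x))  # True
```

If the real column returns False and the sampled column returns True, the synthetic floats have been collapsed to integers somewhere between fitting and sampling.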