Handle null values in speed preset #737

Merged: 4 commits into master from issue-716-null-values-speed-preset, Mar 23, 2022

Conversation

@katxiao katxiao commented Mar 17, 2022

In the speed preset, we want to avoid modeling nulls in the GaussianCopula model. Instead, before fitting we record the fraction of null values in each column that contains nulls, and after sampling we add nulls back to those columns at the same rate.

Resolves #716
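
To illustrate the approach described above, here is a minimal sketch. It is not the code in this PR: the helper names `null_fractions` and `reinsert_nulls` are hypothetical, and it uses plain Python lists instead of a DataFrame so that it runs with no dependencies. It only shows the idea of recording per-column null rates before fitting and reapplying them to sampled data.

```python
import random


def null_fractions(columns):
    """Return {column name: fraction of None values} for columns with at least one None.

    `columns` is a dict mapping column names to lists of values (a hypothetical
    stand-in for a table).
    """
    fractions = {}
    for name, values in columns.items():
        if not values:
            continue  # an empty column has no null rate to record
        n_null = sum(v is None for v in values)
        if n_null:
            fractions[name] = n_null / len(values)
    return fractions


def reinsert_nulls(sampled, fractions, rng):
    """Return a copy of `sampled` where each value in a tracked column is
    replaced by None with that column's recorded probability."""
    result = {}
    for name, values in sampled.items():
        p = fractions.get(name, 0.0)  # untracked columns are left unchanged
        result[name] = [None if rng.random() < p else v for v in values]
    return result


# Record null rates on the real data before fitting.
real = {
    "age": [31, None, 45, None],
    "city": ["Oslo", "Lima", "Rome", "Kyiv"],
}
fractions = null_fractions(real)  # {"age": 0.5}

# Stand-in for rows sampled from a model that was fit without nulls.
sampled = {
    "age": [30, 40, 50, 60, 70, 80],
    "city": ["Oslo", "Rome", "Lima", "Kyiv", "Oslo", "Rome"],
}
with_nulls = reinsert_nulls(sampled, fractions, random.Random(0))
```

Each sampled value is nulled independently, so the null rate in the output matches the recorded rate in expectation, not exactly, for any given sample.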

@katxiao katxiao requested a review from a team as a code owner March 17, 2022 22:25
@katxiao katxiao requested review from amontanez24 and removed request for a team March 17, 2022 22:25
@amontanez24 amontanez24 left a comment

This is looking really good! I just left one comment about potentially making it faster.

sdv/lite/tabular.py (outdated, resolved)
@amontanez24 amontanez24 left a comment

LGTM! Thanks for addressing the changes.

@katxiao katxiao merged commit 4232f12 into master Mar 23, 2022
@katxiao katxiao deleted the issue-716-null-values-speed-preset branch March 23, 2022 15:43
Development

Successfully merging this pull request may close these issues.

Create a speed optimized Preset
2 participants