Create Unique Constraint #532
Comments
This sounds good @amontanez24 but I would suggest one change in the proposal: Instead of having a
It should also be possible for Eg. a
If
from sdv.constraints import Unique, UniqueCombinations
constraints = [
    Unique(columns=['country', 'city']),
    UniqueCombinations(columns=['country', 'city']),
]
my_model = MyTabularModel(constraints=constraints)
Could we consider having a
from sdv.constraints import Unique
cons = Unique(
    columns=['country', 'city'],  # a country, city pair can only appear once
    extra_combinations=False  # don't make extra combinations outside of the original data
)
my_model = MyTabularModel(constraints=[cons])
I would not do it, for multiple reasons:
So, altogether, I would rather say that we should cover this in a separate issue about making the constraints robust enough (or have enough validations) so that incompatible ordering is not possible, either because using the wrong order raises a helpful error message from which the user can learn how to sort the constraints properly, or because SDV is able to internally figure out the right order and apply it.
+1 to a new issue about making sure SDV can robustly handle multiple constraints. This seems like a future project. For the current scope -- a descriptive error message would be nice. Food for thought:
This isn't 100% true right now. For eg, the
I actually like this setup. To me, it's simple to have 1 constraint per problem being encountered. Together, the constraints need not necessarily be mutually exclusive primitives.
I see that @amontanez24 already opened issue #541 to sort out the problem of combining multiple constraints that affect the same columns, so here we can stick to the single
Also, it seems like the
Problem Description
Sometimes, even though a column isn't a key, the values in it need to be unique in order for the data to be valid. Consider the following 2 use cases:
- [...] user_id), I need to make sure that SSN is also unique
- [...] models (eg. Nike, Reebok, etc.) and sizes (6, 6.5, 7, etc.). I need to make sure that each size is unique within a given model. Eg. I can't have 2 rows that are size 7.5 Reebok
In order to support this, write a Unique constraint where the user can supply both a column (whose values must be unique) and an optional list of group_by columns that control the partition that determines uniqueness. In our cases:
- column='SSN' with group_by=None -- which means it's unique throughout the table
- column='size' while group_by=['model'] -- which means that within a given model, size is unique
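To make the proposed semantics concrete, here is a minimal pandas sketch of the uniqueness check described above. It is not SDV code: `is_unique` is a hypothetical helper, and the two small tables are made-up examples mirroring the SSN and shoe-size use cases. It only validates existing data and does not show how a constraint would enforce uniqueness during sampling.

```python
import pandas as pd


def is_unique(table, column, group_by=None):
    """Hypothetical helper: True if `column` never repeats within a partition.

    With group_by=None the whole table is a single partition.
    """
    # A value repeats inside a partition exactly when the combination of
    # (group_by columns + column) appears on more than one row.
    keys = list(group_by or []) + [column]
    return not table.duplicated(subset=keys).any()


# Made-up data for the two use cases in the problem description.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "SSN": ["111-11-1111", "222-22-2222", "333-33-3333"],
})
shoes = pd.DataFrame({
    "model": ["Nike", "Nike", "Reebok", "Reebok"],
    "size": [7.5, 8.0, 7.5, 7.5],  # two rows are size 7.5 Reebok
})

ssn_ok = is_unique(users, column="SSN")                        # unique throughout the table
size_ok = is_unique(shoes, column="size", group_by=["model"])  # unique within each model
```

Here `ssn_ok` is true and `size_ok` is false, because size 7.5 appears twice for Reebok, while the same size appearing once for Nike and once for Reebok would be allowed.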