Description
Problem Description
Currently, SDV does not allow schemas in which a table's primary key is also a foreign key into a different table. This is a valid schema pattern, frequently used when the table contains additional information for the main entity, forming a 1-to-1 relationship. For example, consider a table called Users and another table called Supplemental Info. The Supplemental Info table can have a primary key that is also the foreign key to the Users table. This indicates that each user may be associated with at most 1 supplemental info entry.
Workaround: Right now, SDV will correctly model this schema if the primary key designation of the table is left out. That is, denote only that there's a foreign key in Supplemental Info, without also marking that column as the table's primary key. The synthetic data will be valid because SDV always preserves the cardinality of the table: in this case, it will preserve the fact that there can be at most 1 supplemental info entry associated with a given user.
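As a rough illustration of the workaround, the metadata might look something like the dict below. The table and column names are hypothetical, and the dict layout is an assumption based on the SDV 1.x multi-table metadata format. The key point is that `supplemental_info` declares `user_id` only through the relationship and has no `primary_key` entry.

```python
# Workaround sketch: declare the foreign key, but omit the child's primary key.
# Table and column names are hypothetical; the dict layout is an assumption
# based on the SDV 1.x multi-table metadata format.
workaround_metadata = {
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {
                "user_id": {"sdtype": "id"},
                "signup_date": {"sdtype": "datetime"},
            },
        },
        "supplemental_info": {
            # No "primary_key" entry: user_id is declared only as a foreign key.
            "columns": {
                "user_id": {"sdtype": "id"},
                "notes": {"sdtype": "categorical"},
            },
        },
    },
    "relationships": [
        {
            "parent_table_name": "users",
            "child_table_name": "supplemental_info",
            "parent_primary_key": "user_id",
            "child_foreign_key": "user_id",
        }
    ],
}
```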
Expected behavior
The metadata should allow the case where a primary key of a table is also a foreign key to a different table. Making this change requires updating the following areas:
- Updating the metadata validation to allow for this case
- Allowing this type of schema to be specified via the metadata API. This would mean updating the validation for `set_primary_key` and `add_relationship`.
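To make the intended validation change concrete, here is a minimal sketch of a relaxed relationship check. It is not SDV's actual implementation: `check_relationship` and the plain-dict table layout are hypothetical, and the sketch only shows the rule being proposed, namely that a child column which is both primary key and foreign key is accepted rather than rejected.

```python
def check_relationship(tables, parent, child, parent_primary_key, child_foreign_key):
    """Hypothetical check; returns True when the child's foreign key is also its primary key."""
    # The parent side must still reference the parent's declared primary key.
    if tables[parent].get("primary_key") != parent_primary_key:
        raise ValueError(
            f"'{parent_primary_key}' is not the primary key of table '{parent}'."
        )
    # The foreign key column must exist in the child table.
    if child_foreign_key not in tables[child]["columns"]:
        raise ValueError(
            f"Column '{child_foreign_key}' does not exist in table '{child}'."
        )
    # Relaxed rule: a foreign key that is also the child's primary key is
    # allowed. We report it instead of raising an error.
    return tables[child].get("primary_key") == child_foreign_key
```

A return value of `True` would signal that the relationship is at most 1-to-1, which downstream code could use when visualizing or modeling the schema.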
In addition to this, we should double-check that the rest of the functionality works as expected:
- The metadata visualization should correctly show the tables as being connected
- There are no bugs when modeling and sampling this type of data with any of our multi-table synthesizers (HMA, HSA, Independent). The synthetic data should correctly show either an exact 1-to-1 relationship or a 1-to-(0 or 1) relationship (depending on what the real data has).
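One way to check the last point is to classify the relationship in both the real and the synthetic data and compare the results. The helper below is a sketch using only the standard library; `classify_relationship` is a hypothetical name, and it assumes the parent keys and child foreign keys have already been extracted as plain lists.

```python
from collections import Counter


def classify_relationship(parent_keys, child_foreign_keys):
    """Classify the cardinality between parent keys and child foreign keys."""
    parent_keys = set(parent_keys)
    counts = Counter(child_foreign_keys)

    # Every foreign key value must point at an existing parent row.
    if any(key not in parent_keys for key in counts):
        return "invalid: orphaned foreign key"
    # A repeated foreign key value means a parent has several children.
    if any(n > 1 for n in counts.values()):
        return "one-to-many"
    # Each parent has exactly one child.
    if set(counts) == parent_keys:
        return "exact 1-to-1"
    # Each parent has at most one child, and some have none.
    return "1-to-(0 or 1)"
```

If the real data classifies as `"exact 1-to-1"` or `"1-to-(0 or 1)"`, the synthetic data sampled from each synthesizer should produce the same label.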