Allow a primary key to also be a foreign key (in the metadata and also for modeling) #2779

@npatki

Problem Description

Currently, SDV does not allow schemas where the primary key of a table is also a foreign key into a different table. This is a valid schema pattern, frequently used when the table contains additional information for the main entity, forming a 1-to-1 relationship. For example, consider a table called Users and another table called Supplemental Info. The Supplemental Info table can have a primary key that is also the foreign key to the Users table. This indicates that each user may be associated with at most 1 supplemental info entry.
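To make the pattern concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than SDV. The table and column names (`users`, `supplemental_info`, `user_id`, `name`, `bio`) and the helper `can_insert_supplemental` are made up for illustration. It shows that a column which is both primary key and foreign key allows at most one supplemental row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- user_id is the primary key AND a foreign key into users.
    CREATE TABLE supplemental_info (
        user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
        bio     TEXT
    );
""")

conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.execute("INSERT INTO supplemental_info VALUES (?, ?)", (1, "first entry"))


def can_insert_supplemental(user_id, bio):
    """Try to add a supplemental row; report whether the schema accepted it."""
    try:
        conn.execute("INSERT INTO supplemental_info VALUES (?, ?)", (user_id, bio))
        return True
    except sqlite3.IntegrityError:
        return False


# A second entry for user 1 violates the primary key.
duplicate_ok = can_insert_supplemental(1, "second entry")
# An entry for a user that does not exist violates the foreign key.
orphan_ok = can_insert_supplemental(99, "no such user")

row_count = conn.execute("SELECT COUNT(*) FROM supplemental_info").fetchone()[0]
print(duplicate_ok, orphan_ok, row_count)
```

User 2 has no supplemental row here, so this particular data is a 1-to-(0 or 1) relationship rather than an exact 1-to-1.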

Workaround: Right now, SDV will correctly model this schema if the primary key designation of the Supplemental Info table is left out. That is, declare only the foreign key in Supplemental Info, without also marking that column as the table's primary key. The synthetic data will be valid because SDV always preserves the cardinality of the relationship: in this case, it will preserve the fact that at most 1 supplemental info entry can be associated with a given user.
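As a rough illustration of the workaround, the metadata could look like the dictionary below. This is a sketch that assumes the dictionary layout of SDV's multi-table metadata (`tables`, `columns`, `primary_key`, `relationships`); the table names, column names, and sdtypes are invented for the example, and loading the dictionary into SDV is not shown.

```python
# Workaround sketch: the child table declares the foreign key only.
workaround_metadata = {
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {
                "user_id": {"sdtype": "id"},
                "name": {"sdtype": "categorical"},
            },
        },
        "supplemental_info": {
            # Note: no "primary_key" entry here on purpose.
            "columns": {
                "user_id": {"sdtype": "id"},
                "bio": {"sdtype": "categorical"},
            },
        },
    },
    "relationships": [
        {
            "parent_table_name": "users",
            "parent_primary_key": "user_id",
            "child_table_name": "supplemental_info",
            "child_foreign_key": "user_id",
        }
    ],
}

child = workaround_metadata["tables"]["supplemental_info"]
print("primary_key" in child)  # the child table has no primary key declared
```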

Expected behavior

The metadata should allow the case where a primary key of a table is also a foreign key to a different table. Making this change requires updating the following areas:

  1. Updating the metadata validation to allow for this case
  2. Allowing this type of schema to be specified via the metadata API. This would mean updating the validation for set_primary_key and add_relationship.
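The kind of validation change this implies can be sketched in plain Python. This is not SDV's actual validation code: `validate_relationship` is a hypothetical helper, and the dictionary layout and table names are assumptions made for illustration. The point is that the check should no longer reject a child foreign key just because it is also the child's primary key.

```python
def validate_relationship(tables, relationship):
    """Return a list of error messages; an empty list means 'accepted'.

    Hypothetical helper, not part of SDV.
    """
    parent_name = relationship["parent_table_name"]
    child_name = relationship["child_table_name"]
    parent_key = relationship["parent_primary_key"]
    child_key = relationship["child_foreign_key"]
    parent = tables[parent_name]
    child = tables[child_name]

    errors = []
    if parent.get("primary_key") != parent_key:
        errors.append(f"'{parent_key}' is not the primary key of '{parent_name}'.")
    if child_key not in child["columns"]:
        errors.append(f"'{child_key}' is not a column of '{child_name}'.")
    if parent_name == child_name and parent_key == child_key:
        errors.append("A primary key cannot reference itself.")
    # Deliberately no check that child_key differs from the child's
    # primary key: that is the case this sketch wants to allow.
    return errors


tables = {
    "users": {
        "primary_key": "user_id",
        "columns": {"user_id": {"sdtype": "id"}},
    },
    "supplemental_info": {
        "primary_key": "user_id",  # also used as the foreign key below
        "columns": {"user_id": {"sdtype": "id"}, "bio": {"sdtype": "categorical"}},
    },
}
relationship = {
    "parent_table_name": "users",
    "parent_primary_key": "user_id",
    "child_table_name": "supplemental_info",
    "child_foreign_key": "user_id",
}
errors = validate_relationship(tables, relationship)
print(errors)
```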

In addition to this, we should double-check that the rest of the functionality works as expected:

  • The metadata visualization should correctly show the tables as being connected
  • There are no bugs when modeling and sampling this type of data with any of our multi-table synthesizers (HMA, HSA, Independent). The synthetic data should correctly show either an exact 1-to-1 relationship or a 1-to-(0 or 1) relationship (depending on what the real data has).
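One way to check the last point on real and synthetic data is to classify the relationship from the key columns alone. The helper below is a hypothetical sketch in plain Python (`classify_relationship` is not an SDV function); it takes the parent's primary key values and the child's foreign key values as simple lists.

```python
from collections import Counter


def classify_relationship(parent_keys, child_foreign_keys):
    """Classify a parent/child relationship from its key values."""
    parent_keys = set(parent_keys)
    children_per_parent = Counter(child_foreign_keys)

    # Every foreign key value must point at an existing parent row.
    if any(key not in parent_keys for key in children_per_parent):
        return "invalid: orphaned foreign key"
    # More than one child for some parent breaks the 1-to-1 pattern.
    if any(count > 1 for count in children_per_parent.values()):
        return "1-to-many"
    # Each referenced parent has exactly one child at this point.
    if len(children_per_parent) == len(parent_keys):
        return "1-to-1"
    return "1-to-(0 or 1)"


real = classify_relationship([1, 2, 3], [1, 3])
synthetic = classify_relationship([10, 11, 12, 13], [10, 12, 13])
print(real, synthetic, real == synthetic)
```

Running the same check on the real and the synthetic tables and comparing the two labels gives a quick signal of whether the relationship type was preserved.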
