Skip to content

Allow a column to be both a primary key and foreign key at the same time #2455

@npatki

Description

@npatki

Problem Description

In certain cases, it is possible to have a schema in which a column is both (a) a primary key of the table, and (b) a foreign key linking to a different table at the same time. This generally happens when there is a "0 or 1" linkage between two tables.

For example, consider a table Users that contains all users of a service:

User ID Date Joined Subscription Type
id_03234 June 2, 2024 Normal
id_93813 Apr 4, 2023 Normal
id_9192 Dec 7, 2024 Premium
... ... ...

Then consider another table Survey Responses that contains optional survey responses submitted by those users. Not all users filled out the survey, and the ones who did could only submit it 1 time.

User ID Age Satisfaction
id_03234 43 Very Satisfied
id_9192 23 Slightly Satisfied

In this sense, the Users the Survey Responses have a "0 or 1" mapping.

In this schema, the Survey Responses.User ID column is special:

  • It is the primary key of the Survey Responses table, as it is uniquely identifying each response
  • It is also the foreign key into Users.User ID because it is referring to the same concept of a user

However, the metadata definition won't currently allow me to specify the column as both a primary and foreign key. The metadata validation fails:

InvalidMetadataError: The metadata is not valid
Relationships:
Invalid relationship between table 'Users' and table 'Survey Responses'. A relationship must connect a primary key with a non-primary key.

Expected behavior

I expect that I should be able to add such a relationship to my metadata and that the synthesizers would create valid synthetic data ensuring that:

  • The values in the column are all unique (because it's a primary key) and
  • The values in the column are referencing values in the parent table's primary key too (because it's a foreign key)

Workaround

The simple workaround for anyone running into this is: Only supply the foreign key designation in the metadata for now (and leave out the primary key designation).

In this example: Do not designate that Survey Responses.User ID is a primary key, but do designate it as a foreign key into Users.User ID.

This will create fully valid synthetic data that contains fully unique values for Survey Responses.User ID. Even though it's not designated as a primary key, all SDV synthesizers respect the cardinality logic so the synthetic data will be valid.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions