Skip to content

Allow composite keys to be specified in the metadata #2778

@npatki

Description

@npatki

Problem Description

If a schema has composite keys, they currently have to be specified in the CAG bundle. However, composite keys don't entirely make sense as constraints, as they don't really represent business logic. Instead, composite keys are generally defined at a schema-level -- for example, within the database schema or SQL file. In SDV, the equivalent place to specify this would be the metadata.

Expected behavior

The metadata JSON spec should be updated to allow for composite keys. A composite primary key can be inputted using a list rather than a dictionary.

    "tables": {
        "patient_visits": {
            "primary_key": ["patient_id", "date_visited"],
            "columns": {
                "patient_id": { "sdtype": "id", "regex_format": "ID_[0-9]{10}" },
                "date_visited": { "sdtype": "datetime", "datetime_format": "%Y-%m-%d" },

A composite foreign key can be inputted as a list within the relationships section of the metadata.

"relationships": [{
        "parent_table_name": "patient_visits",
        "parent_primary_key": ["patient_id", "date_visited"]
        "child_table_name": "lab_results",
        "child_foreign_key": ["patient_id", "date"]
    }]

Additional context

As part of this change, there are a few different components to address.

Metadata validation: A primary/foreign key can either be a string or a list of strings (in the composite key case). At minimum, we should check the following:

  • In a composite key, at least 1 of the columns should be a column that allows for unlimited generation of values (i.e. an id sdtype, or a PII sdtype). And all of the columns should be present in the "columns" section of the metadata.
  • In a composite primary/foreign key relationship, the length of the foreign key must be the same as the length of the primary key, and each of the sdtypes must match up.

Metadata API: The API should also be updated and enforce the same validation checks as above.

  • In the set_primary_key method, the column_name parameter can be a list of columns for a composite key (in addition to being a string for a singular primary key).
  • In the add_relationship method, the parent_primary_key and child_foreign_key parameters can be a list of column names for a composite key (in addition to being a single string).

Visualization: The visualization should be updated so that composite keys are also shown as connected. For the composite key case, we should show a comma-separated list of columns rather than the singular column name.

Synthesizer: The ability to model and sample schemas with composite keys is an SDV Enterprise feature. So in SDV Community, if you try to instantiate a synthesizer with a metadata that contains composite keys, there should be an error saying that this functionality is not available in SDV Community.

SynthesizerInputError: Your metadata contains composite keys (primary key of table 'patients' has multiple columns). 
Composite keys are not supported in SDV Community.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions