-
Notifications
You must be signed in to change notification settings - Fork 415
Description
Problem Description
If a schema has composite keys, they currently have to be specified in the CAG bundle. However, composite keys don't entirely make sense as constraints, as they don't really represent business logic. Instead, composite keys are generally defined at a schema-level -- for example, within the database schema or SQL file. In SDV, the equivalent place to specify this would be the metadata.
Expected behavior
The metadata JSON spec should be updated to allow for composite keys. A composite primary key can be inputted using a list rather than a dictionary.
"tables": {
"patient_visits": {
"primary_key": ["patient_id", "date_visited"],
"columns": {
"patient_id": { "sdtype": "id", "regex_format": "ID_[0-9]{10}" },
"date_visited": { "sdtype": "datetime", "datetime_format": "%Y-%m-%d" },A composite foreign key can be inputted as a list within the relationships section of the metadata.
"relationships": [{
"parent_table_name": "patient_visits",
"parent_primary_key": ["patient_id", "date_visited"]
"child_table_name": "lab_results",
"child_foreign_key": ["patient_id", "date"]
}]Additional context
As part of this change, there are a few different components to address.
Metadata validation: A primary/foreign key can either be a string or a list of strings (in the composite key case). At minimum, we should check the following:
- In a composite key, at least 1 of the columns should be a column that allows for unlimited generation of values (i.e. an
idsdtype, or a PII sdtype). And all of the columns should be present in the "columns" section of the metadata. - In a composite primary/foreign key relationship, the length of the foreign key must be the same as the length of the primary key, and each of the sdtypes must match up.
Metadata API: The API should also be updated and enforce the same validation checks as above.
- In the
set_primary_keymethod, thecolumn_nameparameter can be a list of columns for a composite key (in addition to being a string for a singular primary key). - In the
add_relationshipmethod, theparent_primary_keyandchild_foreign_keyparameters can be a list of column names for a composite key (in addition to being a single string).
Visualization: The visualization should be updated so that composite keys are also shown as connected. For the composite key case, we should show a comma-separated list of columns rather than the singular column name.
Synthesizer: The ability to model and sample schemas with composite keys is an SDV Enterprise feature. So in SDV Community, if you try to instantiate a synthesizer with a metadata that contains composite keys, there should be an error saying that this functionality is not available in SDV Community.
SynthesizerInputError: Your metadata contains composite keys (primary key of table 'patients' has multiple columns).
Composite keys are not supported in SDV Community.