It occurred to me that if we follow some simple naming rules for join keys, we can substantially improve usability and data validation. cc @janowicz @mxndrwgrdnr
This idea is related to issue #67 in that it's also about column names, but the two are largely separate.
Rules
Each table has a primary key/index of one or more columns (already true)
Foreign keys have the same name as the primary key they're associated with (already true 95% of the time)
Columns cannot have the same name as another table's primary key unless they're meant to be associated with it (hopefully already true)
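To make the rules concrete, here is a minimal sketch of how join relationships could be read straight off the column names. The `SCHEMAS` layout, the table and column names, and `infer_joins` are all hypothetical, made up for illustration; this is not existing Orca or template code.

```python
# Hypothetical schema description: each table lists its primary key
# columns and its remaining (non-key) columns.
SCHEMAS = {
    "parcels": {"key": ("parcel_id",), "columns": ["zone_id", "acres"]},
    "zones": {"key": ("zone_id",), "columns": ["county"]},
    "buildings": {"key": ("building_id",), "columns": ["parcel_id", "sqft"]},
}


def infer_joins(schemas):
    """Return sorted (child, parent, key) triples implied by column names."""
    joins = []
    for child, child_spec in schemas.items():
        for parent, parent_spec in schemas.items():
            if child == parent:
                continue
            key = parent_spec["key"]
            # Under the naming rules, a table references a parent when every
            # column of the parent's primary key appears among the child's
            # non-key columns.
            if all(col in child_spec["columns"] for col in key):
                joins.append((child, parent, key))
    return sorted(joins)


joins = infer_joins(SCHEMAS)
for child, parent, key in joins:
    print(f"{child} -> {parent} on {list(key)}")
```

This sketch only looks at non-key columns, so it would not detect two tables that share the same primary key; that case would need separate handling.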
Advantages
If we follow these rules, we don't need "broadcasts". Join relationships are known in advance from the column names. This is easier for users and avoids bugs associated with bad broadcast definitions.
It also allows us to validate table relationships at any time. I've been reluctant to validate broadcasts this way, because sometimes they're provided in advance but not meant to be used until later in a simulation when source tables are present.
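As a rough illustration of that kind of validation, assuming the tables are pandas DataFrames whose index name is the primary key, a check for foreign key values with no match in the parent table might look like this. The table contents and the `orphaned_keys` helper are hypothetical.

```python
import pandas as pd

# Hypothetical example tables; the index name doubles as the primary key name.
parcels = pd.DataFrame(
    {"acres": [1.0, 2.5]},
    index=pd.Index([10, 11], name="parcel_id"),
)
buildings = pd.DataFrame(
    {"parcel_id": [10, 11, 12], "sqft": [900, 1200, 400]},
    index=pd.Index([1, 2, 3], name="building_id"),
)


def orphaned_keys(child, parent):
    """Return child rows whose foreign key has no match in the parent's index."""
    key = parent.index.name  # foreign key shares the parent's primary key name
    return child[~child[key].isin(parent.index)]


orphans = orphaned_keys(buildings, parcels)
print(orphans)
```

Because the check only needs the two tables, it could run whenever both are present, rather than at the time a broadcast is declared.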
Tricky cases
Should work fine for multi-column keys, which is a nice bonus because Orca broadcasts don't support them. (ChoiceModels implements interaction term merges this way.)
Sometimes tables have the same primary key as each other, one with a subset of the IDs (e.g. a master list of nodes and a smaller list representing a transit network). I don't see any problems supporting this as long as we're expecting it.
I only see one place in the current cloud platform data spec that violates these rules: building parcel_id maps to parcel primary_id.
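For the multi-column case, a name-based merge could be sketched roughly as follows, assuming pandas tables where the parent has a MultiIndex and the child carries columns with the same names. The tables and the `merge_on_named_key` helper are hypothetical, and this is not how ChoiceModels or Orca necessarily does it.

```python
import pandas as pd

# Hypothetical parent table with a two-column primary key.
rates = pd.DataFrame(
    {"rate": [0.1, 0.2, 0.3]},
    index=pd.MultiIndex.from_tuples(
        [("A", 2010), ("A", 2020), ("B", 2010)],
        names=["zone_id", "year"],
    ),
)

# Hypothetical child table whose columns reuse the parent's key names.
obs = pd.DataFrame(
    {"zone_id": ["A", "B", "A"], "year": [2020, 2010, 2010], "value": [5, 6, 7]},
    index=pd.Index([1, 2, 3], name="obs_id"),
)


def merge_on_named_key(child, parent):
    """Left-join parent onto child using the parent's index names as the key."""
    key = list(parent.index.names)
    return child.join(parent, on=key, how="left")


merged = merge_on_named_key(obs, rates)
print(merged)
```

The same helper works for a single-column key, since `parent.index.names` then has one entry.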
Implementation
It would be helpful to implement support for auto-specified merges at the same time as the data loading (issue #66). Two possible approaches:
a. Templates automatically generate Orca broadcasts? I suspect this would be tricky, because Orca doesn't allow over-determined broadcasts. (If a is linked to b and c, and b is also linked to c, you can't orca-merge the three of them. Not sure if this is a bug or intentional.)
b. Templates first try an Orca merge, and if the broadcasts aren't there they fall back to their own merge logic. Once it's working smoothly we can add it to Orca.
Diagram