-
Notifications
You must be signed in to change notification settings - Fork 356
feat: add label, table, and cardinality fields to TTableReference
#2941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides review: maybe we should add an example on how to use table references with dlt transformations? in EL pipelines the need for table references is minimal and we are not even validating them. but in T you do more modelling so adding those makes sense. we can validate them in model normalizer...
|
|
||
| TReferenceCardinality = Literal["-", "<", ">", "<>"] | ||
| """Represents cardinality between `column` (left) and `referenced_column` (right) | ||
| `-`: one-to-one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is cool notation, but is it commonly used? I've never seen it. It will be to impossible to understand this without referring to docs: both for humans an LLMs. To me text is OK. You can also use UML:
1..1 1..* *..1
or
1:n n:1 1:1
note that many to many requires intermediate join table. that will never happen in our schema or in any physical database diagram.
you can also have 0..* (zero to many) indicating that parent table is allowed to have no children so LEFT JOIN is allowed.
cardinality is pretty deep when you dig into it... tldr;> I'll probably go for one-to-one style and add zero-to-many (and vice versa) if your aim here is to generate correct joins
| """ | ||
|
|
||
| label: NotRequired[str] | ||
| """Label to describe the relation 'liked'.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this comment :) is it a custom annotation? you can always add as many x-annotation- as you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example:
- you have table
UsersandPosts - you have a
Referencethat specifiesUser.id < Posts.user_id - you have a
label="liked"for thatReference, giving the semantic meaning "(User, liked, Post)".
This would be displayed on an edge in a schema diagram. These triplets could also enabled graph database support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is called "label" or "edge label" in graph databases. I'd just improve the comment so it is clear what is meant by this.
| cardinality: NotRequired[TReferenceCardinality] | ||
| """Cardinality of the relationship between `table.column` (left) and `referenced_table.referenced_column` (right).""" | ||
|
|
||
| table: NotRequired[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. but now you need to validate this in your dbml PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea would be that remove_processing_hints converts all TTableReference to the same form. Either
# inline definition (current)
table_name = "users"
table_schema = schema["tables"][table_name"]
reference = table_schema["references"][0]
column_name = reference["columns"]
other_table_name = reference["referenced_table"]
other_column_name = reference["referenced_columns"]
# reference.get("table") is Noneor
# long form
references = schema["references"]
reference = references[0]
table_name = reference["table"]
column_name = reference["columns"]
other_table_name = reference["referenced_table"]
other_column_name = reference["referenced_columns"]This is inspired by DBML which has 3 definitions: long form, short form, and inline
|
Converting to draft until I implement the |
|
closing because of inactivity; will reopen once actively being developed |
Added fields on
TTableReferencethat allows users to specify more metadata about the reference.Why:
table(left) in addition toreferenced_table(right) specified directly on theTTableReferencesimplifies some logic.cardinalityandlabelare just additional metadata, which can also be used in visualizationstablewould also allowTTableReferenceto be stored top-level ondlt.Schema(instead of inline on theTTableSchema)