@zilto
Collaborator

@zilto zilto commented Jul 29, 2025

Added fields on TTableReference that allow users to specify more metadata about the reference.

Why:

  • having table (left) in addition to referenced_table (right) specified directly on the TTableReference simplifies some logic.
  • cardinality and label are just additional metadata, which can also be used in visualizations
  • table would also allow TTableReference to be stored top-level on dlt.Schema (instead of inline on the TTableSchema)
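As a rough illustration of the fields listed above, the extended reference could look like the sketch below. The class name `TableReferenceSketch` is hypothetical, and the three base fields (`columns`, `referenced_table`, `referenced_columns`) are assumed from the snippets later in this thread rather than copied from the dlt source. The values reuse the Users/Posts example discussed further down.

```python
from typing import Literal, Sequence

try:
    from typing import NotRequired, TypedDict  # Python 3.11+
except ImportError:  # older interpreters
    from typing_extensions import NotRequired, TypedDict

# Symbols as proposed in this PR.
TReferenceCardinality = Literal["-", "<", ">", "<>"]


class TableReferenceSketch(TypedDict):
    """Hypothetical stand-in for TTableReference, not the actual dlt definition."""

    # Assumed existing fields
    columns: Sequence[str]
    referenced_table: str
    referenced_columns: Sequence[str]
    # Fields proposed in this PR
    table: NotRequired[str]
    cardinality: NotRequired[TReferenceCardinality]
    label: NotRequired[str]


# "table" (left) is set explicitly, so the reference can live outside a table schema.
reference: TableReferenceSketch = {
    "table": "users",
    "columns": ["id"],
    "referenced_table": "posts",
    "referenced_columns": ["user_id"],
    "cardinality": "<",
    "label": "liked",
}
```

Because `table` is optional, an inline reference can still omit it and rely on the enclosing table schema for the left side.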

@zilto zilto self-assigned this Jul 29, 2025
@netlify
netlify bot commented Jul 29, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 4462de8
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/68890dc389b3fe0008f5bb21

@zilto zilto added the enhancement New feature or request label Jul 29, 2025
@zilto zilto requested review from djudjuu and rudolfix July 29, 2025 18:07
Collaborator

@rudolfix rudolfix left a comment

besides review: maybe we should add an example on how to use table references with dlt transformations? in EL pipelines the need for table references is minimal and we are not even validating them. but in T you do more modelling so adding those makes sense. we can validate them in model normalizer...
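A minimal sketch of what validating table references could involve, independent of where it would run (the model normalizer is only suggested above). The function name `validate_references` is hypothetical, and the schema layout (`schema["tables"][name]["columns"]` as a dict keyed by column name, with inline `references` per table) is an assumption for illustration.

```python
from typing import Any, Dict, List


def validate_references(schema: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in inline references."""
    errors: List[str] = []
    tables = schema.get("tables", {})
    for table_name, table in tables.items():
        for ref in table.get("references", []):
            other = ref["referenced_table"]
            if other not in tables:
                errors.append(f"{table_name}: referenced table '{other}' does not exist")
                continue
            if len(ref["columns"]) != len(ref["referenced_columns"]):
                errors.append(f"{table_name}: column count mismatch in reference to '{other}'")
            for col in ref["columns"]:
                if col not in table.get("columns", {}):
                    errors.append(f"{table_name}: unknown column '{col}'")
            for col in ref["referenced_columns"]:
                if col not in tables[other].get("columns", {}):
                    errors.append(f"{other}: unknown referenced column '{col}'")
    return errors


schema_ok = {
    "tables": {
        "users": {"columns": {"id": {}}},
        "posts": {
            "columns": {"user_id": {}},
            "references": [
                {"columns": ["user_id"], "referenced_table": "users", "referenced_columns": ["id"]}
            ],
        },
    }
}
schema_broken = {
    "tables": {
        "posts": {
            "columns": {"user_id": {}},
            "references": [
                {"columns": ["user_id"], "referenced_table": "accounts", "referenced_columns": ["id"]}
            ],
        }
    }
}
```

Here `validate_references(schema_ok)` finds nothing to report, while `schema_broken` yields one error because the `accounts` table is missing.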


TReferenceCardinality = Literal["-", "<", ">", "<>"]
"""Represents cardinality between `column` (left) and `referenced_column` (right)
`-`: one-to-one
Collaborator

this is cool notation, but is it commonly used? I've never seen it. It will be impossible to understand this without referring to docs, both for humans and LLMs. To me text is OK. You can also use UML:
1..1 1..* *..1
or
1:n n:1 1:1

note that many-to-many requires an intermediate join table. that will never happen in our schema or in any physical database diagram.

you can also have 0..* (zero to many) indicating that the parent table is allowed to have no children, so LEFT JOIN is allowed.

cardinality is pretty deep when you dig into it... tl;dr: I'd probably go for one-to-one style and add zero-to-many (and vice versa) if your aim here is to generate correct joins
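To make the suggestion concrete, here is a small sketch of text-style cardinalities and how a zero-to-many value could drive join selection. The names `SYMBOL_TO_TEXT` and `join_keyword` are hypothetical. The symbol meanings follow the DBML convention (`-`, `<`, `>`, `<>`), which is assumed to be what the PR intends, and only the zero-to-many direction is shown.

```python
from typing import Dict

# DBML-style symbols mapped to the text form suggested in this comment.
# Assumption: the PR's symbols carry the same meaning as in DBML.
SYMBOL_TO_TEXT: Dict[str, str] = {
    "-": "one-to-one",
    "<": "one-to-many",
    ">": "many-to-one",
    "<>": "many-to-many",
}


def join_keyword(cardinality: str) -> str:
    """Pick a SQL join keyword when joining the left (parent) table to the right one.

    A cardinality starting with "zero-to-" means the parent may have no
    children, so a LEFT JOIN keeps those parent rows. Anything else falls
    back to INNER JOIN in this sketch.
    """
    if cardinality.startswith("zero-to-"):
        return "LEFT JOIN"
    return "INNER JOIN"


print(join_keyword(SYMBOL_TO_TEXT["<"]))
print(join_keyword("zero-to-many"))
```

The symbols alone cannot express the "zero" side, which is one argument for the text style when the goal is generating joins.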

"""

label: NotRequired[str]
"""Label to describe the relation 'liked'."""
Collaborator

I do not understand this comment :) is it a custom annotation? you can always add as many x-annotation- as you want.

Collaborator Author

Example:

  • you have tables Users and Posts
  • you have a Reference that specifies Users.id < Posts.user_id
  • you have a label="liked" for that Reference, giving the semantic meaning "(User, liked, Post)".

This would be displayed on an edge in a schema diagram. These triplets could also enable graph database support.
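The triplets from the example above could be derived roughly as follows. This is only a sketch: `to_triples` is a hypothetical helper, and it assumes references are plain dicts carrying the proposed `table` and `label` keys.

```python
from typing import Any, Dict, List, Tuple


def to_triples(references: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Build (table, label, referenced_table) triples, e.g. for edge labels in a diagram."""
    triples: List[Tuple[str, str, str]] = []
    for ref in references:
        table = ref.get("table")
        label = ref.get("label")
        # Both fields are optional; skip references that cannot form a triple.
        if table is None or label is None:
            continue
        triples.append((table, label, ref["referenced_table"]))
    return triples


references = [
    {
        "table": "users",
        "columns": ["id"],
        "referenced_table": "posts",
        "referenced_columns": ["user_id"],
        "cardinality": "<",
        "label": "liked",
    },
    # No label, so this reference produces no triple.
    {"table": "posts", "columns": ["id"], "referenced_table": "comments", "referenced_columns": ["post_id"]},
]
triples = to_triples(references)
```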

Collaborator

Yeah, it is called "label" or "edge label" in graph databases. I'd just improve the comment so it is clear what is meant by this.

cardinality: NotRequired[TReferenceCardinality]
"""Cardinality of the relationship between `table.column` (left) and `referenced_table.referenced_column` (right)."""

table: NotRequired[str]
Collaborator

OK. but now you need to validate this in your dbml PR

Collaborator Author

@zilto zilto Aug 1, 2025

The idea would be that remove_processing_hints converts all TTableReference to the same form. Either

# inline definition (current)
table_name = "users"
table_schema = schema["tables"][table_name]
reference = table_schema["references"][0]

column_name = reference["columns"]
other_table_name = reference["referenced_table"]
other_column_name = reference["referenced_columns"]
# reference.get("table") is None

or

# long form
references = schema["references"]
reference = references[0]

table_name = reference["table"]
column_name = reference["columns"]
other_table_name = reference["referenced_table"]
other_column_name = reference["referenced_columns"]

This is inspired by DBML which has 3 definitions: long form, short form, and inline
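A possible shape for that conversion, sketched under assumptions: `to_long_form` is a hypothetical helper (not the actual `remove_processing_hints`), and a top-level `schema["references"]` list is the proposal from this thread, not an existing schema key.

```python
from typing import Any, Dict, List


def to_long_form(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect every reference in long form, i.e. with "table" always set."""
    # Top-level references are assumed to already carry "table".
    long_form = [dict(ref) for ref in schema.get("references", [])]
    for table_name, table_schema in schema.get("tables", {}).items():
        for ref in table_schema.get("references", []):
            # An inline reference implies its left side is the enclosing table.
            long_form.append({**ref, "table": table_name})
    return long_form


schema = {
    "tables": {
        "users": {
            "references": [
                {"columns": ["id"], "referenced_table": "posts", "referenced_columns": ["user_id"]}
            ]
        },
        "posts": {},
    }
}
normalized = to_long_form(schema)
```

After normalization, downstream code can read `reference["table"]` without checking where the reference was originally defined.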

@rudolfix rudolfix mentioned this pull request Jul 30, 2025
@zilto
Collaborator Author

zilto commented Aug 1, 2025

Converting to draft until I implement the remove_processing_hints described here

@zilto zilto marked this pull request as draft August 1, 2025 16:50
@zilto
Collaborator Author

zilto commented Sep 2, 2025

closing because of inactivity; will reopen once actively being developed

@zilto zilto closed this Sep 2, 2025