Closed
31 changes: 30 additions & 1 deletion dlt/common/schema/typing.py
@@ -13,7 +13,7 @@
NewType,
Union,
)
from typing_extensions import Never
from typing_extensions import Never, NotRequired

from dlt.common.data_types import TDataType
from dlt.common.normalizers.typing import TNormalizersConfig
@@ -276,14 +276,43 @@ class TScd2StrategyDict(TMergeDispositionDict, total=False):
]


TReferenceCardinality = Literal["-", "<", ">", "<>"]
"""Represents cardinality between `column` (left) and `referenced_column` (right)
`-`: one-to-one
Collaborator

This is cool notation, but is it commonly used? I've never seen it. It will be impossible to understand without referring to the docs, both for humans and LLMs. To me, text is OK. You can also use UML:
1..1 1..* *..1
or
1:n n:1 1:1

note that many to many requires intermediate join table. that will never happen in our schema or in any physical database diagram.

you can also have 0..* (zero to many) indicating that parent table is allowed to have no children so LEFT JOIN is allowed.

Cardinality is pretty deep when you dig into it... tl;dr: I'd probably go for the one-to-one text style and add zero-to-many (and vice versa) if your aim here is to generate correct joins.
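
To make the text-style suggestion concrete, here is a minimal sketch of how the symbols from the diff could map to readable names, with a flag for the zero-to-many case deciding the join type. `SYMBOL_TO_TEXT` and `join_type` are hypothetical names for illustration only, not part of dlt:

```python
from typing import Dict

# Hypothetical mapping from the symbolic notation in the diff to text-style names.
SYMBOL_TO_TEXT: Dict[str, str] = {
    "-": "one_to_one",
    "<": "one_to_many",
    ">": "many_to_one",
    "<>": "many_to_many",
}


def join_type(cardinality: str, optional: bool = False) -> str:
    """Pick a join keyword for a text-style cardinality (illustrative only).

    `optional=True` models the zero-to-many style case: the left row may have
    no matching rows on the right, so a LEFT JOIN is needed to keep it.
    """
    if cardinality not in SYMBOL_TO_TEXT.values():
        raise ValueError(f"unknown cardinality: {cardinality}")
    return "LEFT JOIN" if optional else "INNER JOIN"
```

With this sketch, `join_type(SYMBOL_TO_TEXT["<"], optional=True)` picks a LEFT JOIN, while the default keeps an INNER JOIN.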

`<`: one-to-many
`>`: many-to-one
`<>`: many-to-many

Note that `column <> referenced_column` is equivalent to specifying
both `column < referenced_column` and `column > referenced_column`
"""


class TTableReference(TypedDict):
"""Describes a reference to another table's columns.
`columns` corresponds to the `referenced_columns` in the referenced table and their order should match.
"""

label: NotRequired[str]
"""Label to describe the relation 'liked'."""
Collaborator

I do not understand this comment :) is it a custom annotation? you can always add as many x-annotation- as you want.

Collaborator Author

Example:

  • you have table Users and Posts
  • you have a Reference that specifies User.id < Posts.user_id
  • you have a label="liked" for that Reference, giving the semantic meaning "(User, liked, Post)".

This would be displayed on an edge in a schema diagram. These triplets could also enable graph database support.
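
A rough sketch of how such triplets could be derived from labeled references. The helper `reference_triplets` and the sample data are assumptions for illustration; only the field names come from `TTableReference` in the diff:

```python
from typing import Any, Dict, List, Tuple


def reference_triplets(
    table_name: str, references: List[Dict[str, Any]]
) -> List[Tuple[str, str, str]]:
    """Build (table, label, referenced_table) triplets from labeled references."""
    triplets = []
    for ref in references:
        label = ref.get("label")
        if label is None:
            continue  # an unlabeled reference carries no semantic edge
        # `table` is optional on inline references, so fall back to the owning table.
        source = ref.get("table", table_name)
        triplets.append((source, label, ref["referenced_table"]))
    return triplets


# Sample inline reference on a hypothetical "users" table.
users_references = [
    {
        "label": "liked",
        "cardinality": "<",
        "columns": ["id"],
        "referenced_table": "posts",
        "referenced_columns": ["user_id"],
    }
]
print(reference_triplets("users", users_references))  # [('users', 'liked', 'posts')]
```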

Collaborator

Yeah, it is called "label" or "edge label" in graph databases. I'd just improve the comment so it is clear what is meant by this.


cardinality: NotRequired[TReferenceCardinality]
"""Cardinality of the relationship between `table.column` (left) and `referenced_table.referenced_column` (right)."""

table: NotRequired[str]
Collaborator

OK. but now you need to validate this in your dbml PR

Collaborator Author

@zilto zilto Aug 1, 2025

The idea would be that remove_processing_hints converts all TTableReference to the same form. Either

# inline definition (current)
table_name = "users"
table_schema = schema["tables"][table_name]
reference = table_schema["references"][0]

column_name = reference["columns"]
other_table_name = reference["referenced_table"]
other_column_name = reference["referenced_columns"]
# reference.get("table") is None

or

# long form
references = schema["references"]
reference = references[0]

table_name = reference["table"]
column_name = reference["columns"]
other_table_name = reference["referenced_table"]
other_column_name = reference["referenced_columns"]

This is inspired by DBML which has 3 definitions: long form, short form, and inline
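
A minimal sketch of what converting inline references to the long form might look like. `to_long_form` is a hypothetical helper, not the actual `remove_processing_hints`, and it assumes `schema["tables"]` is keyed by table name as in the snippets above:

```python
from typing import Any, Dict, List


def to_long_form(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect inline references from every table into long-form references.

    Each inline reference is copied and given an explicit `table` key taken
    from the name of the table it was defined on.
    """
    long_refs = []
    for table_name, table_schema in schema.get("tables", {}).items():
        for ref in table_schema.get("references", []):
            # The owning table's name determines `table`, as the docstring states.
            long_refs.append({**ref, "table": table_name})
    return long_refs


# Hypothetical schema with one inline reference on "users".
example_schema = {
    "tables": {
        "users": {
            "references": [
                {
                    "columns": ["id"],
                    "referenced_table": "posts",
                    "referenced_columns": ["user_id"],
                }
            ]
        },
        "posts": {},
    }
}
long_references = to_long_form(example_schema)
```

After this step every reference can be read the same way, via `reference["table"]`, regardless of where it was originally defined.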

"""Name of the table.
When `TTableReference` is defined on a `TTableSchema` (i.e., "inline reference"), the `table`
value is determined by `TTableSchema["name"]`
"""

columns: Sequence[str]
"""Name of the column(s) from `table`"""

referenced_table: str
"""Name of the referenced table"""

referenced_columns: Sequence[str]
"""Name of the column(s) from `referenced_table`"""


TTableReferenceParam = Sequence[TTableReference]