You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We had a very interesting conversation with @danielballan yesterday about the structure of the catalog.db database, specifically the representation of the hierarchy of nodes.
The rows of the nodes table store metadata of assets that often (always?) are in a hierarchical relationship with each other (e.g. a tree of folders). Currently, this structure is encoded by storing lists of ancestors of each node. Since several nodes may have same uids (e.g. names of subfolders in separate paths of the tree), the lists of ancestors serve as a second part of the composite key to ensure uniqueness (please see attached). While having lists of ancestors readily available may improve the query performance, it also raises the question of the efficiency of data storage and the complexity of the key.
A possible solution could be:
Ensure that uid column uniquely identifies each node in the table.
Use these uids to capture the tree structure in an external linked "closure" table. The idea is to keep track of every "ancestor-descendant" pair (not just "parent-child") along with the corresponding distance (depth) between them. Assuming that each row is it's own descendant and ancestor at depth 0, this leads to fast rules for data queries and updates without the need to keep lists of references.
For small (shallow) hierarchies the befits of using closure tables could be minimal, but it should be quite noticeable with larger datasets and would help us to future-proof the database schema. Please share your thoughts and comments! @danielballan, @dylanmcreynolds, @Kezzsim
The text was updated successfully, but these errors were encountered:
The closure table looks like very elegant pattern that applies well to our use case. I like that triggers can be used to keep the closure table in sync with the "main" table (in our case, the nodes table) without the application getting directly involved.
I think we can use the id as a unique ID for each node.
The key column, which is sometimes a BlueskyRun <uid> but is in general is any string, is akin to name in the example in that article. The key / name can repeat; we just need a unique constraint on (key, parent). (Currently we have a unique constraint on (key, ancestors).)
I think makes sense to sketch this at a level we can benchmark it against realistic data at a realistic scale.
We had a very interesting conversation with @danielballan yesterday about the structure of the
catalog.db
database, specifically the representation of the hierarchy of nodes.The rows of the
nodes
table store metadata of assets that often (always?) are in a hierarchical relationship with each other (e.g. a tree of folders). Currently, this structure is encoded by storing lists of ancestors of each node. Since several nodes may have same uids (e.g. names of subfolders in separate paths of the tree), the lists of ancestors serve as a second part of the composite key to ensure uniqueness (please see attached). While having lists of ancestors readily available may improve the query performance, it also raises the question of the efficiency of data storage and the complexity of the key.A possible solution could be:
uid
column uniquely identifies each node in the table.For small (shallow) hierarchies the befits of using closure tables could be minimal, but it should be quite noticeable with larger datasets and would help us to future-proof the database schema. Please share your thoughts and comments!
@danielballan, @dylanmcreynolds, @Kezzsim
The text was updated successfully, but these errors were encountered: