Modeling designs page review #451

Open

lidiazuin wants to merge 14 commits into neo4j:dev from lidiazuin:modelingdesigns

Contributor

lidiazuin commented Feb 19, 2025

No description provided.

lidiazuin added 2 commits

February 19, 2025 11:10


          Modeling designs page review

40133cd


          Data modeling designs page review

52c76cd

lidiazuin marked this pull request as ready for review

April 7, 2025 13:25

AlexicaWright reviewed

Contributor

AlexicaWright left a comment

Lots of comments, but one overall thing is the structure. It is common practice to start with the smallest building blocks and then build up, meaning that I would start with presenting the different structures and then put them together, rather than starting with a graph consisting of multiple structures.
This is important content, thank you for curating it @lidiazuin

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

-              However, Neo4j allows you to effortlessly adjust detailed and broad changes across pieces or the entirety of the graph.
-              Whether it is small changes over time or a broad definition that includes a variety of needed information about your entities, the database is able to handle it.
-              It is simply up to the developers and architects to determine the structure of the data model and how to define entities for queries.
+              The example shows how different structures can be combined into one graph, and how different types of questions can be answered by one single graph if you use the correct design, depending on the question.

Contributor

AlexicaWright Apr 7, 2025

This sounds a bit awkward: "different types of questions can be answered by a single graph if you use the correct design, depending on the question."

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

    
              One of the earliest decisions you may encounter is whether to model something as a property on a node or as a relationship to a separate node.

              Take, for example, the data below modeling a movie genre as a property on the `Movie` node.

              A monopartite graph structure consists of a single set of nodes with a single label.

              Most algorithms rely on this type of graph and they are very common when using spanning trees, link:{docs-home}/cypher-manual/current/patterns/shortest-paths/[shortest paths], and link:{docs-home}/graph-data-science/current/algorithms/community/[community detection].

Contributor

AlexicaWright Apr 7, 2025

Shortest path can be run on any type of graph, including bipartite and multipartite ones. Maybe use another example, like PageRank or Eigenvector?

lidiazuin and others added 2 commits

April 8, 2025 12:37


          Apply suggestions from code review

adea775

Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>


          Updates after review

b4e4c9e

lidiazuin requested a review from AlexicaWright

April 8, 2025 12:06


          updating image

e2aba40

neo4j deleted a comment from neo-technology-commit-status-publisher

lidiazuin commented

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

+              This is an improvement, but the model is still not optimal and should be iterated on.
+              When changes are done to your model, it is important to keep track of them by versioning your model.
+              // Content on versioning is WIP.

Contributor Author

lidiazuin Apr 8, 2025

Please refer to #465

lidiazuin requested a review from nmervaillie

April 8, 2025 13:45

AlexicaWright reviewed

Contributor

AlexicaWright left a comment

Well done! I like the new structure! Left some more comments, sorry.. ;)

modules/ROOT/pages/data-modeling/modeling-designs.adoc

-              If you plan to do analysis on individual items and return only details about that entity (like genres on a particular movie), then the first data model would serve perfectly well for your needs.
-              However, if you need to run analysis to find common ground between entities or look at a group of nodes, then the second data model would definitely improve performance of those types of queries.
+              Instead, you could either turn the `Role` node into a property of the `WORKED_AT` relationship or use an *intermediate node* between the `Person`, `Company`, and `Role` nodes:

Contributor

AlexicaWright May 20, 2025 (edited)

Given that we talk about hyperedges and not about creating relationships from relationships, I think we should stick to using an example that shows a relationship between more than two nodes.

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

-              If so, being able to use more than one data model is a great solution!
+              The division and length of the time periods are set based on the context of the question you need to answer.
+              On the top of the timeline tree is the "all time node" that represents the entire timeline.
+              The timeline is then divided into relevant time periods, represented by the nodes below the all time node.

Contributor

AlexicaWright May 20, 2025

I think we're missing an important piece of information here. This (from the KB article):
The rest of the data nodes, which are non-time data nodes, are the nodes that contain the important pieces data in the graph. These nodes link into the timeline tree at the appropriate leaf node.

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

+              With this xref:data-modeling/tutorial-refactoring.adoc[refactored design], you can look only at the relationships for dates you care about and then scan for the relevant airline through the `airline` property in the `Flight` node.
+              This is an improvement, but the model is still not optimal and should be iterated on.
+              When changes are done to your model, it is important to keep track of them by versioning your model.

Contributor

AlexicaWright May 20, 2025

Why is it important to keep track of versions of your model?

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

+              The example shows how different structures can be combined into one graph, and how different types of questions can be answered by one single graph if the modeling is done efficiently.
+              It also shows the many ways nodes and structures can span out.
+              === Monopartite

Contributor

AlexicaWright May 20, 2025

This feels a little abrupt? Maybe it can be introduced with a little sentence about different graph structures or something?


          Apply suggestions from code review

20bc249

Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>

nmervaillie suggested changes

Member

nmervaillie left a comment

Very useful content overall. I just see an issue with the time tree, see comments

modules/ROOT/pages/data-modeling/modeling-designs.adoc

-              However, Neo4j allows you to effortlessly adjust detailed and broad changes across pieces or the entirety of the graph.
-              Whether it is small changes over time or a broad definition that includes a variety of needed information about your entities, the database is able to handle it.
-              It is simply up to the developers and architects to determine the structure of the data model and how to define entities for queries.
+              image::hyperedge.svg[An example of a hyperedge in which a relationship is connected to two nodes, a feature not available in Neo4j,width=400,role=popup]

Member

nmervaillie May 20, 2025

I don't get why there is a `KNOWS` relationship here whereas in the other diagram below there is an `IN_ROLE` relationship.

Contributor Author

lidiazuin May 23, 2025

This should have been fixed already. Is this the image that you are seeing? https://github.com/neo4j/docs-getting-started/pull/451/files#diff-c7e86415c640b7192119c42108fda32b36bb6cba201e6f5ee9d80b1a63a0b021

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

-              Each section of the class (or offering) would then become an instance of the course.
+              Linked lists are commonly used in computer science and they are particularly useful whenever the sequence of objects matters.
+              In this data structure, a simple-linked list is where each node links to the next node only, whereas in a double-linked list, each node links both to the next and the previous node.
+              Neo4j *does not* support double-linked lists.

Member

nmervaillie May 21, 2025

Technically, nothing prevents you from doing this.
However, this is redundant information, as one relationship implies the other. They are semantically equivalent.
There is no performance difference between traversing in one direction and traversing in the other.
I would rephrase this to say that it is not recommended.
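
To illustrate the point about direction, here is a minimal sketch. It assumes a hypothetical `Episode` label with a `number` property, chained by a single `NEXT` relationship; neither the label nor the property is confirmed by the page. The same relationship can be followed in either direction, so a separate "previous" relationship would add no information:

```cypher
// Hypothetical data: Episode nodes chained by a single NEXT relationship.
// Following NEXT forwards returns the next episode.
MATCH (e:Episode {number: 2})-[:NEXT]->(next:Episode)
RETURN next.number AS nextEpisode;

// Following the same relationship backwards returns the previous episode.
MATCH (e:Episode {number: 2})<-[:NEXT]-(previous:Episode)
RETURN previous.number AS previousEpisode;
```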

modules/ROOT/pages/data-modeling/modeling-designs.adoc

+              === Interleaved linked list
+              There are different ways to sequence a list of items.
+              When nodes are connected in a nonlinear fashion, they are referred to as an interleaved linked list.

Member

nmervaillie May 21, 2025

Not sure I get the question.
I don't see this as linear or not. Both relationship types just represent different information, sharing pieces of content

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

-              The tradeoff is that now you will need to maintain two models.
-              Each time you create a new node or relationship or update pieces of the graph, you will need to make changes to accommodate both models.
-              This can also impact query performance, as you might have double the syntax needed to update each model.
+              == Timeline tree

Member

nmervaillie May 21, 2025

The whole time tree section needs to be removed.
It used to be useful when Neo4j did not have indexes on dates. This is not the case anymore, so this pattern is obsolete.
Not sure why it ended up in the PS KB in the first place.
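
For reference, a sketch of what the index-based alternative could look like. It assumes a hypothetical `date` property holding a `DATE` value on the `Flight` node; the property name and the index name `flight_date` are illustrative and not taken from the page:

```cypher
// Create an index on the hypothetical date property.
CREATE INDEX flight_date IF NOT EXISTS
FOR (f:Flight) ON (f.date);

// A range predicate on the indexed property can then use the index,
// without navigating a tree of year, month, and day nodes.
MATCH (f:Flight)
WHERE f.date >= date('2025-01-01') AND f.date < date('2025-02-01')
RETURN f;
```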

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated


		If you were, instead, working with a data model that uses time as the navigator instead of as the anchor, this same question would require a great deal of property lookups and a lot of inefficient gather-and-inspect.

		== Time-bound data

Member

nmervaillie May 21, 2025

Same as for the time tree: this is not true anymore now that date indexes are available.

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

+              The example shows how different structures can be combined into one graph, and how different types of questions can be answered by one single graph if the modeling is done efficiently.
+              It also shows the many ways nodes and structures can span out.
+              === Monopartite

Member

nmervaillie May 21, 2025

I'm not sure monopartite / bipartite / multipartite brings value here.
I don't see this as a pattern, as people will choose one or the other representation according to the use case. There is no benefit of one over the others. I would see this more as a GDS topic, where a given shape must be used according to the algorithm.

lidiazuin and others added 4 commits

May 23, 2025 15:02


          Merge branch 'dev' into modelingdesigns

066296a


          updates after review

413b62c


          reverting changes from another pr


          reverting changes from another pr

b5cae2a

lidiazuin requested review from AlexicaWright and nmervaillie

June 3, 2025 10:38

AlexicaWright reviewed

Contributor

AlexicaWright left a comment

And there, Submit review.. ;)

modules/ROOT/pages/data-modeling/modeling-designs.adoc

-              If you plan to do analysis on individual items and return only details about that entity (like genres on a particular movie), then the first data model would serve perfectly well for your needs.
-              However, if you need to run analysis to find common ground between entities or look at a group of nodes, then the second data model would definitely improve performance of those types of queries.
+              Instead, you could either turn the `Role` node into a property of the `WORKED_AT` relationship or use an *intermediate node* between the `Person`, `Company`, and `Role` nodes:

Contributor

AlexicaWright Jun 5, 2025

In order to illustrate a hyperedge, the same relationship needs to be used to connect one node to two or more nodes.
So you need to make the WORKED_AT and IN_ROLE one and the same relationship and it needs to connect the person node to both the company and the role nodes.
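
For comparison, this is roughly how the intermediate-node alternative to such a hyperedge could be written. It is only a sketch: the `Employment` label, the `WORKED`, `AT`, and `IN_ROLE` relationship directions, and all property names and values are assumptions made for illustration:

```cypher
// Hypothetical names: Employment is the intermediate node that stands in
// for the hyperedge connecting a person, a company, and a role.
CREATE (p:Person {name: 'Patrick'})
CREATE (c:Company {name: 'Acme'})
CREATE (r:Role {title: 'Engineer'})           // illustrative value
CREATE (e:Employment {startYear: 2001, endYear: 2005}) // illustrative values
CREATE (p)-[:WORKED]->(e)
CREATE (e)-[:AT]->(c)
CREATE (e)-[:IN_ROLE]->(r);
```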

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

    
              .Graph model of movies and their genres

              image::modeling_genre_node-arr.svg[role="popup-link",400,400]

              The use of intermediary nodes can also answer the question "Who worked at the same company at the same time?" as the added employment event contains information about when each individual worked at a certain company.

              A `MATCH` clause would show that Patrick and David both worked at Acme, being colleagues from 2004 to 2005 since their employment events overlap during that time.

Contributor

AlexicaWright Jun 5, 2025

Maybe we should have an example of this MATCH clause?
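
A possible shape for such a query, as a sketch only. It assumes a hypothetical schema `(:Person)-[:WORKED]->(:Employment)-[:AT]->(:Company)` with `startYear` and `endYear` properties on the `Employment` node; the actual labels, relationship types, and properties on the page may differ:

```cypher
// Hypothetical schema: (:Person)-[:WORKED]->(:Employment)-[:AT]->(:Company),
// with startYear and endYear properties on Employment.
MATCH (p1:Person)-[:WORKED]->(e1:Employment)-[:AT]->(c:Company),
      (p2:Person)-[:WORKED]->(e2:Employment)-[:AT]->(c)
WHERE p1.name < p2.name            // report each pair only once
  AND e1.startYear <= e2.endYear   // the two employment periods overlap
  AND e2.startYear <= e1.endYear
RETURN p1.name AS person1,
       p2.name AS person2,
       c.name AS company;
```

Two periods overlap when each one starts no later than the other ends, which is what the two year comparisons check.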

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

-              It is much simpler than our earlier version because it uses a natural, graph pattern (entity-relationship-entity) to find the information needed.
-              First, Cypher finds a movie and the genre it is related to, then looks for a second movie that is in that same genre.
+              Intemediate nodes can also add value to a model by providing a way to share data and thus reduce duplicate information.
+              In this example, Sarah sends an email to Lucy and copies David and Claire to it:

Contributor

AlexicaWright Jun 5, 2025

Maybe we could mention that the content of the email is a property on every relationship?

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

-                    (m2:Movie)-[:IN_GENRE]->(g)
-              RETURN m1, m2, g
-              ----
+              By using a fan-out, duplication can be reduced as a property (`content`) is broken out and made into its own node (`Email` with the property `content`) instead of being repeated, in this case, on every relationship:

Contributor

AlexicaWright Jun 5, 2025

Suggested change

      
            By using a fan-out, duplication can be reduced as a property (`content`) is broken out and made into its own node (`Email` with the property `content`) instead of being repeated, in this case, on every relationship:
          
            If you instead fan out the model, you reduce duplication by breaking out the property `content` from all relationships and turning it into the intermediary node `Email` instead.

modules/ROOT/pages/data-modeling/modeling-designs.adoc Outdated

    
              If you plan to do analysis on individual items and return only details about that entity (like genres on a particular movie), then the first data model would serve perfectly well for your needs.

              However, if you need to run analysis to find common ground between entities or look at a group of nodes, then the second data model would definitely improve performance of those types of queries.

              Once the property value `content` is moved to a single node `Email`, it can be referenced via relationships with the `User` nodes that previously held that value.

              Now there are no copies or duplications.

Contributor

AlexicaWright Jun 5, 2025

Suggested change

      
            Now there are no copies or duplications.
          
            Now there are no duplications.
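
As an illustration of the fan-out described in this thread, a minimal sketch. The `User` and `Email` labels, the `content` property, and the people's names come from the page; the relationship types `SENT`, `TO`, and `CC` and the content value are assumptions:

```cypher
// The content is stored once, on the Email node,
// instead of being repeated on every relationship.
CREATE (sarah:User {name: 'Sarah'}), (lucy:User {name: 'Lucy'}),
       (david:User {name: 'David'}), (claire:User {name: 'Claire'})
CREATE (email:Email {content: 'Hi there'})  // illustrative value
CREATE (sarah)-[:SENT]->(email)             // hypothetical relationship types
CREATE (email)-[:TO]->(lucy)
CREATE (email)-[:CC]->(david)
CREATE (email)-[:CC]->(claire);
```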

modules/ROOT/pages/data-modeling/modeling-designs.adoc

-              Inspecting the several properties of each `Flight` node could be expensive on resources.
+              * The order in which the episodes were aired using the `NEXT` relationship and through a simple-linked list.
+              * The order in which the episodes were produced using the `NEXT_IN_PRODUCTION` relationship, which creates an interleaved linked list.
+              It is not a linear list, as it goes 1, 3, 2, 5, 4.

Contributor

AlexicaWright Jun 5, 2025

It is still linear since it goes from one node to the next in a linear fashion. That the sequence is not in numerical order is not the same as non-linear.
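
A sketch that may help show both lists are walked the same way. It assumes a hypothetical `Episode` label with a `number` property; the `NEXT` and `NEXT_IN_PRODUCTION` relationship types are the ones named in the diff:

```cypher
// Aired order: start at the episode with no incoming NEXT and follow the chain.
MATCH path = (first:Episode)-[:NEXT*0..]->(e:Episode)
WHERE NOT ()-[:NEXT]->(first)
RETURN e.number AS episode, length(path) AS position
ORDER BY position;

// Production order: the same traversal over the other relationship type.
MATCH path = (first:Episode)-[:NEXT_IN_PRODUCTION*0..]->(e:Episode)
WHERE NOT ()-[:NEXT_IN_PRODUCTION]->(first)
RETURN e.number AS episode, length(path) AS position
ORDER BY position;
```

Both queries follow one relationship from node to node; only the order in which the episode numbers come back differs.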

modules/ROOT/pages/data-modeling/index.adoc Outdated

               Your initial graph data model is only a starting point.
               As you learn more about your use cases or if they change, the model needs to adapt.
-              Additionally, you may find that, especially when the graph scales, you need to xref:data-modeling/graph-model-refactoring.adoc[refactor] your model to ensure it is aligned with your business needs as they evolve.
+              Additionally, you may find that, especially when the graph scales, you need to xref:data-modeling/tutorial-refactoring.adoc[refactor] your model to ensure it is aligned with your business needs as they evolve.

Contributor

AlexicaWright Jun 5, 2025

Why is there a section here called "How to create a graph data model" when that has a whole page of its own?

Contributor Author

lidiazuin Jun 9, 2025

I reformulated this section to avoid the redundancy

lidiazuin and others added 4 commits

June 5, 2025 16:21


          updates after review

9a92f39


          Apply suggestions from code review

6c6b043

Co-authored-by: Jessica Wright <49636617+AlexicaWright@users.noreply.github.com>


          Removing redundant information from index

7cc6dbb


          Merge branch 'modelingdesigns' of github.com:lidiazuin/docs-getting-started into modelingdesigns

fbc833e

Collaborator

neo4j-docops-agent commented Jun 9, 2025

This PR includes documentation updates
View the updated docs at https://neo4j-docs-getting-started-451.surge.sh

Updated pages:
