Configurable pattern matching semantics in response to #174 #175

Open
wants to merge 38 commits into
base: master

Conversation

boggle
Contributor

@boggle boggle commented Jan 25, 2017

Member

@Mats-SX Mats-SX left a comment

I like the direction of this.

The default uniqueness mode used by `MATCH` (without a further specification of the preferred uniqueness mode) is relationship-unique matching.

`MATCH ALL` does not reject any paths - not even paths containing cycles - and hence can lead to infinite result sets for the whole query.
It is recommended that implementations generate at least a warning when static analysis is not able to proof query termination due to the chosen uniqueness mode.
Member

proof -> prove
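
To illustrate the termination concern in the quoted text, here is a minimal Python sketch. It is not part of the proposal: the two-relationship toy graph and the `paths_from` helper are hypothetical, and the `max_len` cap exists only so that the unrestricted enumeration stops at all.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: relationship id -> (start node, end node).
# The two relationships form the cycle a -> b -> a.
RELS: Dict[int, Tuple[str, str]] = {1: ("a", "b"), 2: ("b", "a")}


def paths_from(start: str, max_len: int, rel_unique: bool) -> List[Tuple[int, ...]]:
    """Enumerate directed paths of length 1..max_len as tuples of relationship ids.

    rel_unique=True mimics relationship-unique matching (no relationship is
    traversed twice); rel_unique=False rejects nothing, like `MATCH ALL`.
    """
    results: List[Tuple[int, ...]] = []

    def extend(node: str, used: Tuple[int, ...]) -> None:
        if len(used) == max_len:
            return
        for rid, (src, dst) in RELS.items():
            if src != node:
                continue
            if rel_unique and rid in used:
                continue
            longer = used + (rid,)
            results.append(longer)
            extend(dst, longer)

    extend(start, ())
    return results


# Relationship-unique matching is bounded by the number of relationships.
print(len(paths_from("a", 10, rel_unique=True)))
# Without any uniqueness restriction the count grows with the cap.
print(len(paths_from("a", 10, rel_unique=False)))
print(len(paths_from("a", 50, rel_unique=False)))
```

With relationship uniqueness the result set stays the same size however large the cap is, whereas the unrestricted enumeration yields one more path for every extra step allowed, which is why an uncapped query over a cyclic graph would not terminate.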


=== Proposal: Default uniqueness mode

Additionally, it is proposed that a conforming implementation should provide a pre-parser option for defining a default uniqueness level for use with regular pattern matching.
Member

I'm not convinced this kind of recommendation belongs in a CIP. Is it not well understood that an implementing system would provide ways of changing defaults?

* `closed(p)`: true if the start and the end node of `p` are the same node
* `trail(p)`: true if `p` contains no duplicate relationships
* `simple(p)`: true if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node
* `trek(p)`: true if `p` contains two identical consecutive relationships
Member

What does identical mean here? Same rel-type? Same type and properties? Equal?

* `trail(p)`: true if `p` contains no duplicate relationships
* `simple(p)`: true if `p` contains no duplicate relationships and either no duplicate nodes at all or the start node and the end node are the same node
* `trek(p)`: true if `p` contains two identical consecutive relationships
* `repetetive(p)`: true if `p` contains any closed subpath `q` of `size > 1` that is immediately repeated after itself in `p`
Member

repetitive
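
The path predicates quoted above can be sketched in Python as follows. This is an illustrative reading only, resting on several assumptions that the proposal text leaves open: a path is modelled as a hypothetical `Path` class holding node and relationship identifiers, "identical" and "duplicate" are taken to mean the same identifier, `size` is taken to be the number of relationships, and `simple` follows the literal wording of the definition.

```python
from dataclasses import dataclass
from typing import Hashable, Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical path model: nodes n0..nk and relationships r1..rk."""
    nodes: Tuple[Hashable, ...]
    rels: Tuple[Hashable, ...]

    def __post_init__(self) -> None:
        if len(self.nodes) != len(self.rels) + 1:
            raise ValueError("a path with k relationships needs k + 1 nodes")


def closed(p: Path) -> bool:
    # Start node and end node are the same node.
    return p.nodes[0] == p.nodes[-1]


def trail(p: Path) -> bool:
    # No relationship occurs twice.
    return len(set(p.rels)) == len(p.rels)


def simple(p: Path) -> bool:
    # Literal reading: a trail with no duplicate nodes, or a closed trail.
    no_duplicate_nodes = len(set(p.nodes)) == len(p.nodes)
    return trail(p) and (no_duplicate_nodes or closed(p))


def trek(p: Path) -> bool:
    # Assumption: "identical" means the same relationship identifier.
    return any(a == b for a, b in zip(p.rels, p.rels[1:]))


def repetitive(p: Path) -> bool:
    # Look for a closed subpath of k > 1 relationships that is
    # immediately followed by an identical copy of itself.
    n = len(p.rels)
    for k in range(2, n // 2 + 1):
        for i in range(n - 2 * k + 1):
            same_rels = p.rels[i:i + k] == p.rels[i + k:i + 2 * k]
            same_nodes = p.nodes[i:i + k + 1] == p.nodes[i + k:i + 2 * k + 1]
            if same_rels and same_nodes:
                return True
    return False


cycle = Path(nodes=("a", "b", "c", "a"), rels=(1, 2, 3))
twice = Path(nodes=("a", "b", "a", "b", "a"), rels=(1, 2, 1, 2))
print(closed(cycle), trail(cycle), simple(cycle))
print(trail(twice), repetitive(twice))
```

Comparing node slices as well as relationship slices in `repetitive` guarantees that the repeated subpath is closed, because the first slice then starts at the same node at which the second slice starts.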

RETURN p
----

Note that these functions naturally extend to lists.
Member

Do you mean lists generally, or lists containing only nodes and relationships? I'm not sure I follow; what does trail(list) yielding true mean? That the list is a trail?

Changing the uniqueness mode of a sub query recursively changes the default uniqueness mode for all contained `MATCH` clauses unless it is overridden again. Examples:

* `MATCH <uniqueness-modes> { MATCH ... } ...`
* `DO <uniqueness-modes> { MATCH ... } ...`
Member

Are MATCH and DO (this is the first time it appears on this repo I think) the two cases where you'd be able to supply these modes? What about MERGE?

@boggle boggle changed the title from "Add proposal for isomorphic pattern matching in response to #174" to "Configurable pattern matching semantics in response to #174" on Jul 17, 2017
== Motivation

Currently Cypher uses pattern matching semantics that treats all patterns that occur in a `MATCH` clause as a unit (called a *uniqueness scope*) and only considers pattern instances that bind different relationships to each fixed length relationship pattern variable and to each element of a variable length relationship pattern variable.
This has come to be called *cyphermorphism* informally and is a variation of edge isomorphism.
Member

I thought these two were synonymous; what is the variation?

Contributor Author

'Academic' edge isomorphism only talks about a single, connected candidate walk while cyphermorphism considers all relationships bound by any pattern in the same match (even relationships bound by different, disconnected walks) for uniqueness.

Member

Aha! Thanks for the clarification.

I don't think that this is a difference of "morphism". If one followed strict isomorphism ("path isomorphism" in Walks, Trails, Paths terms, no repeated vertices, and therefore also no repeated edges), then Cypher's current "pattern gluing" rules would apply (unless we change those rules), and we would end up evaluating matches against the compound, glued pattern, but using isomorphic semantics. Gluing may be syntactic salt, but is orthogonal to "morphism". Cyphermorphism, in my view, is no different to "Trail morphism", or "edge isomorphism".
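
The scope question discussed in this thread can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: each matched pattern part is reduced to a tuple of relationship identifiers, and the helper names `unique_per_walk` and `unique_per_match` are hypothetical, not terms from the CIP.

```python
from itertools import chain
from typing import Hashable, Sequence, Tuple

Walk = Tuple[Hashable, ...]  # relationship ids bound by one pattern part


def unique_per_walk(walks: Sequence[Walk]) -> bool:
    """Each walk on its own must not repeat a relationship."""
    return all(len(set(w)) == len(w) for w in walks)


def unique_per_match(walks: Sequence[Walk]) -> bool:
    """No relationship may repeat across all walks of the same MATCH.

    This treats the whole clause as one uniqueness scope, which is the
    behaviour the thread attributes to current Cypher.
    """
    all_rels = list(chain.from_iterable(walks))
    return len(set(all_rels)) == len(all_rels)


# Two disconnected pattern parts that both bind relationship 7.
bindings = [(7,), (7,)]
print(unique_per_walk(bindings))
print(unique_per_match(bindings))
```

Under the per-walk check both pattern parts are acceptable in isolation, while the per-MATCH check rejects the combined binding because relationship 7 is used twice within the same scope.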

* Now in Appendix
* "bind" -> "class" (former deprecated)
* Added example
@romanskas

Just for completeness: there is a fourth option (injective vertices, non-injective edges): (a)-[e1]->(b), (a)-[e2]->(b). In this case, a and b have to be distinct, but e1 and e2 can match the same edge.
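
The four combinations implied by this comment (injective or non-injective on vertices, and independently on edges) can be sketched as a single check in Python. The function `admissible` and its flag names are hypothetical and chosen for illustration only; the bindings map pattern variables to made-up graph element ids.

```python
from typing import Dict, Hashable


def _injective(binding: Dict[str, Hashable]) -> bool:
    # No two pattern variables are bound to the same graph element.
    values = list(binding.values())
    return len(set(values)) == len(values)


def admissible(
    node_binding: Dict[str, Hashable],
    rel_binding: Dict[str, Hashable],
    distinct_nodes: bool,
    distinct_rels: bool,
) -> bool:
    """Check a candidate binding under one of the four morphism options."""
    nodes_ok = not distinct_nodes or _injective(node_binding)
    rels_ok = not distinct_rels or _injective(rel_binding)
    return nodes_ok and rels_ok


# Pattern: (a)-[e1]->(b), (a)-[e2]->(b), with e1 and e2 bound to the same edge.
nodes = {"a": "n1", "b": "n2"}
rels = {"e1": "r1", "e2": "r1"}

# Fourth option: injective vertices, non-injective edges.
print(admissible(nodes, rels, distinct_nodes=True, distinct_rels=False))
# Edge-injective matching rejects the same binding.
print(admissible(nodes, rels, distinct_nodes=False, distinct_rels=True))
```

Keeping the two flags independent makes the fourth option fall out naturally rather than requiring a dedicated mode.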

| 'DIFFERENT', ('RELATIONSHIPS' | 'EDGES'), [ VariableList ]
PatternMorphism = 'DIFFERENT', ('NODES' | 'VERTICES')
| 'DIFFERENT', ('RELATIONSHIPS' | 'EDGES')
| 'DIFFERENT', [ VariableList ]
Contributor

Optional VariableList? Is that really right?

As we can see above, patterns in Cypher consist of a comma-separated list of _pattern parts_, where a pattern part is exemplified by `p = (e:Employee)-[:REPORTS_TO*1..3]->(m:Manager)`.
PathClass = 'WALK'
| 'TRAIL'
| [ 'SIMPLE' ], 'PATH'
Contributor

Do TRAIL, PATH, and SIMPLE PATH really encode three different classes? If not, I wonder why synonyms are allowed. (A WALK is obviously different from those three.)

@boggle boggle closed this May 7, 2018
@boggle boggle deleted the isomatch branch May 7, 2018 11:41
@boggle boggle restored the isomatch branch May 7, 2018 11:45
@petraselmer petraselmer reopened this May 31, 2018
@boggle
Contributor Author

boggle commented May 31, 2018

Note that this CIP is in a heavy state of flux in order to allow for alignment with ongoing discussions.

@thobe
Contributor

thobe commented Dec 20, 2018

Aims to solve #174

@felix-boeseler

Hello, are there any updates regarding this CIP? I am very interested in the proposed DIFFERENT operator functionality so that I can use node isomorphism instead of edge isomorphism.

7 participants