Property Graphs #45

Open
draggett opened this issue Dec 14, 2018 · 61 comments
Labels: "Category: language features" (for language features of RDF itself: model and syntax), "higher-level" (Higher-level RDF should address this), "standards" (Standardization should address this)


@draggett
Member

draggett commented Dec 14, 2018

This deserves an issue to itself given the growing popularity of property graph databases and the opportunity for using RDF as an interchange framework between different databases. See also #20 Standardized n-ary relations (and property graphs) and #22 Language-tagged strings.

Property graphs are a kind of graph consisting of nodes and links between them, where nodes and links may be associated with a set of property-value pairs, where the values may themselves be sets of property-values and so forth recursively. The link predicate or label can itself be treated as a kind of property.

It is possible to represent property graphs with reification, but that adds considerable complexity. We can easily annotate a node using a link to another node. However, we also need a way to link from a link or to a link. One approach is for each link to expose an identifier enabling the link to be treated as equivalent to an RDF blank node. Such identifiers are okay for links within the same graph and can be implicit in serialisations like Turtle* where a pair of curly braces implies a new identifier.
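To make the "link with its own identifier" idea concrete, here is a minimal Python sketch, assuming an in-memory edge list: each edge keeps its plain triple, and also gets a fresh blank-node label that carries the edge's endpoints and key-value pairs, so other triples can point at the link. The predicates `ex:from`, `ex:to` and `ex:label` are hypothetical placeholders, not terms from any published vocabulary.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


def edges_to_triples(edges):
    """Flatten property-graph edges into (subject, predicate, object) triples.

    Each edge gets a fresh blank-node label (_:e0, _:e1, ...) so that it can
    itself be the subject of further triples. ex:from / ex:to / ex:label are
    hypothetical predicate names used only for this sketch.
    """
    ids = count()
    triples = []
    for edge in edges:
        bnode = f"_:e{next(ids)}"
        # The plain triple keeps the graph queryable without the edge node.
        triples.append((edge.source, edge.label, edge.target))
        # The edge node carries the link's identity and its key-value pairs.
        triples.append((bnode, "ex:from", edge.source))
        triples.append((bnode, "ex:to", edge.target))
        triples.append((bnode, "ex:label", edge.label))
        for key, value in edge.properties.items():
            triples.append((bnode, key, value))
    return triples


triples = edges_to_triples([Edge("ex:alice", "ex:knows", "ex:bob", {"ex:since": 2010})])
for triple in triples:
    print(triple)
```

The edge node here is essentially the reification pattern again (at least three extra triples per edge), which illustrates the complexity the comment would like to avoid.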

What if you want to make a link something that can be referenced stably from other graphs? That suggests the need for a means to associate the link with a named anchor that is unique within the graph. What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link? The answer would seem to be the graph that the link was defined in.

Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for. This allows you to make statements about a graph as a whole rather than one of its nodes or links. It would be desirable to quickly determine that a node indeed stands for a graph so as to avoid having to find this out by trying to dereference the node.

Yet another challenge is where you want to distinguish properties from other kinds of links. This would allow for visualisations where you can hide and reveal properties with a tabular presentation of property-value sets. See #37 Lack of RDF Visualisation Software.

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or the root for an n-ary chunk.

@dbooth-boston added the "Category: language features" label Dec 14, 2018
@dbooth-boston
Collaborator

I also think support for property graphs is very important. However, my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case, so that we do not need a separate mechanism. So far I have not seen any big barriers to such an approach.

My 2 cents on some of your questions:

It is possible to represent property graphs with reification, but that adds considerable complexity.

Agreed. And I find myself recoiling in horror at the mere mention of reification. In my view, RDF reification should be deprecated, since named graphs are generally much better, though not needed for property graphs.

What if you want to make a link something that can be referenced stably from other graphs?

Then a URI should be used, consistent with existing RDF practice.

What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link?

Although that could be done in existing TriG (for example) I do not think it should be supported in a new higher-level RDF language. I think an RDF molecule that represents an n-ary relation should exist entirely in each graph where it is used, and should be considered malformed if one tries to put part of it in one graph and part in another. The reason is that the user, by creating it as an n-ary relation, intended it to be treated as a single unit. However, there would be nothing wrong with asserting some new triples or a new n-ary relation that makes use of some of the constituents of another n-ary relation.

Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for. This allows you to make statements about a graph as a whole rather than one of its nodes or links. It would be desirable to quickly determine that a node indeed stands for a graph so as to avoid having to find this out by trying to dereference the node.

My gut feeling is that that should be done by attaching additional metadata triples to the graph URI, such as provenance.

Yet another challenge is where you want to distinguish properties from other kinds of links.

Yes. My assumption is that by coming up with a standard way to define n-ary relations, this ability will fall out as a natural consequence: a particular group of triples will be automatically identifiable as an n-ary relation comprised of those properties.

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or the root for an n-ary chunk.

Interesting idea! I wonder how the scope could be known, so that the interpretation would be stable in the face of changing data. It would be bad if x.foo were to select one property against one set of data, but a different property if more data were added. Anyone have thoughts on how this could be done?

@draggett
Member Author

draggett commented Dec 15, 2018

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages. Such short names could be scoped to the node that acts as the subject for a link, or the root for an n-ary chunk.

Interesting idea! I wonder how the scope could be known, so that the interpretation would be stable in the face of changing data. It would be bad if x.foo were to select one property against one set of data, but a different property if more data were added. Anyone have thoughts on how this could be done?

I think that is tied to the cardinality of the property, i.e. whether "foo" is constrained to a singular value or can have multiple values (via multiple links with the same subject and predicate). Following the given path may thus return a set of nodes containing zero, one or multiple nodes. When we look at how to model n-ary chunks, we should also look at associated metadata including cardinality constraints, composite keys and so forth. What metadata would make data and rules easier to use by the vast majority of developers?
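A small sketch of path following, assuming triples are plain Python tuples and each dotted step names a predicate directly (no scoping). Because nothing constrains cardinality, each step maps a set of nodes to a set of nodes, so the result can hold zero, one or many nodes.

```python
def follow(triples, start, path):
    """Follow a dotted path such as "knows.name" from the node `start`.

    Each step is matched against predicates by exact name (an assumption of
    this sketch). The result is always a set of nodes.
    """
    current = {start}
    for step in path.split("."):
        current = {o for (s, p, o) in triples if s in current and p == step}
    return current


triples = {
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
}
print(sorted(follow(triples, "alice", "knows.name")))  # ['Bob', 'Carol']
print(follow(triples, "bob", "knows"))                  # set()
```

A cardinality constraint declared in the chunk's metadata would let an implementation return a single node (or raise an error) instead of a set.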

Path following is related to regular expressions and RDF shapes, as well as to XPath for XML. I've explored it in some experiments inspired by ATNs, see https://www.w3.org/WoT/demos/shrl/test.html

p.s. I am using the term chunk as it is popular in Cognitive Science and features prominently in cognitive architectures like CMU's ACT-R.

@dbooth-boston
Collaborator

If someone wrote x.foo as a path, using short names, then I assume that each corresponding long name would be comprised of a namespace plus the short name. How would the system know which namespace to prepend to the short name? For example, if the current namespaces included both http://example/a# and http://example/b#, how would the system know whether foo should be expanded to http://example/a#foo or http://example/b#foo? Or do you envision this working some other way?

@draggett
Member Author

draggett commented Dec 17, 2018

I assume that each corresponding long name would be comprised of a namespace plus the short name

No, that isn't the case. This is just a graph of objects where the object properties act as links to other objects, and each object property has a name that is scoped to that object. In RDF terms, the subject node + the property name provides a map to a predicate, and uniquely identifies a set of triples with that subject and predicate.
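One way to read "the subject node + the property name provides a map to a predicate" is an explicit per-node table from short names to predicate IRIs. The sketch below assumes such a table (`SCOPES`, a hypothetical structure) and reuses the `http://example/a#foo` / `http://example/b#foo` IRIs from the question above.

```python
# Hypothetical scopes: for each subject node, short name -> full predicate IRI.
SCOPES = {
    "ex:x": {"foo": "http://example/a#foo"},
    "ex:y": {"foo": "http://example/b#foo"},
}

TRIPLES = {
    ("ex:x", "http://example/a#foo", "1"),
    ("ex:y", "http://example/b#foo", "2"),
    ("ex:y", "http://example/a#foo", "3"),
}


def resolve(subject, name, scopes=SCOPES):
    """Map a short property name to a predicate, scoped to `subject`."""
    try:
        return scopes[subject][name]
    except KeyError:
        raise KeyError(f"{name!r} is not defined for {subject}") from None


def get(subject, name, triples=TRIPLES, scopes=SCOPES):
    """Return the set of objects reached via subject.name."""
    predicate = resolve(subject, name, scopes)
    return {o for (s, p, o) in triples if s == subject and p == predicate}


print(get("ex:x", "foo"))  # {'1'}
print(get("ex:y", "foo"))  # {'2'}
```

Because the mapping is declared rather than inferred from whatever triples happen to exist, adding more data (such as the extra `a#foo` triple on `ex:y`) does not change which predicate `ex:y.foo` selects, which speaks to the stability concern raised earlier.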

A restriction on this would be to constrain property names to uniquely identify predicates in this graph. This is tantamount to saying that the property name uniquely identifies the meaning of a property, rather than this being something specific to each object.

That is an overly strong constraint as in the real world, words are often used for different meanings depending on the context. However, there is nothing to prevent implementations from optimising how they handle this internally.

@dbooth-boston
Copy link
Collaborator

I would like to pursue the possibility of encoding property graphs in standard RDF. Have others already done this? If so, what RDF patterns were used, and what limitations did they have?

@draggett
Member Author

Apart from reification, one approach that has been mentioned is to use a named graph that contains just the triple you want to annotate. This generalises to annotations on multiple triples, but I am unsure how you indicate that a given triple is in multiple named graphs. Another challenge is how you identify a graph when there isn't an explicit name for it, e.g. when using curly braces in Turtle* around the triples you want to annotate, this would imply an implicit blank node for the associated graph.

This makes me think about how to deal with graphs from an implementation perspective. One idea is to express the relationship between a triple and a graph as a property of the triple, where the property can have multiple values. Another idea is to allow for relationships between graphs, e.g. for one graph to be subsumed as part of another graph. A database could create its internal identifiers, and associate them with external identifiers when those are defined.

I wonder how this is dealt with by existing property graph database solutions?

@VladimirAlexiev

I am unsure how you indicate that a given triple is in multiple named graphs.

You make several quads having the same <s,p,o>.
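Modelling quads as 4-tuples `(s, p, o, g)` makes this concrete: the same triple simply appears once per graph. The sketch also indexes graph membership by triple, which corresponds to the earlier idea of treating "is in graph g" as a multi-valued property of the triple. The `ex:` names are illustrative.

```python
from collections import defaultdict

quads = {
    ("ex:alice", "ex:knows", "ex:bob", "ex:g1"),
    ("ex:alice", "ex:knows", "ex:bob", "ex:g2"),  # same triple, second graph
    ("ex:bob", "ex:knows", "ex:carol", "ex:g1"),
}


def graphs_containing(quads, s, p, o):
    """Return every named graph that holds the triple (s, p, o)."""
    return {g for (qs, qp, qo, g) in quads if (qs, qp, qo) == (s, p, o)}


def index_by_triple(quads):
    """Group graph names by triple: triple -> set of graphs it belongs to."""
    index = defaultdict(set)
    for s, p, o, g in quads:
        index[(s, p, o)].add(g)
    return index


print(sorted(graphs_containing(quads, "ex:alice", "ex:knows", "ex:bob")))  # ['ex:g1', 'ex:g2']
```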

@dbooth-boston added the "standards" and "higher-level" labels and removed the "standards" label Mar 11, 2019
@amirouche

Property graphs are a kind of graph consisting of nodes and links between them, where nodes and links may be associated with a set of property-value pairs, where the values may themselves be sets of property-values and so forth recursively.

The part about values recursively being sets of property-values is not true. Node and link (respectively vertex and edge) properties are plain old hashmaps, i.e. a JS object or a Python dict.

The link predicate or label can itself be treated as a kind of property.

Yes.

It is possible to represent property graphs with reification, but that adds considerable complexity.

What reification? I looked around, but I still don't understand.

What if the link itself starts in one graph and ends in another - where would you situate the anchor for that link?

That is exactly what I meant by "it is advanced use" in this comment.

Another challenge concerns the case where a node stands for another graph, e.g. the node has a URI that can be dereferenced to obtain the graph the node stands for.

I think we should come up with a representation of a property graph before trying to generalise to recursive or hierarchical graphs or "meta-graphs".

True story: as part of a foolish attempt to replace the AtomSpace, I was thinking about how to implement this kind of thing. Basically, a single entity called the atom that has outgoing and incoming links, and properties as a hashmap. Then came the idea of a "recursive hypergraph". Like you wrote, it is complex just to imagine a node (or atom in my case) pointing outside its own graph, or a node representing another graph or sub-graph (because it is hierarchical, that makes sense). Again, I think it is the role of the reasoner / rule engine to deal with that kind of complexity. As part of my exploration I tried to implement it, but in the end there was really no way to "make it fast", and a priori you don't know when the "query" will end.

my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case

What are "n-ary relations", please?

It would be desirable to have short names for links so that paths through a graph can be expressed simply via a dotted path string, analogous to properties in object oriented programming languages.

That is mostly what Gremlin (from Apache TinkerPop) does. It is written like:

graph.vertices.filter(lambda x: x.type == 'actor').outgoing.filter(lambda x: x.genre == 'science-fiction')
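The one-liner above is pseudocode; in actual Gremlin the same traversal would be written roughly as `g.V().has('type', 'actor').out().has('genre', 'science-fiction')`. To show the chained, dotted-path style in runnable form, here is a toy in-memory Python version. The `Graph` and `Traversal` classes and their methods are invented for this sketch and are not the TinkerPop API.

```python
from types import SimpleNamespace


class Traversal:
    """A toy chainable traversal over an in-memory graph (hypothetical API)."""

    def __init__(self, graph, vertices):
        self._graph = graph
        self._vertices = list(vertices)

    def filter(self, predicate):
        # Keep only the current vertices for which predicate(v) is true.
        return Traversal(self._graph, [v for v in self._vertices if predicate(v)])

    @property
    def outgoing(self):
        # Step along every edge whose source is one of the current vertices.
        ids = {v.id for v in self._vertices}
        targets = [self._graph.by_id[dst] for src, dst in self._graph.edges if src in ids]
        return Traversal(self._graph, targets)

    def __iter__(self):
        return iter(self._vertices)


class Graph:
    def __init__(self, vertices, edges):
        self.by_id = {v.id: v for v in vertices}
        self.edges = edges  # list of (source id, target id) pairs

    @property
    def vertices(self):
        return Traversal(self, self.by_id.values())


graph = Graph(
    [
        SimpleNamespace(id=1, type="actor", name="Actor A"),
        SimpleNamespace(id=2, type="film", genre="science-fiction", title="Film X"),
        SimpleNamespace(id=3, type="film", genre="drama", title="Film Y"),
    ],
    [(1, 2), (1, 3)],
)

result = (graph.vertices
          .filter(lambda x: x.type == "actor")
          .outgoing
          .filter(lambda x: x.genre == "science-fiction"))
print([v.title for v in result])  # ['Film X']
```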

@amirouche

Some time ago I wrote an article on how to build a graph database on top of EAV. You can find it at https://hyper.dev/blog/diy-graph-database-in-python.html.

EAV is somewhat like a triplestore, but you cannot have multiple triples with the same subject and predicate. On top of that abstraction, I built a document store by grouping by subject. Each document has a private field that distinguishes nodes from edges. Edges also have two other private predicates, node-start and node-end.
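A minimal sketch of that layout, assuming an EAV store represented as a dict keyed by `(entity, attribute)`, which enforces the one-value-per-pair rule. The private field that marks nodes versus edges is unnamed in the comment, so `kind` below is a hypothetical name; `node-start` and `node-end` follow the comment.

```python
from collections import defaultdict


def put(eav, entity, attribute, value):
    # EAV allows one value per (entity, attribute): a later put overwrites.
    eav[(entity, attribute)] = value


def documents(eav):
    """Group EAV rows by entity, giving one dict ("document") per entity."""
    docs = defaultdict(dict)
    for (entity, attribute), value in eav.items():
        docs[entity][attribute] = value
    return dict(docs)


def edges(eav):
    # "kind" is a hypothetical name for the private node/edge marker.
    return {e: d for e, d in documents(eav).items() if d.get("kind") == "edge"}


eav = {}
for row in [
    ("n1", "kind", "node"), ("n1", "name", "Alice"),
    ("n2", "kind", "node"), ("n2", "name", "Bob"),
    ("e1", "kind", "edge"), ("e1", "node-start", "n1"),
    ("e1", "node-end", "n2"), ("e1", "since", 2010),
]:
    put(eav, *row)

print(sorted(edges(eav)))  # ['e1']
```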

@mhedenus

mhedenus commented Apr 20, 2021

Hello, I'd like to pick up this topic and discuss a specific question: how can you distinguish a property from a relation?

In RDF that is not possible, because there is no such distinction. Example:

<Alice> <knows> <Bob> .

and

<Alice> <mbox> <mailto:alice@example.com> .

are completely alike in the sense that they are both simple statements. But their meanings are very different: Alice and Bob are persons, they are entities, i.e. they are things (resources) which have distinct existence.
The second statement states that Alice has a property, i.e. a contact address. Although the mailbox is a URI, it has the flavor of a value, like

<Alice> <name> "Alice".

Of course, you can say that a mailbox is also an entity, but whether or not something is an entity is a decision made by the domain model. I think this is the crucial question when you want to bring Property Graph and RDF together!

I also want to point out that Property Graph is a technical way to do ER modelling.
Nodes become entities, edges become relations, and key-value pairs become attributes == properties.

My vision is to create a unified graph model that embraces ER modelling and RDF at once.

@namedgraph

In RDF you distinguish between URI resources as objects and literals (datatyped values) as objects.
In this case <mailto:alice@example.com> is a URI resource which can have its own triples, whereas "alice@example.com" would be a literal.

Absolute terms like "not possible" do not help IMO because while it may seem so to you coming from a different background, there are very good reasons why RDF is like it is, and formal theory behind them. RDF was designed for data interchange.

Are you familiar with RDF-star?

@mhedenus

mhedenus commented Apr 20, 2021

Sure, I know all that, and I am familiar with RDF*.
I completely understand why RDF is designed like it is. The question is not so much a technical one, it is more a theoretical one.

Of course you can write

<Alice> <mbox> "alice@example.com"^^xsd:anyURI .

We can agree that a literal is a property value. But it can be more difficult than that:

<Product1> <price> [ <value> 300 ; <currency> <euro> ] .

What about <euro>? Can it be an entity? If so, then <currency> would be a relation, because a relation is between entities, not between properties and entities. But is the blank node also an entity? Then the whole price would be an entity.

@namedgraph

Entity is not an RDF term. If we're talking ontological modeling, a related term would be class.

Sure you can call the price entity, and the euro as well. Why is that a problem?

@dbooth-boston
Collaborator

After re-reading some of this thread, I notice that I missed a couple of questions from @amirouche a couple years ago. Sorry!

What reification? I looked around, but I still don't understand.

See this brief explanation and this answer on stackoverflow.

What are "n-ary relations", please?

See Defining N-ary Relations on the Semantic Web.

And addressing newer comments from @mhedenus :

How can you distinguish a property from a relation ?

Can you please first explain what distinction you are trying to make between a "property" and a "relation"? AFAIK we do not have widely accepted standard definitions of those terms that clearly distinguish between them. If you could explain what distinction you are trying to make, it would be helpful.

Also, please explain what you mean by "entity", and why you think some things should be considered entities and some should not. When you wrote "they are entities i.e. they are things (resources) which have distinct existence" it sounds like you are using the term "entity" to mean what RDF calls a "resource". But then when you suggest that some things should be considered entities and some should not, that sounds different than the RDF notion of "resource", so I am confused. Can you explain what you mean by "entity" and how it is different from what RDF calls a "resource"?

@mhedenus

mhedenus commented Apr 20, 2021

Maybe I should clarify what this is all about. I have been an advocate of RDF since I learned about it 20 years ago (I have been programming since 1988 and doing Java development since 2000). I worked very hard to establish RDF as a technology in my company in the automotive industry. Currently, we use RDF primarily for data integration.

But can you use RDF for modelling, e.g. using RDFS or OWL ? I think not.

The reality is: modelling is hard, especially because domain experts are normally not software developers. When you start talking about URIs, resources and stuff, all they understand is blah blah blah.
I believe the main reason why the adoption of RDF is so scarce (YES IT IS! There are still too many people out there who have never heard of it) is that people don't get it! RDF is extremely academic!

What people understand (even mechanical engineers) is ER modelling. They understand that there are things (entities or objects) which have properties (or attributes) and they have relations to other things.

Let me make an (over-)simplification here: there are two main graph modelling worlds:

  1. Property Graph == ER Modelling == Domain Modelling == {entities, relations, properties}
  2. RDF == {resources, predicates, literals}

Can these worlds be brought together? Yes. We have developed a graph model that is a Property Graph compatible with RDF.
It is working and I believe that it can give benefits to the RDF world.

@mhedenus

mhedenus commented Apr 20, 2021

RDF is talking about resources. Everything that can be identified with a URI is a resource.
An email address is a resource. If you use an email address to denote a person, you write (as proposed by FOAF)

<mailto:alice@example.org> a <Person> ; <name> "Alice" .

So far so good. Now let's express the fact that "Alice has an email address":

<mailto:alice@example.org> <mbox> <mailto:alice@example.org> .

That is legal in RDF and it makes complete sense in RDF. But these statements have different meanings which are only obvious to human readers. In the first statement the mailto URI is an identifier for something we call Alice, in the second statement the same URI is a value that belongs to a property owned by Alice.

Do you agree?

@dbooth-boston
Collaborator

dbooth-boston commented Apr 20, 2021

You have used the same URI, <mailto:alice@example.org>, to denote both a person and a mailbox. That is a URI collision. According to the Web Architecture, you should have used different URIs, to avoid the problem that you're raising. RDF itself does not stop you from doing that, but that doesn't mean it's a good practice either.

But what does this have to do with implementing property graphs in RDF? I don't understand where you're going with this example.

@mhedenus

mhedenus commented Apr 20, 2021

Yes, this URI collision is not nice and should be avoided. This example was meant to demonstrate what I think is the stumbling block when you try to implement property graphs in RDF. When you try to map RDF to a property graph, you have to know whether the statement's object is another node (and therefore the predicate is a relation) or a property value (and therefore the predicate is a property type or key).

To say that all URIs are mapped to nodes in the property graph and ONLY statements with literals are properties would be an artificial restriction.

To solve this, some additional information is required that tells you which predicates are considered to be relations and which predicates are considered to be properties.
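The "additional information" can be as small as a set of predicates declared to be relations. This Python sketch assumes exactly that (the `relation_predicates` parameter is hypothetical): listed predicates become edges, and every other triple becomes a key-value property of its subject, even when the object is a URI such as a `mailto:` address.

```python
def to_property_graph(triples, relation_predicates):
    """Split RDF triples into property-graph nodes and edges.

    Triples whose predicate is in `relation_predicates` become edges; every
    other triple becomes a property on the subject node. Property values are
    lists because RDF allows several objects for the same predicate.
    """
    nodes = {}
    edges = []
    for s, p, o in triples:
        node = nodes.setdefault(s, {})
        if p in relation_predicates:
            nodes.setdefault(o, {})  # the object is a node in its own right
            edges.append((s, p, o))
        else:
            node.setdefault(p, []).append(o)
    return nodes, edges


triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:mbox", "mailto:alice@example.com"),
    ("ex:Alice", "ex:name", "Alice"),
]
nodes, edges = to_property_graph(triples, relation_predicates={"ex:knows"})
print(nodes)
print(edges)
```

Predicates missing from the model fall through to "property" here; treating them as relations by default instead is an equally valid choice, and is exactly the open question raised later in this thread.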

@namedgraph

namedgraph commented Apr 20, 2021

I still don't get why developers' unfamiliarity with a technology is being framed as a deficiency of the technology, and not of the developers. This seems to be a constant theme for EasierRDF.

Many more developers know JavaScript than C++. Does that make C++ academic, and by that somehow deficient? Should we have EasierC++?

If developers are familiar with ER or UML or whatever, then provide mappings/converters to OWL/RDF(S). But don't use that as an opportunity to knock RDF.

@dbooth-boston
Collaborator

@namedgraph I think I disagree with you fairly fundamentally about this. I think lack of uptake can be an important indicator that a technology is too hard to use. It certainly is not an absolute determinant though.

If you look at market shares, RDF databases are getting clobbered by property graph databases. You can claim that RDF does more than what Property Graphs can do -- and I agree -- but it isn't a huge difference, and apparently it isn't a difference that matters to many common use cases.

I want to improve RDF, not knock it. And that means being honest about its strengths and weaknesses. IMO its biggest weakness is its difficulty of use. If we can make it as easy to use as Property Graphs -- at least for use cases that do not need functionality beyond Property Graphs -- then I think that would be very beneficial for RDF. But as I said before "my strong hope is that we adopt a mechanism for n-ary relations that subsumes property graphs as a special case, so that we do not need a separate mechanism".

@namedgraph

@dbooth-boston we've been over this...

I'd like you to try the C++ analogy though. StackOverflow is full of questions "why is C++ so hard?" and yet some of the most critical software is written in it. How is this different from RDF?

@dbooth-boston
Collaborator

This is a bit off topic, but I'll indulge your C++ analogy and try to answer. I think you are suggesting that, even though RDF is hard, it is still the right tool for the job sometimes, just as C++ is still the right tool for the job sometimes, even though it is hard. I definitely agree that RDF is sometimes the right tool for the job. (I would not have been involved with RDF for so many years if I didn't!)

But here is where I think the analogy breaks down. When C++ is chosen, almost invariably the overriding reason is performance. I don't believe anybody would choose C++ over Python (for example) if performance were not a key consideration. And the reason C++ is hard is that it is both a low-level C-compatible programming language and a high-level object-oriented programming language. When performance is critical, there is no getting around the need for a low-level language like C. One could of course use C instead of C++, but the higher-level features of C++ allow for more programmer productivity while still giving access to the low-level features of C. In other words, programmers put up with C++'s difficulty because they NEED the low-level features that it provides.

In contrast, I do not believe that RDF is chosen because developers really NEED the low-level features that it provides. I believe we can produce a higher-level successor to RDF, that retains the power that we need, while making it easier to use.

As a case in point, I do not believe that we really NEED explicit blank nodes in RDF, i.e., blank nodes like _:b42 that cannot be represented by square brackets [] in Turtle. We could solve the same use cases if RDF did not have them, even though we might have to create a few Skolem URIs instead sometimes. Yet that one little feature -- the ability to write an explicit blank node -- places a disproportionate complexity burden on RDF users. Not only does that feature cause endless confusion to new RDF users (because blank node labels are not stable identifiers), but it is precisely the reason why, after over 20 years, we still do not have a standard way to canonicalize RDF!
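For comparison, replacing explicit blank nodes with Skolem IRIs is mechanical. RDF 1.1 Concepts suggests minting such IRIs under a `/.well-known/genid/` path; the sketch below assumes triples as tuples of strings with blank nodes written `_:label`, and uses `https://example.org` as a placeholder authority.

```python
import uuid


def skolemize(triples, authority="https://example.org"):
    """Replace blank-node labels ("_:...") with Skolem IRIs.

    The same label maps to the same IRI within one call, so the graph
    structure is preserved. `authority` is a placeholder for this sketch.
    """
    mapping = {}

    def term(t):
        if isinstance(t, str) and t.startswith("_:"):
            if t not in mapping:
                mapping[t] = f"{authority}/.well-known/genid/{uuid.uuid4()}"
            return mapping[t]
        return t

    return [tuple(term(x) for x in triple) for triple in triples]


triples = [
    ("_:b42", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("_:b42", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
]
for triple in skolemize(triples):
    print(triple)
```

Skolemizing does not by itself canonicalize: with fresh UUIDs the same input yields different IRIs on every run, which is the label-stability problem in another form.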

In short, the low-level features of C++ are essential to its users, but the low-level features of RDF are not essential. They only continue to exist because we have not yet developed a higher-level, easier-to-use successor.

Unless we succeed in making RDF considerably easier to use, I think RDF will eventually get squeezed out of the picture entirely, in favor of other graph approaches that are easier to use, even though those other graph approaches are not quite as powerful.

@mhedenus

But you are mixing up different things now...

@namedgraph

@mhedenus Entity–relationship model - Limitations:

ER models are readily used to represent relational database structures (after Codd and Date) but not so often to represent other kinds of data structure (data warehouses, document stores etc.)

@mhedenus

@namedgraph: this is another, more philosophical discussion. Chen, the inventor of the ER model, regarded it as the fundamental model of everything (and I agree).

We use RDF for integrating data from very different datasources. To do so each datasource must provide its data as RDF.
To do so the data must be transformed.
To do so, you must do a model-to-model mapping, i.e. from the datasource's domain model to RDF.

Here is the crucial point, because that is not working well, for very different reasons.
It is way simpler and more practical to map the datasource's domain model to ER and from there to RDF.

@draggett
Member Author

@mhedenus wrote:

This "blank-node-cluster" is like the "chunk" mentioned by @draggett ?

Yes.

RDF URIs are akin to words in English and other natural languages, in that they are important for semantic interoperability between communicating agents. Internally, agents need to be able to create IDs for chunks generated on the fly. In the human brain, chunks correspond to semantic pointers in noisy high dimensional spaces (the concurrent firing patterns across cortical columns). These are unique to each person. We are able to communicate because we have a shared understanding of concepts and their interrelationships, and are able to map semantic pointers to words.

RDF's blank nodes are equivalent to internal identifiers, which are clearly needed, but whose meaning is implicit in the graph structure.

@afs
Contributor

afs commented Apr 21, 2021

Another direction is "shapes" which describe the structure of the data. These are descriptions - they can be used for validation or they can be taken as definitions.

Both SHACL Compact Syntax and ShEx Compact Syntax provide a modelling view where relationships and attributes are more clearly identified.

Another aspect is that tools have a role here in providing the view; it is not purely a data format issue.

@amirouche

@mhedenus

RDF can implement property graphs, what RDF allows to describe is a superset of what a property graph allows to describe.

Well, I think that is currently not completely true, because of the issue I laid out: RDF does not provide the fundamental distinction between relation and property that the common modelling schemes make.

ref: #45 (comment)

What I wrote is whether one can do everything that a GraphDB does with RDF: the answer is yes, it can even do more. I did not write how. Let us consider GraphDB and Property Graph as two different things: a GraphDB is software, a Property Graph is a concept. With a GraphDB you cannot query by key-value pairs, such as: give me THE vertex with uid=42. Unlike with my implementation of property graphs on top of RDF.

The solution we implemented is simple: when reading RDF you must specify a model that tells you how to interpret the predicates. The question remains what to do with predicates that are not specified in the model.

In my system there is no difference at the RDF level between items of SPO, they can all take the same types of objects, it is up the user to choose the schema even at that level.

The rules are:

* if the subject is a URI and the object is a URI the predicate is a relation unless specified otherwise

* if the object is a blank node the predicate is a property and the object is a complex property value (a structure)

* if the subject is a blank node the predicate is a (sub-)property of the complex property value (a structure)

I do not understand the last point, what is a sub-property?

Here is my approach:

  • Subjects are always blank nodes; I use uuid4 values as blank nodes.
  • There are three reserved predicate symbols: type, start and end.
  • Given a subject, start and end can only be associated with it if there is a triple (subject, type, "edge").
  • Any other predicates are properties, whose objects can be any acceptable values.
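A sketch of the approach just described, assuming triples stored as a Python list of tuples. The comment does not say what `type` value vertices get, so `"vertex"` is an assumption; `lookup` shows the key-value query (uid=42) mentioned above.

```python
import uuid

RESERVED = {"type", "start", "end"}


def _add(triples, kind, properties):
    clash = RESERVED.intersection(properties)
    if clash:
        raise ValueError(f"reserved predicate used as a property: {clash}")
    subject = str(uuid.uuid4())  # a fresh uuid4 plays the role of a blank node
    triples.append((subject, "type", kind))
    triples.extend((subject, key, value) for key, value in properties.items())
    return subject


def add_vertex(triples, **properties):
    return _add(triples, "vertex", properties)  # "vertex" is an assumed type value


def add_edge(triples, start, end, **properties):
    subject = _add(triples, "edge", properties)
    triples.append((subject, "start", start))
    triples.append((subject, "end", end))
    return subject


def validate(triples):
    """start/end may only appear on subjects that have (s, type, "edge")."""
    edges = {s for (s, p, o) in triples if p == "type" and o == "edge"}
    return all(s in edges for (s, p, o) in triples if p in ("start", "end"))


def lookup(triples, key, value):
    """Key-value query: every subject whose `key` property equals `value`."""
    return {s for (s, p, o) in triples if p == key and o == value}


triples = []
alice = add_vertex(triples, name="Alice", uid=42)
bob = add_vertex(triples, name="Bob")
add_edge(triples, alice, bob, label="knows")
add_edge(triples, alice, bob, label="works-with")
print(validate(triples), lookup(triples, "uid", 42) == {alice})  # True True
```

Since each edge is its own subject, two edges between the same pair of vertices are simply two subjects, so parallel edges need no special handling.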

Now you have a clean mapping to a ER/class model: the URIs which are subjects belong to entities (domain model objects), predicates between entity-URIs are always relations and there is no relation from within a complex property that points to another entity. That means in terms of ER modelling:

Saying "there is a clean ER/class model representation [in my upper layer on top of RDF]" misunderstands RDF. RDF is built out of relations: its basic nature is a network between entities, where a link is directed and has a label. There are many ways to add properties to Bob or Aziz, or even to add a relation to "Bob knows Aziz", whereas in a GraphDB you cannot relate to an edge except by reifying the edge into a vertex and edges, which introduces the metatype of hyperedge.

There is only a relationship between Alice and Bob but not between any part of them.

In your approach, yes, there may be only one edge between two vertices, unlike in my approach.

One interesting thing is that you can decide post facto whether to interpret a predicate as a relation or a property.

See above for an alternative.

@mhedenus

mhedenus commented Apr 26, 2021

@amirouche thank you for the reply. I haven't completely understood everything you've written, but I have a feeling that we are getting closer.

First, a clarification of what I meant: consider an RDF graph. Now you also have a (legacy) application that wants to import the data. Let's assume you have a Java application with a class domain model. Then you must do a model-to-model transformation.
If you're lucky, there is a UML description of the application's model. You must match some sub-graph of the RDF graph to Java objects.

This mapping process includes a specific interpretation of the RDF data: some predicates are interpreted to be members of the objects, some predicates are interpreted to be associations between objects (by the way: languages like Java and C++ also do not distinguish between association and membership, but this is another story).
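To make the model-to-model transformation concrete, here is a hedged Python sketch in which a dataclass stands in for the Java domain class. The MEMBERS and ASSOCIATIONS tables are hypothetical: they encode exactly the interpretation that, as argued above, the RDF graph does not carry by itself.

```python
from dataclasses import dataclass, field

# Hypothetical mapping rules: the extra knowledge the RDF data alone lacks.
MEMBERS = {"ex:name": "name"}         # predicate -> field holding a plain value
ASSOCIATIONS = {"ex:likes": "likes"}  # predicate -> field holding object references


@dataclass
class Person:
    iri: str
    name: str = ""
    # repr/compare disabled so that cyclic references (A likes B likes A) are safe
    likes: list = field(default_factory=list, repr=False, compare=False)


def to_objects(triples):
    """Map each IRI to a Person, interpreting predicates with the tables above."""
    objects = {}
    for s, p, o in triples:  # first pass: create objects and fill members
        obj = objects.setdefault(s, Person(iri=s))
        if p in MEMBERS:
            setattr(obj, MEMBERS[p], o)
    for s, p, o in triples:  # second pass: resolve associations to references
        if p in ASSOCIATIONS:
            target = objects.setdefault(o, Person(iri=o))
            getattr(objects[s], ASSOCIATIONS[p]).append(target)
    return objects


graph = [
    ("ex:Alice", "ex:name", "Alice"),
    ("ex:Bob", "ex:name", "Bob"),
    ("ex:Alice", "ex:likes", "ex:Bob"),
]
people = to_objects(graph)
```

Predicates that appear in neither table are silently dropped, which is one face of the mismatch discussed in this thread.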

I have been working on how to standardize this mapping process so that every (legacy) application can import/export data from/to RDF. This is a very important practical thing. I call RDF "academic" because it seems to me that the RDF/Semantic Web world somehow ignores the reality that RDF must interact with existing applications (and please don't give me the RDFa story!) in a way that the common developer can use.

Again, a picture. Objects or entity instances are identified by URI nodes. Some predicates therefore become associations/relations and others become properties/members. For properties you can draw an analogy to XML Schema. There are simple properties == properties that are encoded as simple strings. I think we all agree that these are just RDF literals. There are also complex properties == structures like lists or maps. They are considered to be sub-graphs "starting" with a blank node. A restriction here is that loops in complex-property sub-graphs are forbidden: they must be trees. I call literals or other blank nodes attached to blank nodes "sub-properties". But the whole "blank-literal sub-graph" is considered to be a member of the object.
URIs showing up in the complex-property sub-graph are also interpreted as properties, not as relations.

[image]
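One way to read the interpretation described above, as a Python sketch with hypothetical IRI, BNode and Literal wrapper types: IRI-to-IRI triples become relations, literals become simple properties, and a blank node starts a complex property that is folded into a nested dict. Any blank-node sub-graph that is not a tree is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    id: str


@dataclass(frozen=True)
class Literal:
    value: object


def complex_value(triples, root, seen=None):
    """Fold the blank-node tree below `root` into a nested dict.

    Raises ValueError if a blank node is reached twice, i.e. the sub-graph
    contains a loop or a shared node and is therefore not a tree.
    """
    seen = set() if seen is None else seen
    if root in seen:
        raise ValueError(f"not a tree: {root.id} reached twice")
    seen.add(root)
    value = {}
    for s, p, o in triples:
        if isinstance(s, BNode) and s == root:
            # Literals and IRIs below a blank node are both "sub-properties".
            value[p] = complex_value(triples, o, seen) if isinstance(o, BNode) else o.value
    return value


def split(triples):
    """Separate relations (IRI -> IRI) from the properties owned by each IRI."""
    relations, props = [], {}
    for s, p, o in triples:
        if not isinstance(s, IRI):
            continue  # blank-node subjects are consumed by complex_value
        if isinstance(o, IRI):
            relations.append((s.value, p, o.value))
        elif isinstance(o, BNode):
            props.setdefault(s.value, {})[p] = complex_value(triples, o)
        else:
            props.setdefault(s.value, {})[p] = o.value
    return relations, props


alice, bob, addr = IRI("ex:Alice"), IRI("ex:Bob"), BNode("b0")
graph = [
    (alice, "ex:name", Literal("Alice")),
    (alice, "ex:knows", bob),
    (alice, "ex:address", addr),
    (addr, "ex:city", Literal("Regensburg")),
    (addr, "ex:country", IRI("ex:Germany")),  # an IRI inside a property value
]
relations, props = split(graph)
```

Note that ex:Germany ends up as a value inside Alice's address rather than as a relation, which is the rule stated in the last sentence above.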

@mhedenus

@amirouche

  • Subjects are always blank nodes; I use UUID4s as blank-node identifiers.
  • There are three reserved predicate symbols: type, start and end.
  • Given a subject, start and end can only be associated with it if there is a triple (subject, type, "edge").
  • Any other predicate is a property, whose object can be any acceptable value.

The scheme you describe here seems to be a higher level of modelling, i.e. is this meant to be a graph "meta-model"?

@mhedenus

mhedenus commented Apr 26, 2021

To narrow it down (sorry for bothering you, but it is basically a very simple question), here is another example: a domain model, its (naive) implementation and the RDF graph data.

Yes, you could ask: why not annotate the Java code, similar to JAXB (e.g. using RDFBeans)? Because this presupposes existing knowledge about the data, and it does not answer the question: how can you tell from the RDF graph what is supposed to be a property and what a relation?

[image]

@draggett
Member Author

You may think of name, firstName and lastName as properties, but you could equally think of them as predicates. It doesn't make any real difference, and there's nothing to stop you classifying predicates as properties or links in an ontology.

@amirouche

The scheme you describe here seems to be a higher level of modelling, i.e. is this meant to be a graph "meta-model"?

Yes.

I haven't completely understood everything you've written but I have a feeling that we are getting closer.

Thanks for the feedback. I now understand the problem better, given the following:

Consider an RDF graph. Now you also have a (legacy) application that wants to import the data. Let's assume you have a Java application with a class domain model. Then you must do a model-model-transformation.
If you're lucky, there is a UML description of the application's model. You must match some sub-graph of the RDF graph to Java objects.

And the following:

I have been working on how to standardize this mapping process so that every (legacy) application can import/export data from/to RDF.

So, sort of an Object-Relational Mapper (ORM) where, instead of an SQL database, there is an RDF database; in other words, mapping RDF concepts to Java concepts. In an ORM such as Hibernate (as of 2010), a Java class describes a table whose columns are described with annotations (IIRC); each row of the table is then represented as an instance of that class, with getters and setters to access column values. IIRC, SQL queries are also built in Java code using method chaining. FWIW, most of my experience is with Python ORMs, and I have also built an Object-Graph Mapper.

I call RDF "academic" because it seems to me that the RDF/Semantic Web world somehow ignores the reality that RDF must interact with existing applications (and please don't give me the RDFa story!)

I am an outsider to RDF and the W3C. I came to RDF from TinkerPop / Neo4j. Part of the reason I came to RDF is the "academic" thing, which I prefer to describe as a lot of experience gathered in the same place with a lot of energy, an open process, a system that is well studied along various aspects, with several independent industrial implementations. RDF can probably be perfected. Also, be warned that my system does aim to be 100% compliant with RDF! I cherry-picked ideas (e.g. my system supports SPARQL queries and TinkerPop's Gremlin queries, and they can be mixed and matched).

in a way that the common developer can use.

FWIW, I do not think I match that description (e.g. I prefer to avoid ORMs), so take what I write with a grain of salt.

Is your goal to standardize read and write access to an RDF database, the way ORMs do with SQL databases; in other words, to build a framework for interoperating between RDF databases and Java, reusing Java concepts?

If that is the case, I am not sure how it relates to this issue. Also quoting the other issue:

It turned out that RDF is not intelligible for the users and even not for engineers.

Who are the users? As far as I am concerned, catching up on 80% of what I know about RDF comes down to SPARQL and this tutorial: https://docs.data.world/tutorials/sparql/.

My recommendation is to create a new issue with a specific question, e.g.: How to map RDF concepts to Java concepts?

@namedgraph

@mhedenus ER models come from the RDBMSs which came along in the 70s or so.

The web, on the other hand, appeared in the 90s, and RDF was then designed for data interchange on the web. That's why it has URIs, the open-world assumption (OWA), etc.

So there is an inherent mismatch between those models, and trying to shoehorn one into the other will leave you with the worst of both.

To take full advantage of RDF you have to go fully in. Design your software around RDF, not the other way around. Throw out the ORMs and pretty much all of the object-oriented layer. Accept that there are only triples (or quads), and they do not distinguish between properties or relations.

@mhedenus

mhedenus commented Apr 26, 2021

@draggett

You can create an ontology using OWL and define classes like Man and Woman. And you can make assertions about OWL properties like :hasWife or :hasAge (see the OWL 2 Primer for exactly this example). In a PG model or a UML model, :hasWife would be an edge/link/association/relation and :hasAge would be a key/member/attribute. Can you define the difference in OWL?

@mhedenus

mhedenus commented Apr 26, 2021

@namedgraph

The line of argumentation is very odd and completely unrealistic. Saying that RDF is younger does not make the other things worthless. Also, the remark on OWA and CWA is out of scope; this is something completely different. By the way: SHACL showed up because the reality is that you cannot live without validation and the CWA. That's why Stardog introduced ICV!

So there is an inherent mismatch between those models, and trying to shoehorn one into the other will leave you with the worst of both.

The opposite: the best of both!

Design your software around RDF, not the other way around. Throw out the ORMs and pretty much all of the object-oriented layer. Accept that there are only triples (or quads), and they do not distinguish between properties or relations.

I want to live in your world, it seems paradise! ;D

Do you drive a VW or an Audi? Then it is very likely that the software in your car's engine has been developed here in Regensburg. Please come and tell these engineers to forget what they learned about UML and to start over with OWL.
Tell the Java developers to forget what they learned about model-driven development and data binding. They should learn RDF and work with RDF4J, not this old-fashioned Hibernate stuff!

@namedgraph

@mhedenus you haven't disputed that there's an inherent mismatch between the models. There's an impedance mismatch even between the relational and object-oriented models, that's why the ORMs have all kinds of edge cases.

I'm not telling anyone how to work, my concern is pushing the web to its full potential and making it data-driven by using RDF and declarative technologies. I am just sharing my experiences. We have explained them in more detail in our blog.

If you don't have time for that, at least take a look at a specification which enables generic REST APIs and makes web applications data-driven, or more specifically ontology-driven: https://atomgraph.github.io/Linked-Data-Templates/
It allowed us to get rid of the domain object model completely.

@mhedenus

mhedenus commented Apr 26, 2021

@amirouche Developing a Java-graph mapping would be the final result of what I want to discuss here. Before that, conceptual questions must be answered. I used this thread because I consider my obviously weird question about relations/properties to be the key issue. If you can unify PG and RDF conceptually, then a Java-graph mapper can be the realization!

@mhedenus

mhedenus commented Apr 26, 2021

@namedgraph Looking at the links you provided that all looks great!

One thing should not be forgotten: We all want to advocate RDF!
However, my experience is that there are many obstacles: technical, epistemic and cultural.
We (my team) are struggling hard to explain to management what data-centric means. You cannot believe how hard that is.

@namedgraph

Amen! Do you know this book series BTW? Software Wasteland and The Data-Centric Revolution.

If you want to see our approach working in practice, drop me a line :) martynas [at] atomgraph.com

@mhedenus

@namedgraph Thank you very much for the hints. The books look very interesting.

@amirouche

I was going to reply with something similar. I came to the realization that object mappers such as ORM / ODM / OGM are a pipe dream before diving into Scheme and RDF. It may be useful to describe a schema with a set of annotated Java classes in cases where there is no other way to do it.

there are many obstacles: technical, epistemic and cultural.

I made that journey when I was younger: I started with Java, UML and SQL, and I still do my daily chores with an ORM. The physical barrier is a bigger and stronger obstacle than those you mentioned (see also the software crisis).

Check out Apache Jena, I do not think there will be a better answer elsewhere.

@mhedenus

mhedenus commented May 7, 2021

Thank you all for having this conversation. One outcome for me was that I have to present my position more precisely.
I have written an essay that I want to bring to your attention: https://github.com/mhedenus/on_graphs_and_models
Any comment is appreciated.

@dbooth-boston
Collaborator

I have written an essay that I want to bring to your attention: https://github.com/mhedenus/on_graphs_and_models

I would find it helpful if you could precisely itemize the differences between what you are calling a "property" versus a "relation", so that I can understand the distinction you are trying to make. What is true of a "property" that is not true of a "relation", and vice versa? What can I do with a "property" that I cannot do with a "relation", and vice versa? What characteristics do "properties" have that "relations" do not have, and vice versa? How are "properties" written or depicted, in contrast with "relations", and vice versa? If you could provide a concise list of the differences, it would help.

@HughGlaser
Collaborator

HughGlaser commented May 8, 2021 via email

@namedgraph

@mhedenus you're still thinking in ER terms and as long as you do that you will be seeing some mismatch in RDF.
RDF is a directed labelled graph at the very basic level. Here's an example of a directed graph:
[image: example of a directed graph]
Do you see any difference between "properties" and "relations"? No, because there is none.

But in practical terms, what prevents you from defining

pg:Relation rdfs:subClassOf owl:ObjectProperty .

after which your example becomes

ex:name       a rdf:Property . # or rather owl:DatatypeProperty
ex:employedAs a rdf:Property . # or rather owl:ObjectProperty
ex:likes      a pg:Relation .

and you have a distinction between "properties" and "relations", and it still makes sense semantically (unless I messed up the subclassing).
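A small Python sketch of how an application could act on such declarations, assuming the schema triples are available as plain tuples. It follows rdfs:subClassOf transitively (mirroring the RDFS rule that an instance of a subclass is also an instance of the superclass), so that predicates typed with any subclass of pg:Relation are treated as relations too; everything else falls back to "property".

```python
# Schema triples from the example above, with prefixed names as plain strings.
SCHEMA = [
    ("pg:Relation", "rdfs:subClassOf", "owl:ObjectProperty"),
    ("ex:name", "rdf:type", "rdf:Property"),
    ("ex:employedAs", "rdf:type", "rdf:Property"),
    ("ex:likes", "rdf:type", "pg:Relation"),
]


def subclasses_of(schema, cls):
    """`cls` plus every class declared (transitively) rdfs:subClassOf it."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if p == "rdfs:subClassOf" and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def relation_predicates(schema):
    """Predicates typed pg:Relation (or a subclass of it) count as relations."""
    relation_classes = subclasses_of(schema, "pg:Relation")
    return {s for s, p, o in schema if p == "rdf:type" and o in relation_classes}


def classify(triple, schema):
    """Label a data triple by looking only at its predicate's declared type."""
    return "relation" if triple[1] in relation_predicates(schema) else "property"
```

The decision is made purely from the schema, so the same data triple can be reinterpreted later by changing the declarations alone.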

@mhedenus

mhedenus commented May 10, 2021

Thank you all for reading my note and making these excellent remarks!

@HughGlaser You summarised my thoughts very nicely. Maybe the term "expressiveness" is misleading. I did not mean that either graph style is more powerful. As you said, they can both model the world, but they do it differently.

I will try to respond to your objections. They all come down to these questions: is the property/relation question a real, deep issue, or is it superfluous pettifoggery over a shallow difference? If there are differences, can they be listed (are there sufficient criteria for being a property or a relation)?

The answer may be a bit surprising. I used the term "graph style" for a reason. There is the concept of Thought Style. It is an important concept in the history of science, and it is the basis of the concept of paradigm. To simplify it: you have a style of thinking that is shaped by your context.

I make a hypothesis: there are two thought styles here, the context of RDF/Linked Data and the context of Property Graph/Applied Mathematics. A member of the first group says: "What are you talking about? There is no difference!" A member of the other group might say: "Why don't you see it?"

I do not want to convince you to adopt anything, but I do want you to accept that the property/relation distinction is made by others. You cannot deny that ER and (UML) class models exist, and they distinguish between membership and association.
I think that accepting this fact is the necessary condition for bridging PG and RDF. This is the topic of this thread, is it not?
If you 'simply decide not to use the property stuff of the PG' (to cite @HughGlaser), then you are not supporting PG!
I also think that RDF must do something here because it provides the more general/abstract graph style.

So, I rephrase the question of @HughGlaser and @dbooth-boston:

IF you accept the difference between relation/property THEN how do you distinguish them?

Well, this is a very good question, and the discussion can get extremely deep, if not confusing (for example: intrinsic versus extrinsic properties). This is not the right place for such a discussion. Putting all philosophical questions aside, I think that ultimately it is a choice made by the people who create the model. (Whether or not it is a bad thing to have this choice is yet another question!)

I would list the following general rules:

  • We must distinguish three things: properties/relations and those things which are related and own properties.
    The best name for these things is, in my opinion, entity (as in the ER model). An entity is a thing we talk about; it is the object of the model.
    A necessary condition is that an entity has an identity (otherwise it could not be a thing we talk about). But this is not sufficient, because there are things that have an identity but can be regarded as a property (value), like an email address.
    It follows that 'entity' is a stronger term than RDF's 'resource'.

  • A relation is a relation in the mathematical sense between entities (an n-ary logical predicate r(x,y,z,...), where x,y,z,... are entities). Putting higher-arity relations aside (this is yet another question), a 'relation' is a binary ordered (directed) relation between two entities.

  • A property does not have an existence or identity of its own; it is always owned by something else, either an entity or a reified relation. If the entity is destroyed, all owned properties are also destroyed. A property can have an arbitrarily complex value; it is not restricted to a single string literal. It can be a URI, a structure or a sub-graph.

  • A property cannot be related to anything other than its owning entity.
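The rules above can be sketched as a tiny in-memory store in Python (the PropertyGraph class and its methods are hypothetical, not an existing library). Properties live inside their owner, so they disappear with it, and relations only accept entities as endpoints. Deleting the relations that touch a deleted entity is an extra assumption; the rules only speak about owned properties.

```python
class PropertyGraph:
    """Hypothetical in-memory store that enforces the rules listed above."""

    def __init__(self):
        self.entities = {}   # entity id -> {property name: value}
        self.relations = []  # (source id, label, target id, {property name: value})

    def add_entity(self, eid, **props):
        self.entities[eid] = dict(props)

    def relate(self, source, label, target, **props):
        # Relations connect entities only; a property value can never be an endpoint.
        if source not in self.entities or target not in self.entities:
            raise KeyError("a relation can only connect two existing entities")
        self.relations.append((source, label, target, dict(props)))

    def delete_entity(self, eid):
        # Owned properties live inside the entity's dict, so they go with it.
        # Assumption: relations touching the entity (and their properties) go too.
        del self.entities[eid]
        self.relations = [r for r in self.relations if eid not in (r[0], r[2])]


g = PropertyGraph()
g.add_entity("alice", name="Alice", email="alice@example.org")
g.add_entity("bob", name="Bob")
g.relate("alice", "likes", "bob", since=2019)
```

The email address has a kind of identity but is stored as a property value here, so trying to relate something to it fails, matching the email example in the first rule.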

@namedgraph: I am sorry, but RDF is not what you are showing. RDF is a labeled directed graph plus three node types: IRI, Blank and Literal. There is a restriction that limits Literals to leaf nodes. This additional feature changes the graph completely. If you define an owl:DatatypeProperty, it says that the object node of an RDF statement shall be a Literal (please correct me if I am wrong here). I believe we agree that Literals are properties in the PG sense (I have said this several times). But what about IRIs and Blanks? Can they be interpreted as properties in the PG sense? Can a whole sub-graph be a property value in the PG sense? So owl:DatatypeProperty and owl:ObjectProperty do not help here.

@namedgraph

Can you use named graphs as "sub-graphs"?

And have you looked at other RDF -> PG mapping approaches? For example:
https://github.com/Rothamsted/rdf2pg#mapping-rdf-to-cypherneo4j-entities-general-concepts

@mhedenus

mhedenus commented May 10, 2021

Named graphs as property value? I didn't think about it, but this is a very interesting idea. Why not?
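A hedged Python sketch of the named-graph idea, using quads as plain tuples (graph None stands for the default graph). The dataset contents and the lookup convention, that an object which is also a graph name denotes that whole sub-graph, are assumptions for illustration:

```python
# Hypothetical dataset of quads: (subject, predicate, object, graph name or None).
QUADS = [
    ("ex:Alice", "ex:name", "Alice", None),
    ("ex:Alice", "ex:address", "ex:aliceAddress", None),
    ("_:a", "ex:city", "Regensburg", "ex:aliceAddress"),
    ("_:a", "ex:country", "ex:Germany", "ex:aliceAddress"),
]


def property_value(quads, subject, predicate):
    """Look up a property in the default graph; if its object names a graph
    in the dataset, return that graph's triples as the (complex) value."""
    names = {g for (_, _, _, g) in quads if g is not None}
    for s, p, o, g in quads:
        if g is None and s == subject and p == predicate:
            if o in names:
                return [(s2, p2, o2) for (s2, p2, o2, g2) in quads if g2 == o]
            return o
    return None
```

Under this convention the named graph plays the role of the "blank-literal sub-graph" discussed earlier, but it has a name that other statements can refer to.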

The link you provided is a good example of what I mean: you must specify a mapping. If you are on the PG side as the active consumer, the problem is only a technical one, because you know how to map. But there is no general solution without additional information.

@mhedenus

mhedenus commented May 10, 2021

I say the best way is to create a new vocabulary. Here is a sketch. I use prefixes to make clear what I mean:

  • Def: A pg:Entity is an rdf:IRI (but not vice versa); its type is a pg:EntityType
  • Def: A pg:RelationType is a subclass of rdf:Property with domain and range pg:EntityType
  • Def: A pg:Relation is an rdf:Statement whose subject and object are pg:Entity and whose predicate is a pg:RelationType
  • Def: A pg:PropertyType is a subclass of rdf:Property with domain pg:EntityType
  • Def: A pg:Property is a pair (pg:PropertyType, value). An rdf:Statement is interpreted as a pg:Property if the predicate is a pg:PropertyType or the object is an rdf:Literal or a blank node (indicating a complex property value)
  • pg:Properties can be assigned to reified rdf:Statements

Example:

ex:Alice a ex:Person ;
   ex:name "Alice";
   ex:likes ex:Bob ;
   ex:employedAs ex:Scientist .

ex:Bob a ex:Person .

# new Property-Graph Ontology

ex:Alice a pg:Entity .
#OR
ex:Person a pg:EntityType . 


ex:name a pg:PropertyType . # can be inferred ?
ex:likes a pg:RelationType.  #  can be inferred ?
ex:employedAs a pg:PropertyType . # this cannot be inferred and must be asserted!

Now imagine a PG visualization application that parses this RDF. There should not be any problem.
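As a sketch of what such an application might do, the following Python code applies the definitions above to the example, with triples as tuples, prefixed names as strings and a hypothetical Literal wrapper for literals. Entities are recognised either by `a pg:Entity` or through a type that is a pg:EntityType (both alternatives in the example). pg:PropertyType, literal and blank-node objects become node properties, and pg:RelationType statements between entities become edges. Statements matching neither rule are dropped, and properties on reified statements are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str


# The example above; blank nodes would be strings starting with "_:".
TRIPLES = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:name", Literal("Alice")),
    ("ex:Alice", "ex:likes", "ex:Bob"),
    ("ex:Alice", "ex:employedAs", "ex:Scientist"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Person", "rdf:type", "pg:EntityType"),
    ("ex:name", "rdf:type", "pg:PropertyType"),
    ("ex:likes", "rdf:type", "pg:RelationType"),
    ("ex:employedAs", "rdf:type", "pg:PropertyType"),
]


def types_of(triples):
    """Collect the declared rdf:type values of every subject."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return types


def is_entity(node, types):
    entity_types = {c for c, ts in types.items() if "pg:EntityType" in ts}
    declared = types.get(node, set())
    return "pg:Entity" in declared or bool(declared & entity_types)


def to_property_graph(triples):
    types = types_of(triples)
    nodes, edges = {}, []
    for s, p, o in triples:
        if not is_entity(s, types):
            continue  # schema statements about classes and predicates
        props = nodes.setdefault(s, {})  # every entity becomes a node
        if p == "rdf:type":
            continue
        p_types = types.get(p, set())
        if "pg:PropertyType" in p_types or isinstance(o, Literal) or o.startswith("_:"):
            props[p] = o.value if isinstance(o, Literal) else o
        elif "pg:RelationType" in p_types and is_entity(o, types):
            edges.append((s, p, o))
    return nodes, edges


nodes, edges = to_property_graph(TRIPLES)
```

A blank-node object is kept only as its identifier here; a fuller version would fold it into a nested value as described earlier in the thread.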

I would be happy if the W3C adopted this initiative and developed a Property Graph Ontology.

@pchampin

@mhedenus

I say the best way is to create a new vocabulary.

I say an even better way is to reuse one that already exists :-)
https://ieeexplore.ieee.org/abstract/document/9115617 (ping @domel)

+1 to what you wrote about 'Thought styles'. Each paradigm has some features "baked-in" (property-relationship distinction for PG, unique identifiers for RDF...) because they were considered essential by the community in which they appeared. Other features can always be added through extra layers (a conventional 'iri' property on each node for PG, a meta-ontology in RDF such as the one you proposed above). In the end, as @HughGlaser points out, the expressiveness is roughly the same, but the trade-offs differ.

NB: another place where the property/relationship distinction can be made is in the visualization layer. Great example at https://vitalis-wiens.github.io/donatello-pipelines/

@mhedenus

@pchampin

That paper looks very interesting! It seems their focus is on transforming PG to RDF. For me, the focus would be on the other direction: RDF to PG!
