Closed
README.md: 1 change (1 addition, 0 deletions)

```
@@ -1,6 +1,7 @@
![](docs/_static/RDFlib.png)

RDFLib

======
[![Build Status](https://github.com/RDFLib/rdflib/actions/workflows/validate.yaml/badge.svg?branch=master)](https://github.com/RDFLib/rdflib/actions?query=branch%3Amaster)
[![Coveralls branch](https://img.shields.io/coveralls/RDFLib/rdflib/master.svg)](https://coveralls.io/r/RDFLib/rdflib?branch=master)
```
rdflib/graph.py: 64 changes (64 additions, 0 deletions)

```diff
@@ -1597,6 +1597,70 @@ def add_to_cbd(uri):
 
         return subgraph
 
+    def describe_cbd(self, resource, subgraph):
```
Member
It is unclear from the docstrings why this would exist in addition to `Graph.cbd`, as at a brief glance the docstrings look the same. I don't think we need two methods to get the CBD; we just need one that works.

Member
See:

Also, you will have to include tests specifically for this function for this PR to be merged. Still, I don't think it should exist: if `Graph.cbd` does not do what its docstring says, that is a bug and should be fixed.

Author
Hey!

Thanks for your review.

We have included the tests for this describe_cbd function in the test file test_graph_cbd.py as the testCbdDescribeReified() function. The test example we have included covers all the cases in the definition of the DESCRIBE query. It can be visualized as:
[Screenshot, 2022-05-12: visualization of the test example graph]

Author
The function describe_cbd is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object and subject. The original function returns the CBD containing triples where the resource is a subject only. However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject. We also ensure that the blank nodes are recursively resolved – this is in line with the definition of DESCRIBE query.
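The behaviour described in this comment (statements where the resource is the subject or the object, following blank nodes recursively in each direction) can be sketched without rdflib. This is a hypothetical illustration, not the PR's `describe_cbd`: triples are plain string tuples, blank nodes are assumed to be strings prefixed with `_:`, CBD rule 3 (reification) is left out, and the sample data is loosely based on the PR's test. For the specification question, the W3C CBD submission also discusses a symmetric variant (Symmetric Concise Bounded Description) that adds statements in which the resource is the object; whether it matches this behaviour exactly would need checking against that text.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def closure(graph: Set[Triple], start: str, forward: bool) -> Set[Triple]:
    """Collect statements touching `start`, recursing only through blank nodes.

    forward=True matches `start` as subject and follows blank-node objects
    (like CBD rules 1 and 2); forward=False matches `start` as object and
    follows blank-node subjects.
    """
    result: Set[Triple] = set()
    seen = {start}
    pending: List[str] = [start]
    while pending:
        node = pending.pop()
        for s, p, o in graph:
            anchor, neighbour = (s, o) if forward else (o, s)
            if anchor != node:
                continue
            result.add((s, p, o))
            if is_bnode(neighbour) and neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    return result


def symmetric_description(graph: Set[Triple], start: str) -> Set[Triple]:
    # Union of the outgoing and incoming closures; reification is omitted.
    return closure(graph, start, forward=True) | closure(graph, start, forward=False)


g = {
    ("ex:R5", "ex:propOne", "ex:P1"),
    ("ex:R5", "ex:propbn4", "_:b3"),
    ("_:b3", "ex:probbn4", "ex:R6"),
    ("ex:R2", "ex:propp", "ex:R5"),
    ("_:b2", "ex:probbn4", "ex:R5"),
    ("_:b1", "ex:probbn3", "_:b2"),
    ("ex:S3", "ex:propbn1", "_:b1"),
    ("ex:R7", "ex:unrelated", "ex:R8"),
}

print(len(closure(g, "ex:R5", forward=True)))   # 3: subject side only
print(len(symmetric_description(g, "ex:R5")))  # 7: adds 4 incoming statements
```

The two directions are kept as separate closures so that a subject-only description (the CBD side) stays available on its own.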

Member

@aucampia, May 12, 2022

> We have included the tests for this describe_cbd function in the test file test_graph_cbd.py as the testCbdDescribeReified() function.

My apologies; in that case you will need tests for SPARQL DESCRIBE as part of the PR.

> The function describe_cbd is a variant of the original CBD implementation. I have modified the function to recursively consider all triples containing the query resource as an object and subject. The original function returns the CBD containing triples where the resource is a subject only.

If it does something different then it should at least have a different docstring which it does not as far as I can tell, and then the next question is, why does it do something different? There is one definition of a Concise Bounded Description, and it is this: https://www.w3.org/Submission/CBD/#definition. If it does something different, it should be further qualified, and as SPARQL DESCRIBE does not mandate anything in regard to what should be returned, I don't think qualifying it with "describe" is helpful.
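The three rules of the W3C definition linked above can be sketched without rdflib, as a reference point. This is a hypothetical illustration rather than rdflib's `Graph.cbd`: triples are string tuples, blank nodes are assumed to be `_:`-prefixed strings, and the `rdf:` terms are abbreviated. Rule 3 is read here as matching a reification's `rdf:subject`, `rdf:predicate` and `rdf:object` against a statement already in the description.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Abbreviated rdf: terms, for illustration only.
RDF_SUBJECT, RDF_PREDICATE, RDF_OBJECT = "rdf:subject", "rdf:predicate", "rdf:object"


def is_bnode(term: str) -> bool:
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return term.startswith("_:")


def reifications(graph: Set[Triple], statements: Set[Triple]) -> Set[str]:
    """Nodes r with r rdf:subject s, rdf:predicate p, rdf:object o for some (s, p, o) in statements."""
    found: Set[str] = set()
    for r, pred, subj in graph:
        if pred != RDF_SUBJECT:
            continue
        for s, p, o in statements:
            if s == subj and (r, RDF_PREDICATE, p) in graph and (r, RDF_OBJECT, o) in graph:
                found.add(r)
    return found


def cbd(graph: Set[Triple], start: str) -> Set[Triple]:
    result: Set[Triple] = set()
    expanded: Set[str] = set()
    pending: List[str] = [start]
    while pending:
        # Rules 1 and 2: statements with the node as subject, recursing into blank-node objects.
        while pending:
            node = pending.pop()
            if node in expanded:
                continue
            expanded.add(node)
            for s, p, o in graph:
                if s == node:
                    result.add((s, p, o))
                    if is_bnode(o):
                        pending.append(o)
        # Rule 3: start a new description from each not-yet-expanded reification.
        pending = [r for r in reifications(graph, result) if r not in expanded]
    return result


g = {
    ("ex:R5", "ex:propOne", "ex:P1"),
    ("ex:R5", "ex:propRei", "ex:Pre1"),
    ("ex:R5", "ex:propBn", "_:b1"),
    ("_:b1", "ex:inner", "ex:P2"),
    ("ex:S", RDF_SUBJECT, "ex:R5"),
    ("ex:S", RDF_PREDICATE, "ex:propRei"),
    ("ex:S", RDF_OBJECT, "ex:Pre1"),
    ("ex:S", "ex:otherReiProp", "ex:Pre2"),
    # reifies a statement that is not in the description: excluded by rule 3
    ("ex:T", RDF_SUBJECT, "ex:R5"),
    ("ex:T", RDF_PREDICATE, "ex:other"),
    ("ex:T", RDF_OBJECT, "ex:X"),
    # ex:R5 appears only as object here: not part of its CBD
    ("ex:R2", "ex:propp", "ex:R5"),
}

description = cbd(g, "ex:R5")
print(len(description))  # 8 = 3 (rule 1) + 1 (rule 2) + 4 (rule 3)
```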

> However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject. We also ensure that the blank nodes are recursively resolved – this is in line with the definition of DESCRIBE query.

Can you clarify why? As per the SPARQL 1.1 spec:

https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#describe

> The DESCRIBE form returns a single result RDF graph containing RDF data about resources. This data is not prescribed by a SPARQL query, where the query client would need to know the structure of the RDF in the data source, but, instead, is determined by the SPARQL query processor.

And it goes on to say:

> Other possible mechanisms for deciding what information to return include Concise Bounded Descriptions [CBD].

It does not state that DESCRIBE must return the CBD, but I do expect that if we decide to return the Concise Bounded Description, we return it as defined. In that case `Graph.cbd` should do it, and if it does not, it should be fixed to do so.
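Because the spec leaves the description to the query processor, one hypothetical way to keep a single CBD implementation while still allowing other behaviour is to make the per-resource description a pluggable function. The names `describe`, `subject_only` and `subject_or_object` are invented for this sketch and are not rdflib APIs; both describers are deliberately simplified stand-ins with no blank-node recursion or reification.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Describer = Callable[[Set[Triple], str], Set[Triple]]


def subject_only(graph: Set[Triple], node: str) -> Set[Triple]:
    # Stand-in for a CBD: only rule 1 (statements whose subject is `node`).
    return {t for t in graph if t[0] == node}


def subject_or_object(graph: Set[Triple], node: str) -> Set[Triple]:
    # Stand-in for a broader description: `node` as subject or as object.
    return {t for t in graph if node in (t[0], t[2])}


def describe(
    graph: Set[Triple], resources: Iterable[str], describer: Describer = subject_only
) -> Set[Triple]:
    """Merge one description per resource into a single result graph."""
    result: Set[Triple] = set()
    for resource in resources:
        result |= describer(graph, resource)
    return result


g = {
    ("ex:a", "ex:p", "ex:b"),
    ("ex:c", "ex:q", "ex:a"),
    ("ex:b", "ex:r", "ex:d"),
}

print(sorted(describe(g, ["ex:a"])))
# [('ex:a', 'ex:p', 'ex:b')]
print(len(describe(g, ["ex:a"], describer=subject_or_object)))  # 2
print(len(describe(g, ["ex:a", "ex:b"])))                       # 2
```

With this shape, the default could stay a strictly defined CBD, and any broader behaviour would be an explicitly named alternative rather than a second method with the same docstring.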

Author
> It does not state that DESCRIBE must return the CBD

DESCRIBE should not return just the CBD, because as we mentioned below

> However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject.

This is in line with the definition of the DESCRIBE query.

> If it does something different then it should at least have a different docstring which it does not as far as I can tell

Apologies for not providing a separate docstring, I can add that to make it well documented. A few things here and there can be improved wrt documentation, I can check that out.

Further,

> in that case, you will need tests for SPARQL DESCRIBE as part of the PR.

I have also provided the link to a testing file that has such an example.

To show the working of DESCRIBE query, we have provided a tester file.

If that matches what you expect, we can add that file to the PR. For now, we have just attached the link.

Member
> DESCRIBE should not return just the CBD, because as we mentioned below

> However, for implementing the DESCRIBE query, we need all triples containing the resource – both as an object and subject.

But I don't understand why. Where does this requirement come from? Could you cite some specification that requires this behavior, or that indicates why you say it needs "all triples containing the resource – both as an object and subject."? If this is just what you as an individual would prefer, I don't think it is sufficient justification.

> I have also provided the link to a testing file that has such an example.

It has to be in our test suite. If it is not, I don't know whether it passes (as I won't run it on my computer), and we also can't know if there is a regression.

Member

@aucampia, May 12, 2022

> Apologies for not providing a separate docstring, I can add that to make it well documented. A few things here and there can be improved wrt documentation, I can check that out.

To begin with, it will be best to just cite what specification dictates this algorithm/behaviour.

Commenter

FWIW, the result of a DESCRIBE query is determined by the service, and the definition is vague enough that the W3C puts "description" in double quotes:

> The DESCRIBE form takes each of the resources identified in a solution, together with any resources directly named by IRI, and assembles a single RDF graph by taking a "description" which can come from any information available including the target RDF Dataset. The description is determined by the query service.

The EBNF for SPARQL DESCRIBE suggests that an implementation that conforms to the spec will demand rather more than just defaulting to returning a CBD:

`DescribeQuery ::= "DESCRIBE" ( VarOrIri+ | "*" ) DatasetClause* WhereClause? SolutionModifier`
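Since the grammar allows variables as well as IRIs (and `*`), the set of resources to describe generally depends on the solutions of the WHERE clause, in line with the quoted "each of the resources identified in a solution, together with any resources directly named by IRI". A hypothetical sketch of that first step, assuming solutions are plain dicts keyed by `?`-prefixed variable names; `describe_targets` is an invented name, and its treatment of `DESCRIBE *` is a simplification.

```python
from typing import Dict, List, Set

Solution = Dict[str, str]  # variable name (with leading "?") -> bound term


def describe_targets(projection: List[str], solutions: List[Solution]) -> Set[str]:
    """Resources named by a DESCRIBE projection: `VarOrIri+` or `*`."""
    targets: Set[str] = set()
    if projection == ["*"]:
        # DESCRIBE *: approximated here as every term bound in any solution.
        for solution in solutions:
            targets.update(solution.values())
        return targets
    for item in projection:
        if item.startswith("?"):
            # A variable: its binding in each solution, skipping solutions where it is unbound.
            targets.update(sol[item] for sol in solutions if item in sol)
        else:
            # An IRI named directly in the query needs no WHERE clause.
            targets.add(item)
    return targets


solutions = [{"?x": "ex:alice", "?y": "ex:bob"}, {"?x": "ex:carol"}]
print(sorted(describe_targets(["?x", "ex:dave"], solutions)))
# ['ex:alice', 'ex:carol', 'ex:dave']
print(sorted(describe_targets(["*"], solutions)))
# ['ex:alice', 'ex:bob', 'ex:carol']
print(describe_targets(["ex:R5"], []))
# {'ex:R5'}
```

Each resulting resource would then be passed to whatever description strategy the processor uses, and the per-resource descriptions merged into one graph.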

```diff
+        """Retrieves the Concise Bounded Description of a Resource from a Graph
+        Concise Bounded Description (CBD) is defined in [1] as:
+        Given a particular node (the starting node) in a particular RDF graph (the source graph), a subgraph of that
+        particular graph, taken to comprise a concise bounded description of the resource denoted by the starting node,
+        can be identified as follows:
+            1. Include in the subgraph all statements in the source graph where the subject of the statement is the
+               starting node;
+            2. Recursively, for all statements identified in the subgraph thus far having a blank node object, include
+               in the subgraph all statements in the source graph where the subject of the statement is the blank node
+               in question and which are not already included in the subgraph.
+            3. Recursively, for all statements included in the subgraph thus far, for all reifications of each statement
+               in the source graph, include the concise bounded description beginning from the rdf:Statement node of
+               each reification.
+        This results in a subgraph where the object nodes are either URI references, literals, or blank nodes not
+        serving as the subject of any statement in the graph.
+        [1] https://www.w3.org/Submission/CBD/
+        :param resource: a URIRef object, of the Resource queried for
+        :return: a Graph, subgraph of self
+        """
+
+        def add_to_cbd_original(uri):
+            for s, p, o in self.triples((uri, None, None)):
+                subgraph.add((s, p, o))
+                # recurse 'down' through all blank nodes
+                if type(o) == BNode and not (o, None, None) in subgraph:
+                    add_to_cbd_original(o)
+
+            # for Rule 3 (reification)
+            # for any rdf:Statement in the graph with the given URI as the object of rdf:subject,
+            # get all triples with that rdf:Statement instance as subject
+
+            # find any subject s where the predicate is rdf:subject and this uri is the object
+            # (these subjects are of type rdf:Statement, given the domain of rdf:subject)
+            for s, p, o in self.triples((None, RDF.subject, uri)):
+                # find all triples with s as the subject and add these to the subgraph
+                for s2, p2, o2 in self.triples((s, None, None)):
+                    subgraph.add((s2, p2, o2))
+
+        def add_to_cbd_reverse(uri):
+            for s, p, o in self.triples((None, None, uri)):
+
+                subgraph.add((s, p, o))
+                # recurse 'up' through all blank nodes
+                if type(s) == BNode and not (s, None, None) in subgraph:
+                    add_to_cbd_reverse(s)
+
+            # for Rule 3 (reification)
+            # for any rdf:Statement in the graph with the given URI as the object of rdf:object,
+            # get all triples with that rdf:Statement instance as subject
+
+            # find any subject s where the predicate is rdf:object and this uri is the object
+            # (these subjects are of type rdf:Statement, given the domain of rdf:object)
+            for s, p, o in self.triples((None, RDF.object, uri)):
+                # find all triples with s as the subject and add these to the subgraph
+                for s2, p2, o2 in self.triples((s, None, None)):
+
+                    subgraph.add((s2, p2, o2))
+
+        add_to_cbd_original(resource)
+        add_to_cbd_reverse(resource)
+
+        return subgraph
+
 
 class ConjunctiveGraph(Graph):
     """A ConjunctiveGraph is an (unnamed) aggregation of all the named
```
rdflib/plugins/sparql/algebra.py: 5 changes (4 additions, 1 deletion)

```diff
@@ -777,13 +777,16 @@ def translateQuery(q, base=None, initNs=None):
         q[1], visitPost=functools.partial(translatePName, prologue=prologue)
     )
 
-    P, PV = translate(q[1])
+    if q[1].name != "DescribeQuery":
+        P, PV = translate(q[1])
     datasetClause = q[1].datasetClause
     if q[1].name == "ConstructQuery":
 
         template = triples(q[1].template) if q[1].template else None
 
         res = CompValue(q[1].name, p=P, template=template, datasetClause=datasetClause)
+    elif q[1].name == "DescribeQuery":
+        res = CompValue(q[1].name, p=None, datasetClause=datasetClause, PV=q[1].var)
     else:
         res = CompValue(q[1].name, p=P, datasetClause=datasetClause, PV=PV)
 
```
rdflib/plugins/sparql/evaluate.py: 7 changes (6 additions, 1 deletion)

```diff
@@ -294,7 +294,12 @@ def evalPart(ctx: QueryContext, part: CompValue):
         # raise Exception('ServiceGraphPattern not implemented')
 
     elif part.name == "DescribeQuery":
-        raise Exception("DESCRIBE not implemented")
+        subgraph = Graph()
+        for var in part['PV']:
+            subgraph = ctx.graph.describe_cbd(var, subgraph)
+        res = {"type_": "DESCRIBE", "graph": subgraph}
+
+        return res
 
     else:
         raise Exception("I dont know: %s" % part.name)
```
test/test_graph/test_graph_cbd.py: 89 changes (89 additions, 0 deletions)

```diff
@@ -109,3 +109,92 @@ def testCbdReified(get_graph):
     )
 
     assert len(g.cbd(EX.R6)) == (3 + 5 + 5), "cbd() for R6 should return 12 triples"
+
+
+def testCbdDescribeReified(get_graph):
+    g = get_graph
+    # Checking DESCRIBE query
+    g.parse(
+        data="""
+        PREFIX ex: <http://ex/>
+        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+        ex:R5
+            ex:propOne ex:P1 ;
+            ex:propTwo ex:P2 ;
+            ex:propRei ex:Pre1 .
+        ex:S
+            a rdf:Statement ;
+            rdf:subject ex:R5 ;
+            rdf:predicate ex:propRei ;
+            rdf:object ex:Pre1 ;
+            ex:otherReiProp ex:Pre2 ;
+            ex:newProp ex:R5 .
+
+        ex:R2
+            ex:propp ex:R5 .
+        ex:S3
+            ex:propbn1 _:b1.
+        _:b1
+            ex:propbn2 ex:R5;
+            ex:probbn3 _:b2.
+        _:b2
+            ex:probbn4 ex:R5.
+        ex:R5
+            ex:propbn4 _:b3.
+        _:b3
+            ex:probbn5 _:b4;
+            ex:probbn4 ex:R6.
+        _:b4
+            ex:probbn8 ex:R7.
+        """,
+        format="turtle",
+    )
+    # print(len(g.describe_cbd(EX.R5,Graph())))
+    # this cbd() call should get the 3 basic triples with ex:R5 as subject as well as 5 more from the reified
+    # statement
+    assert len(g.describe_cbd(EX.R5, Graph())) == (
+        3 + 5 + 8
+    ), "describe_cbd() for R5 should return 16 triples"
+
+    # add crazy reified triples to the testing graph
+    g.parse(
+        data="""
+        PREFIX ex: <http://ex/>
+        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
+        ex:R6
+            ex:propOne ex:P1 ;
+            ex:propTwo ex:P2 ;
+            ex:propRei ex:Pre1 .
+        ex:S1
+            a rdf:Statement ;
+            rdf:subject ex:R6 ;
+            rdf:predicate ex:propRei ;
+            rdf:object ex:Pre1 ;
+            ex:otherReiProp ex:Pre3 .
+        ex:S2
+            rdf:subject ex:R6 ;
+            rdf:predicate ex:propRei2 ;
+            rdf:object ex:Pre2 ;
+            ex:otherReiProp ex:Pre4 ;
+            ex:otherReiProp ex:Pre5 .
+        ex:S3
+            ex:propbn1 _:b1.
+        _:b1
+            ex:propbn2 ex:R5;
+            ex:probbn3 _:b2.
+        _:b2
+            ex:probbn4 ex:R5.
+        ex:R5
+            ex:propbn4 _:b3.
+        _:b3
+            ex:probbn5 _:b4;
+            ex:probbn4 ex:R6.
+        _:b4
+            ex:probbn8 ex:R7.
+        """,
+        format="turtle",
+    )
+    # print(len(g.describe_cbd(EX.R6)))
+    assert len(g.describe_cbd(EX.R6, Graph())) == (
+        3 + 5 + 5 + 2
+    ), "describe_cbd() for R6 should return 15 triples"
```