Small but nice SPARQL Optimisation fix #547

gromgull · 2015-11-20T19:29:05Z

This reorders the triples in a BGP while evaluating, so that the "most
bound" triples are evaluated first. The algebra transformation also does
this, but "statically", i.e. without looking at the data.

At runtime, we may have bindings from other places
(BIND/VALUES/initBindings); taking them into account when ordering the
triples can massively improve query time.

In a BerkeleyDB store with ~250k triples, I had a query with two triple
patterns that retrieved 38 out of 25000 possible matching triples with a
particular property. Adding this fix brought query time from 8s to
800ms.
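
To illustrate why the order matters, here is a toy sketch, not rdflib code. It assumes terms are plain strings, that a leading `?` marks a variable, and that a BGP is evaluated with a naive nested-loop join over an in-memory list of triples. The names `match` and `eval_bgp` are hypothetical helpers. The sketch counts intermediate matches for the same two patterns in both orders, given an initial binding for `?tag`.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # variable: bind it, or check the existing value
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                   # constant: must be equal
            return None
    return new


def eval_bgp(patterns, data, bindings, stats):
    """Naive nested-loop join; stats['intermediate'] counts every partial match."""
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        extended = match(first, triple, bindings)
        if extended is not None:
            stats["intermediate"] += 1
            yield from eval_bgp(rest, data, extended, stats)


# 200 subjects, each with one ex:linked triple and one ex:tag triple;
# only the first 3 subjects carry the tag "rare".
data = []
for i in range(200):
    data.append((f"ex:s{i}", "ex:linked", f"ex:o{i}"))
    data.append((f"ex:s{i}", "ex:tag", "rare" if i < 3 else "common"))

linked = ("?s", "ex:linked", "?o")
tagged = ("?s", "ex:tag", "?tag")
init = {"?tag": "rare"}  # plays the role of initBindings

as_written = {"intermediate": 0}
results_a = list(eval_bgp([linked, tagged], data, init, as_written))

most_bound_first = {"intermediate": 0}
results_b = list(eval_bgp([tagged, linked], data, init, most_bound_first))

print(len(results_a), as_written["intermediate"])        # 3 203
print(len(results_b), most_bound_first["intermediate"])  # 3 6
```

Both orders return the same 3 solutions, but starting with the pattern that has fewer unbound variables keeps the intermediate result set small. A real store with indexes behaves differently in detail, so the numbers here only show the shape of the effect.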

gromgull · 2015-11-20T21:04:43Z

As far as I can see the single failing test test_sparqlupdatestore has nothing to do with what I edited :(

gromgull · 2015-11-20T21:09:59Z

It breaks while trying to parse the reply to the query produced from this: https://github.com/RDFLib/rdflib/blob/master/test/test_sparqlupdatestore.py#L237

I don't know where it goes wrong - on insert or query.

joernhees · 2015-11-22T01:56:13Z

rdflib/plugins/sparql/evaluate.py

@@ -214,7 +214,10 @@ def evalPart(ctx, part):
            pass  # the given custome-function did not handle this part

    if part.name == 'BGP':
-        return evalBGP(ctx, part.triples)  # NOTE pass part.triples, not part!
+        # Reorder triples patterns based on 'bindedness' based on current ctx
+        triples = sorted(part.triples, key=lambda t: len([n for n in t if ctx[n] is None]))


I have problems understanding this... could you maybe expand the comment a bit more?

My guess is that n is a node in triple t, so if ctx[n] is None then n is unbound, and you select first those triples which have the fewest unbound nodes?

Your guess is spot on. I am not sure when this kicks in: for purely "static" triples (with no OPTIONAL/bindings/VALUES/anything), the code here: https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/sparql/algebra.py#L84 will already have sorted them correctly.

This code helps when some other source of bindings has given us more bindings since then - in my case it was using initBindings.
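
A minimal sketch of the sort key discussed above, written without rdflib so it can be run on its own. It assumes terms are plain strings with a leading `?` marking variables, and that a plain dict stands in for the query context; `unbound_count` and `reorder` are hypothetical names, not part of the patch.

```python
def unbound_count(triple, bindings):
    """Number of terms in the triple that are variables without a value yet."""
    return sum(1 for term in triple
               if term.startswith("?") and term not in bindings)


def reorder(triples, bindings):
    """Most-bound patterns first; sorted() is stable, so ties keep their order."""
    return sorted(triples, key=lambda t: unbound_count(t, bindings))


patterns = [
    ("?person", "ex:knows", "?friend"),
    ("?friend", "ex:name", "?name"),
]

# With no bindings, both patterns have two unbound variables: order is unchanged.
static_order = reorder(patterns, {})

# Once ?name is bound (e.g. via initBindings), the second pattern moves to the front.
runtime_order = reorder(patterns, {"?name": "Alice"})

print(static_order == patterns)  # True
print(runtime_order[0])          # ('?friend', 'ex:name', '?name')
```

This is the case described in the reply: a static sort sees no difference between the two patterns, while a sort at evaluation time can use the extra binding.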

joernhees · 2015-11-22T01:57:27Z

btw: the failing test seems related to the recent SPARQLWrapper updates, more in #550

gromgull · 2015-11-22T08:41:28Z

I made the comment a bit clearer - if the test-fail is unrelated we can merge :)

joernhees mentioned this pull request Nov 22, 2015

master build fails (guess: dependency updates) #550

Closed

joernhees reviewed Nov 22, 2015

joernhees added the enhancement, SPARQL, and performance labels Nov 22, 2015

joernhees added this to the rdflib 4.2.2 milestone Nov 22, 2015

gromgull force-pushed the master branch from ff41b74 to cf9ccd9 Compare November 22, 2015 08:40

joernhees added a commit that referenced this pull request Nov 22, 2015

Merge pull request #547 from gromgull/master

fa4172c

Small but nice SPARQL Optimisation fix

joernhees merged commit fa4172c into RDFLib:master Nov 22, 2015

pyup-bot mentioned this pull request Jan 29, 2017

Update rdflib to 4.2.2 mytardis/mytardis#815

Merged

This was referenced Mar 16, 2017

Initial Update mozilla/amo-validator#510

Closed

Update rdflib to 4.2.2 mozilla/amo-validator#515

Closed
