-
Notifications
You must be signed in to change notification settings - Fork 574
Small but nice SPARQL Optimisation fix #547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As far as I can see the single failing test |
It breaks while trying to parse the reply to the query produced from this: https://github.com/RDFLib/rdflib/blob/master/test/test_sparqlupdatestore.py#L237 I don't know where it goes wrong - on insert or query. |
@@ -214,7 +214,10 @@ def evalPart(ctx, part): | |||
pass # the given custome-function did not handle this part | |||
|
|||
if part.name == 'BGP': | |||
return evalBGP(ctx, part.triples) # NOTE pass part.triples, not part! | |||
# Reorder triples patterns based on 'bindedness' based on current ctx | |||
triples = sorted(part.triples, key=lambda t: len([n for n in t if ctx[n] is None])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have problems understanding this... could you maybe expand the comment a bit more?
my guess is n
is a node in triple t
, so if ctx[n] is None
then n
is unbound, so you select those triples first which have least unbound nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your guess is spot on. I am not sure when this kicks in, for only "static" triples (with not optional/bindings/values/anything) - the code here: https://github.com/RDFLib/rdflib/blob/master/rdflib/plugins/sparql/algebra.py#L84 will already have sorted them correctly.
This code helps when some other source of bindings has given us more bindings since then - in my base it was using initBindings.
btw: the failing test seems related to the recent SPARQLWrapper updates, more in #550 |
This reorders the triples in a BGP while evaluating, so that the "most bound" triples are evaluated first. The algebra transformation also does this, but "statically", i.e. without looking at the data. At runtime, we may have bindings from other places (BIND/VALUES/initBindings) which can massively improve query time. In a BerkelyDB store with ~250k triples, I had a query with two triples patterns that retrieved 38 out of 25000 possible matching triples with a particular property. Adding this fix brought query time from 8s to 800ms.
I made the comment a bit clearer - if the test-fail is unrelated we can merge :) |
Small but nice SPARQL Optimisation fix
This reorders the triples in a BGP while evaluating, so that the "most
bound" triples are evaluated first. The algebra transformation also does
this, but "statically", i.e. without looking at the data.
At runtime, we may have bindings from other places
(BIND/VALUES/initBindings) which can massively improve query time.
In a BerkelyDB store with ~250k triples, I had a query with two triples
patterns that retrieved 38 out of 25000 possible matching triples with a
particular property. Adding this fix brought query time from 8s to
800ms.