Initial implementation of SpatialJoin #1248

Merged
merged 63 commits into from
Sep 28, 2024
Changes from 59 commits
0fe8bd2
merge changes with new qlever version
Jonathan24680 Jan 25, 2024
4497ae5
merge adaptions
Jonathan24680 Jan 25, 2024
1b49817
maxDistance can be given as parameter
Jonathan24680 Jan 25, 2024
16381e7
cleaning up
Jonathan24680 Jan 28, 2024
6084f6d
Merge branch 'ad-freiburg:master' into SpatialJoinClass
Jonathan24680 Jan 28, 2024
522cef2
cleaning up
Jonathan24680 Jan 28, 2024
95e39ec
Merge branch 'master' into SpatialJoinClass
Jonathan24680 Feb 4, 2024
22e3276
Merge branch 'master' into SpatialJoinClass
Jonathan24680 Feb 9, 2024
16e47a6
move the spatialJoin constant to the constant file and cleaning up
Jonathan24680 Feb 9, 2024
6783ddb
backup commit - work in progress
Jonathan24680 Feb 16, 2024
2f1146f
add creation of test dataset
Jonathan24680 Feb 22, 2024
d40b5c1
backup commit
Jonathan24680 Feb 23, 2024
f49918f
backup commit
Jonathan24680 Mar 5, 2024
0ca9bbe
backup commit
Jonathan24680 Mar 5, 2024
04af121
backup commit
Jonathan24680 Mar 6, 2024
4a79fc3
test of computeResult in the SpatialJoin
Jonathan24680 Mar 15, 2024
d202972
forgot to save some edits
Jonathan24680 Mar 15, 2024
0f55a78
maxDistance Parsing tests
Jonathan24680 Mar 15, 2024
fd0dae9
backup
Jonathan24680 Mar 16, 2024
a76bbb1
probably overflow distance mistake somewhere
Jonathan24680 Mar 18, 2024
df7084b
added many tests
Jonathan24680 Mar 19, 2024
b6add0d
added knownEmptyResultTest
Jonathan24680 Mar 20, 2024
12435ce
SpatialJoin getDescriptor and getCacheKeyImpl and tests
Jonathan24680 May 5, 2024
87c42a8
backup commit with testing the multiplicity of the indexscan
Jonathan24680 Jun 1, 2024
d5913fe
multiplicities and sizeEstimate implementation and test
Jonathan24680 Jul 11, 2024
cbce5bf
improved multiplicity and size estimate tests added sortedOn test
Jonathan24680 Jul 12, 2024
6656863
merge conflicts solved
Jonathan24680 Jul 17, 2024
d2d79bc
resolve merge conflicts
Jonathan24680 Jul 17, 2024
10ab4d5
fix bug with adding children
Jonathan24680 Jul 17, 2024
36e7ccd
clang style
Jonathan24680 Jul 18, 2024
b6ecb85
remove debug statements and comments
Jonathan24680 Jul 18, 2024
46ae079
fix bug with new addChild method
Jonathan24680 Jul 18, 2024
97ffea8
clang and sonarcube issues for QueryPlanner
Jonathan24680 Jul 18, 2024
f0f6ad2
Merge branch 'master' into SpatialJoinClass
joka921 Jul 19, 2024
c21db7f
solve some sonarqube issues
Jonathan24680 Jul 23, 2024
076288e
merge request changes
Jonathan24680 Jul 25, 2024
ec18b02
clang
Jonathan24680 Jul 25, 2024
702b0c4
PR points during the meeting
Jonathan24680 Jul 26, 2024
42439f9
fix bug when childs have more columns in resulttable than in variable…
Jonathan24680 Aug 1, 2024
d672f8d
refactoring and sonarcube issues
Jonathan24680 Aug 11, 2024
345e369
refactoring of getMultiplicity
Jonathan24680 Aug 11, 2024
2e1f7db
remove development print statement
Jonathan24680 Aug 11, 2024
5cc5fba
first QueryPlannerTest
Jonathan24680 Aug 20, 2024
9e083a1
solve merge conflict
Jonathan24680 Aug 20, 2024
7b964f7
add additional QueryPlannerTests
Jonathan24680 Aug 20, 2024
d90ea82
Sonarqube issues
Jonathan24680 Aug 23, 2024
707cc32
change array to vector because of pipeline mistake
Jonathan24680 Aug 23, 2024
53dfcf5
sonarqube and merge request changes
Jonathan24680 Aug 25, 2024
d33ac37
merge request comments on mostly SpatialJoin cpp
Jonathan24680 Aug 26, 2024
b64e4a2
clang-format
Jonathan24680 Aug 26, 2024
a91456f
Sonarqube
Jonathan24680 Aug 26, 2024
35240bd
merge request comments
Jonathan24680 Aug 27, 2024
884ceb5
improving tests
Jonathan24680 Sep 9, 2024
81c09a2
format check und typo
Jonathan24680 Sep 9, 2024
a1e9dd1
remove unneeded code
Jonathan24680 Sep 20, 2024
ece9b0b
Merge branch 'master' into SpatialJoinClass
Jonathan24680 Sep 20, 2024
dca72ac
compile error changes
Jonathan24680 Sep 21, 2024
ef95bb3
format and namespace
Jonathan24680 Sep 21, 2024
40ffa04
remove unused function
Jonathan24680 Sep 21, 2024
84ae818
Merge branch 'master' into SpatialJoinClass
joka921 Sep 27, 2024
63fadec
Merge branch 'master' into SpatialJoinClass
joka921 Sep 27, 2024
2f263f0
Fix the merge with the old bla.
joka921 Sep 27, 2024
de2c3f4
Update SpatialJoin.cpp
joka921 Sep 27, 2024
2 changes: 1 addition & 1 deletion src/engine/CMakeLists.txt
@@ -12,5 +12,5 @@ add_library(engine
Values.cpp Bind.cpp Minus.cpp RuntimeInformation.cpp CheckUsePatternTrick.cpp
VariableToColumnMap.cpp ExportQueryExecutionTrees.cpp
CartesianProductJoin.cpp TextIndexScanForWord.cpp TextIndexScanForEntity.cpp
- TextLimit.cpp LocalVocabEntry.cpp LazyGroupBy.cpp)
+ TextLimit.cpp LocalVocabEntry.cpp LazyGroupBy.cpp SpatialJoin.cpp)
qlever_target_link_libraries(engine util index parser sparqlExpressions http SortPerformanceEstimator Boost::iostreams)
1 change: 0 additions & 1 deletion src/engine/Join.cpp
@@ -302,7 +302,6 @@ void Join::computeSizeEstimateAndMultiplicities() {
}
_multiplicities.emplace_back(m);
}

assert(_multiplicities.size() == getResultWidth());
}

64 changes: 64 additions & 0 deletions src/engine/QueryPlanner.cpp
@@ -27,6 +27,7 @@
#include "engine/OrderBy.h"
#include "engine/Service.h"
#include "engine/Sort.h"
#include "engine/SpatialJoin.h"
#include "engine/TextIndexScanForEntity.h"
#include "engine/TextIndexScanForWord.h"
#include "engine/TextLimit.h"
@@ -691,6 +692,14 @@
"necessary also rebuild the index.");
}

const auto& input = node.triple_.p_._iri;
if (input.starts_with(MAX_DIST_IN_METERS) &&
input[input.size() - 1] == '>') {
pushPlan(makeSubtreePlan<SpatialJoin>(_qec, node.triple_, std::nullopt,
std::nullopt));
continue;
}

if (node.triple_.p_._iri == HAS_PREDICATE_PREDICATE) {
pushPlan(makeSubtreePlan<HasPredicateScan>(_qec, node.triple_));
continue;
@@ -1620,6 +1629,17 @@
return candidates;
}

// If one of the inputs is a SpatialJoin and the other input is compatible
// with it, add the other input as a child of the SpatialJoin. As unbound
// SpatialJoin operations are incompatible with normal join operations, we
// return immediately instead of creating a normal join below as well.
// Note that this `if` statement must be evaluated first, so that no other
// join options are considered when one of the candidates is a SpatialJoin.
if (auto opt = createSpatialJoin(a, b, jcs)) {
candidates.push_back(std::move(opt.value()));
return candidates;
}

if (a.type == SubtreePlan::MINUS) {
AD_THROW(
"MINUS can only appear after"
@@ -1700,6 +1720,50 @@
return candidates;
}

// _____________________________________________________________________________
auto QueryPlanner::createSpatialJoin(
const SubtreePlan& a, const SubtreePlan& b,
const std::vector<std::array<ColumnIndex, 2>>& jcs)
-> std::optional<SubtreePlan> {
auto aIsSpatialJoin =
std::dynamic_pointer_cast<const SpatialJoin>(a._qet->getRootOperation());
auto bIsSpatialJoin =
std::dynamic_pointer_cast<const SpatialJoin>(b._qet->getRootOperation());

auto aIs = static_cast<bool>(aIsSpatialJoin);
auto bIs = static_cast<bool>(bIsSpatialJoin);

// Exactly one of the inputs must be a SpatialJoin.
if ((aIs && bIs) || (!aIs && !bIs)) {
return std::nullopt;
}

const SubtreePlan& spatialSubtreePlan = aIsSpatialJoin ? a : b;
const SubtreePlan& otherSubtreePlan = aIsSpatialJoin ? b : a;

std::shared_ptr<Operation> op = spatialSubtreePlan._qet->getRootOperation();
auto spatialJoin = static_cast<SpatialJoin*>(op.get());

if (spatialJoin->isConstructed()) {
return std::nullopt;
}

Codecov / codecov/patch check warning on src/engine/QueryPlanner.cpp#L1748: added line #L1748 was not covered by tests.
Comment on lines +1774 to +1776
Member

I don't understand this.
If the spatial join is complete, then it should have no more open variables, and thus we never reach this code.
Should this be an assertion (AD_CORRECTNESS_CHECK)? I don't see how this can be triggered.

Member

Ah, I now understand this.
You don't need isConstructed here; what you need is "is the single variable that I want to bind here already bound?".

Consider for example

SELECT * {
  ?x <p> ?y .    #1
  ?x2 <p2> ?y .  #2
  ?y <maxDist:5> ?z .
  # something involving ?z.
}

The query planner has to pick one of the triples #1 or #2 as the left side of the spatial join. I think this code currently allows binding both of them, which would not be correct (one of the triples would be ignored or overwritten).

Collaborator Author

I think it's not necessary to check whether the single variable I want to bind is already bound, for the following reason: as soon as this one variable (here y) has been added, the VariableToColumnMap returns one column, which contains only z. Therefore, the QueryPlanner can only join it on z. The behavior* of the VariableToColumnMap prevents this case.

*Behavior of the VariableToColumnMap: if no child has been added, it returns the two variables of its children in its map. As soon as one child has been added, it returns only one column, containing the variable of the missing child. When both children have been added, it returns the "true" VariableToColumnMap, which contains the merged results from its children.
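
The three states described in this reply can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySpatialJoin` and `visibleVariables` are hypothetical names invented for illustration, and the model tracks variable names only, whereas the real operation exposes full column maps of its children.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in (assumption), not QLever's SpatialJoin class.
class ToySpatialJoin {
 public:
  ToySpatialJoin(std::string left, std::string right)
      : leftVar_(std::move(left)), rightVar_(std::move(right)) {}

  // Return a copy with the child for `var` attached. Throws if `var` is not
  // one of the still-open variables of this join.
  ToySpatialJoin addChild(const std::string& var) const {
    ToySpatialJoin copy = *this;
    if (var == leftVar_ && !leftBound_) {
      copy.leftBound_ = true;
    } else if (var == rightVar_ && !rightBound_) {
      copy.rightBound_ = true;
    } else {
      throw std::invalid_argument("variable is not open on this join: " + var);
    }
    return copy;
  }

  // Variables the planner can see:
  //  - no child added:     both variables,
  //  - one child added:    only the variable of the missing child,
  //  - both children added: the merged result (here simply both variables).
  std::vector<std::string> visibleVariables() const {
    if (leftBound_ != rightBound_) {
      return {leftBound_ ? rightVar_ : leftVar_};
    }
    return {leftVar_, rightVar_};
  }

  bool isConstructed() const { return leftBound_ && rightBound_; }

 private:
  std::string leftVar_;
  std::string rightVar_;
  bool leftBound_ = false;
  bool rightBound_ = false;
};
```

In this model, after `addChild("?y")` only `?z` remains visible, so a second subtree that shares `?y` has no join column with the spatial join and cannot be attached to it. That is the argument made in the reply; whether the real planner behaves the same way depends on the actual `VariableToColumnMap` implementation.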


if (jcs.size() > 1) {
AD_THROW(
"Currently, if both sides of a SpatialJoin are variables, then the "
"SpatialJoin must be the only connection between these variables");
}
ColumnIndex ind = aIsSpatialJoin ? jcs[0][1] : jcs[0][0];
const Variable& var =
otherSubtreePlan._qet->getVariableAndInfoByColumnIndex(ind).first;

auto newSpatialJoin = spatialJoin->addChild(otherSubtreePlan._qet, var);

SubtreePlan plan = makeSubtreePlan<SpatialJoin>(std::move(newSpatialJoin));
mergeSubtreePlanIds(plan, a, b);
return plan;
}
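
The "exactly one input is a SpatialJoin" selection at the top of this function can be sketched in isolation. The types below (`Operation`, `SpatialJoinOp`, `ScanOp`) and the helper `splitSpatialAndOther` are hypothetical stand-ins, not the classes from this PR; the sketch only shows the pattern of testing both inputs with `std::dynamic_pointer_cast` and reusing the cast result.

```cpp
#include <memory>
#include <optional>
#include <utility>

// Minimal stand-in hierarchy (assumption), not QLever's Operation classes.
struct Operation {
  virtual ~Operation() = default;
};
struct SpatialJoinOp : Operation {};
struct ScanOp : Operation {};

using SpatialAndOther = std::pair<std::shared_ptr<const SpatialJoinOp>,
                                  std::shared_ptr<const Operation>>;

// Return {spatial join, other input} if exactly one of `a` and `b` is a
// spatial join, and std::nullopt otherwise.
std::optional<SpatialAndOther> splitSpatialAndOther(
    const std::shared_ptr<const Operation>& a,
    const std::shared_ptr<const Operation>& b) {
  auto aSpatial = std::dynamic_pointer_cast<const SpatialJoinOp>(a);
  auto bSpatial = std::dynamic_pointer_cast<const SpatialJoinOp>(b);
  // Both or neither being a spatial join is rejected (logical XOR).
  if (static_cast<bool>(aSpatial) == static_cast<bool>(bSpatial)) {
    return std::nullopt;
  }
  if (aSpatial) {
    return SpatialAndOther{std::move(aSpatial), b};
  }
  return SpatialAndOther{std::move(bSpatial), a};
}
```

Returning the already-cast pointer avoids a second cast on the root operation later on; whether that fits the real code depends on const-correctness requirements of `addChild`, which this sketch does not model.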

// __________________________________________________________________________________________________________________
auto QueryPlanner::createJoinWithTransitivePath(
SubtreePlan a, SubtreePlan b,
7 changes: 7 additions & 0 deletions src/engine/QueryPlanner.h
@@ -339,6 +339,13 @@ class QueryPlanner {
[[nodiscard]] static std::optional<SubtreePlan> createSubtreeWithService(
const SubtreePlan& a, const SubtreePlan& b);

// if one of the inputs is a spatial join which is compatible with the other
// input, then add that other input to the spatial join as a child instead of
// creating a normal join.
[[nodiscard]] static std::optional<SubtreePlan> createSpatialJoin(
const SubtreePlan& a, const SubtreePlan& b,
const std::vector<std::array<ColumnIndex, 2>>& jcs);

[[nodiscard]] vector<SubtreePlan> getOrderByRow(
const ParsedQuery& pq,
const std::vector<std::vector<SubtreePlan>>& dpTab) const;