SPARQL Update Part 2: DeltaTriples #1429

Merged
merged 13 commits into from
Oct 4, 2024
3 changes: 2 additions & 1 deletion src/index/CMakeLists.txt
@@ -5,5 +5,6 @@ add_library(index
         LocatedTriples.cpp Permutation.cpp TextMetaData.cpp
         DocsDB.cpp FTSAlgorithms.cpp
         PrefixHeuristic.cpp CompressedRelation.cpp
-        PatternCreator.cpp)
+        PatternCreator.cpp
+        DeltaTriples.cpp)
 qlever_target_link_libraries(index util parser vocabulary compilationInfo ${STXXL_LIBRARIES})
189 changes: 189 additions & 0 deletions src/index/DeltaTriples.cpp
@@ -0,0 +1,189 @@
// Copyright 2023 - 2024, University of Freiburg
// Chair of Algorithms and Data Structures.
// Authors:
// 2023 Hannah Bast <bast@cs.uni-freiburg.de>
// 2024 Julian Mundhahs <mundhahj@tf.uni-freiburg.de>

#include "index/DeltaTriples.h"

#include "absl/strings/str_cat.h"
#include "index/Index.h"
#include "index/IndexImpl.h"
#include "index/LocatedTriples.h"
#include "parser/TurtleParser.h"

// ____________________________________________________________________________
LocatedTriples::iterator& DeltaTriples::LocatedTripleHandles::forPermutation(
    Permutation::Enum permutation) {
  switch (permutation) {
    case Permutation::PSO:
      return forPSO_;
    case Permutation::POS:
      return forPOS_;
    case Permutation::SPO:
      return forSPO_;
    case Permutation::SOP:
      return forSOP_;
    case Permutation::OSP:
      return forOSP_;
    case Permutation::OPS:
      return forOPS_;
    default:
      AD_FAIL();
  }
}

// ____________________________________________________________________________
void DeltaTriples::clear() {
  triplesInserted_.clear();
  triplesDeleted_.clear();
  locatedTriplesPerBlockInPSO_.clear();
  locatedTriplesPerBlockInPOS_.clear();
  locatedTriplesPerBlockInSPO_.clear();
  locatedTriplesPerBlockInSOP_.clear();
  locatedTriplesPerBlockInOSP_.clear();
  locatedTriplesPerBlockInOPS_.clear();
}

// ____________________________________________________________________________
std::vector<DeltaTriples::LocatedTripleHandles>
DeltaTriples::locateAndAddTriples(
    ad_utility::SharedCancellationHandle cancellationHandle,
    std::span<const IdTriple<0>> idTriples, bool shouldExist) {
  ad_utility::HashMap<Permutation::Enum, std::vector<LocatedTriples::iterator>>
      intermediateHandles;
  for (auto permutation : Permutation::ALL) {
    auto& perm = index_.getImpl().getPermutation(permutation);
    auto locatedTriples = LocatedTriple::locateTriplesInPermutation(
        // TODO<qup42>: replace with the method for update block metadata once
        // integration is done
        idTriples, perm.metaData().blockData(), perm.keyOrder(), shouldExist,
        cancellationHandle);
Member

Isn't it true that the blockData() already lives inside the LocatedTriple, and that the keyOrder should be a member of the LocatedTriple?
On the other hand, I have currently forgotten which state for more than one triple is stored in the LocatedTriples at all...

Member

Ah, I think I know what I want to say:

The LocatedTriples should know about the above members,
and then you immediately call a function locatedAndAdd on that object, which gives you back the handles and basically does the two steps that are performed here.

Member Author

Qup42, Sep 18, 2024

I mostly don't understand what you want to say here.

  • getAugmenteMetadata is already present, but doesn't work yet because it is not initialized. This will be part of the integration.
  • Do I understand you correctly that you propose that every LocatedTriple should know the Permutation to which it belongs?
  • It is unclear to me how that would work. LocatedTriplesPerBlock manages all LocatedTriples for a permutation. DeltaTriples manages the LocatedTriplesPerBlock for all permutations and exposes an interface that is no longer concerned with individual permutations. For this, it needs the handles for all the permutations of each triple. How exactly do you propose to move this into LocatedTriples?
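
The last point of the reply (one set of handles per triple, covering all permutations) can be illustrated with a minimal, self-contained sketch. It is not the QLever code: `Perm`, `Store`, `Handles`, `addToAllStores` and `eraseFromAllStores` are hypothetical stand-ins, there are only three "permutations", and a triple is reduced to an `int`. The sketch only shows the two-step shape used in `locateAndAddTriples`: first collect iterators per permutation, then transpose them into one handle object per triple.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for `Permutation::Enum` (three instead of six values).
enum class Perm : std::size_t { First = 0, Second = 1, Third = 2 };
constexpr std::array<Perm, 3> kAllPerms{Perm::First, Perm::Second,
                                        Perm::Third};

// Hypothetical stand-in for `LocatedTriplesPerBlock`. Iterators of a
// `std::list` stay valid when other elements are inserted or erased.
using Store = std::list<int>;

// One iterator per permutation, analogous to `LocatedTripleHandles`.
struct Handles {
  std::array<Store::iterator, 3> perPerm_;
  Store::iterator& forPerm(Perm p) {
    return perPerm_[static_cast<std::size_t>(p)];
  }
};

// Add all items to every store and return one `Handles` object per item.
std::vector<Handles> addToAllStores(std::array<Store, 3>& stores,
                                    const std::vector<int>& items) {
  // Step 1: per permutation, remember where each item was added.
  std::array<std::vector<Store::iterator>, 3> intermediate;
  for (Perm p : kAllPerms) {
    auto& store = stores[static_cast<std::size_t>(p)];
    for (int item : items) {
      intermediate[static_cast<std::size_t>(p)].push_back(
          store.insert(store.end(), item));
    }
  }
  // Step 2: transpose "per permutation" into "per item".
  std::vector<Handles> handles(items.size());
  for (Perm p : kAllPerms) {
    for (std::size_t i = 0; i < items.size(); ++i) {
      handles[i].forPerm(p) = intermediate[static_cast<std::size_t>(p)][i];
    }
  }
  return handles;
}

// Erase one item from every store via its handles.
void eraseFromAllStores(std::array<Store, 3>& stores, Handles& handles) {
  for (Perm p : kAllPerms) {
    stores[static_cast<std::size_t>(p)].erase(handles.forPerm(p));
  }
}
```

With this shape, callers never deal with an individual permutation: they keep one `Handles` object per item and pass it back when the item has to be removed again.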

    cancellationHandle->throwIfCancelled();
    intermediateHandles[permutation] =
        getLocatedTriplesPerBlock(permutation).add(locatedTriples);
    cancellationHandle->throwIfCancelled();
  }
  std::vector<DeltaTriples::LocatedTripleHandles> handles{idTriples.size()};
  for (auto permutation : Permutation::ALL) {
    for (size_t i = 0; i < idTriples.size(); i++) {
      handles[i].forPermutation(permutation) =
          intermediateHandles[permutation][i];
    }
  }
  return handles;
}

// ____________________________________________________________________________
void DeltaTriples::eraseTripleInAllPermutations(
    DeltaTriples::LocatedTripleHandles& handles) {
  // Erase for all permutations.
  for (auto permutation : Permutation::ALL) {
    auto ltIter = handles.forPermutation(permutation);
    getLocatedTriplesPerBlock(permutation).erase(ltIter->blockIndex_, ltIter);
  }
}

// ____________________________________________________________________________
void DeltaTriples::insertTriples(
    ad_utility::SharedCancellationHandle cancellationHandle,
    std::vector<IdTriple<0>> triples) {
  LOG(DEBUG) << "Inserting " << triples.size()
             << " triples (including idempotent triples)." << std::endl;
  std::ranges::sort(triples);
  // Unique moves all duplicate items to the end and returns iterators for that
  // subrange.
  auto [first, last] = std::ranges::unique(triples);
  triples.erase(first, last);
  std::erase_if(triples, [this](const IdTriple<0>& triple) {
    return triplesInserted_.contains(triple);
  });
  std::ranges::for_each(triples, [this](const IdTriple<0>& triple) {
    auto handle = triplesDeleted_.find(triple);
    if (handle != triplesDeleted_.end()) {
      eraseTripleInAllPermutations(handle->second);
      triplesDeleted_.erase(triple);
    }
  });

  std::vector<LocatedTripleHandles> handles =
      locateAndAddTriples(std::move(cancellationHandle), triples, true);

  AD_CORRECTNESS_CHECK(triples.size() == handles.size());
  // TODO<qup42>: replace with std::views::zip in C++23
  for (size_t i = 0; i < triples.size(); i++) {
    triplesInserted_.insert({triples[i], handles[i]});
  }
}

// ____________________________________________________________________________
void DeltaTriples::deleteTriples(
    ad_utility::SharedCancellationHandle cancellationHandle,
    std::vector<IdTriple<0>> triples) {
  LOG(DEBUG) << "Deleting " << triples.size()
             << " triples (including idempotent triples)." << std::endl;
  std::ranges::sort(triples);
  auto [first, last] = std::ranges::unique(triples);
  triples.erase(first, last);
  std::erase_if(triples, [this](const IdTriple<0>& triple) {
    return triplesDeleted_.contains(triple);
  });
  std::ranges::for_each(triples, [this](const IdTriple<0>& triple) {
    auto handle = triplesInserted_.find(triple);
    if (handle != triplesInserted_.end()) {
      eraseTripleInAllPermutations(handle->second);
      triplesInserted_.erase(triple);
    }
  });

  std::vector<LocatedTripleHandles> handles =
      locateAndAddTriples(std::move(cancellationHandle), triples, false);

  AD_CORRECTNESS_CHECK(triples.size() == handles.size());
  // TODO<qup42>: replace with std::views::zip in C++23
  for (size_t i = 0; i < triples.size(); i++) {
    triplesDeleted_.insert({triples[i], handles[i]});
  }
}

// ____________________________________________________________________________
const LocatedTriplesPerBlock& DeltaTriples::getLocatedTriplesPerBlock(
    Permutation::Enum permutation) const {
  switch (permutation) {
    case Permutation::PSO:
      return locatedTriplesPerBlockInPSO_;
    case Permutation::POS:
      return locatedTriplesPerBlockInPOS_;
    case Permutation::SPO:
      return locatedTriplesPerBlockInSPO_;
    case Permutation::SOP:
      return locatedTriplesPerBlockInSOP_;
    case Permutation::OSP:
      return locatedTriplesPerBlockInOSP_;
    case Permutation::OPS:
      return locatedTriplesPerBlockInOPS_;
    default:
      AD_FAIL();
  }
}

// ____________________________________________________________________________
LocatedTriplesPerBlock& DeltaTriples::getLocatedTriplesPerBlock(
    Permutation::Enum permutation) {
  switch (permutation) {
    case Permutation::PSO:
      return locatedTriplesPerBlockInPSO_;
    case Permutation::POS:
      return locatedTriplesPerBlockInPOS_;
    case Permutation::SPO:
      return locatedTriplesPerBlockInSPO_;
    case Permutation::SOP:
      return locatedTriplesPerBlockInSOP_;
    case Permutation::OSP:
      return locatedTriplesPerBlockInOSP_;
    case Permutation::OPS:
      return locatedTriplesPerBlockInOPS_;
    default:
      AD_FAIL();
  }
}
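
The bookkeeping in `insertTriples` and `deleteTriples` (deduplicate the input, skip triples that are already in the target set, and cancel a pending opposite operation) can be tried out in isolation with the following sketch. It is a simplification under stated assumptions: `MiniDeltaTriples` is a hypothetical class, a triple is reduced to an `int`, the located-triple handles are omitted, and the pre-C++20 algorithms `std::sort`, `std::unique` and `std::remove_if` replace the `std::ranges` calls so that the block also compiles with C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical miniature of the insert/delete bookkeeping.
struct MiniDeltaTriples {
  std::set<int> inserted_;
  std::set<int> deleted_;

  void insertTriples(std::vector<int> triples) {
    apply(std::move(triples), inserted_, deleted_);
  }
  void deleteTriples(std::vector<int> triples) {
    apply(std::move(triples), deleted_, inserted_);
  }

 private:
  // Add `triples` to `target` and remove them from `opposite`, so that no
  // triple is ever contained in both sets.
  static void apply(std::vector<int> triples, std::set<int>& target,
                    std::set<int>& opposite) {
    // Sort and remove duplicates within the request.
    std::sort(triples.begin(), triples.end());
    triples.erase(std::unique(triples.begin(), triples.end()), triples.end());
    // Skip triples for which the operation is a no-op.
    triples.erase(std::remove_if(triples.begin(), triples.end(),
                                 [&target](int triple) {
                                   return target.count(triple) > 0;
                                 }),
                  triples.end());
    // Cancel the opposite operation (if any) and record the new one.
    for (int triple : triples) {
      opposite.erase(triple);
      target.insert(triple);
    }
  }
};
```

For example, inserting `{3, 1, 3, 2}`, then deleting `{2, 2, 5}`, and finally inserting `{5}` leaves `1`, `3` and `5` in the inserted set and only `2` in the deleted set.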
134 changes: 134 additions & 0 deletions src/index/DeltaTriples.h
@@ -0,0 +1,134 @@
// Copyright 2023 - 2024, University of Freiburg
// Chair of Algorithms and Data Structures.
// Authors:
// 2023 Hannah Bast <bast@cs.uni-freiburg.de>
// 2024 Julian Mundhahs <mundhahj@tf.uni-freiburg.de>

#pragma once

#include "engine/LocalVocab.h"
#include "global/IdTriple.h"
#include "index/Index.h"
#include "index/IndexBuilderTypes.h"
#include "index/LocatedTriples.h"
#include "parser/TurtleParser.h"
#include "util/HashSet.h"

// A class for maintaining triples that are inserted or deleted after index
// building; we call these delta triples. How it works in principle:
//
// 1. For each delta triple, find the block index in each permutation (see
// `LocatedTriple` in `index/LocatedTriples.h`).
//
// 2. For each permutation and each block, store a sorted list of the positions
// of the delta triples within that block (see `LocatedTriplesPerBlock` in
// `index/LocatedTriples.h`).
//
// 3. In the call of `PermutationImpl::scan`, use the respective lists to merge
// the relevant delta triples into the index scan result.
//
// NOTE: The delta triples currently do not go well together with CACHING. See
// the discussion at the end of this file.
class DeltaTriples {
 private:
  // The index to which these triples are added.
  const Index& index_;

  // The local vocabulary of the delta triples (they may have components,
  // which are not contained in the vocabulary of the original index).
  LocalVocab localVocab_;

  // The positions of the delta triples in each of the six permutations.
  LocatedTriplesPerBlock locatedTriplesPerBlockInPSO_;
  LocatedTriplesPerBlock locatedTriplesPerBlockInPOS_;
  LocatedTriplesPerBlock locatedTriplesPerBlockInSPO_;
  LocatedTriplesPerBlock locatedTriplesPerBlockInSOP_;
  LocatedTriplesPerBlock locatedTriplesPerBlockInOSP_;
  LocatedTriplesPerBlock locatedTriplesPerBlockInOPS_;

  FRIEND_TEST(DeltaTriplesTest, insertTriplesAndDeleteTriples);

  // Each delta triple needs to know where it is stored in each of the six
  // `LocatedTriplesPerBlock` above.
  struct LocatedTripleHandles {
    LocatedTriples::iterator forPSO_;
    LocatedTriples::iterator forPOS_;
    LocatedTriples::iterator forSPO_;
    LocatedTriples::iterator forSOP_;
    LocatedTriples::iterator forOPS_;
    LocatedTriples::iterator forOSP_;

    LocatedTriples::iterator& forPermutation(Permutation::Enum permutation);
  };

  // The sets of triples added to and subtracted from the original index. In
  // particular, no triple can be in both of these sets.
  ad_utility::HashMap<IdTriple<0>, LocatedTripleHandles> triplesInserted_;
  ad_utility::HashMap<IdTriple<0>, LocatedTripleHandles> triplesDeleted_;

 public:
  // Construct for given index.
  explicit DeltaTriples(const Index& index) : index_(index) {}

  // Get the common `LocalVocab` of the delta triples.
  LocalVocab& localVocab() { return localVocab_; }
Member

Is it a good idea to make this mutable getter public?
It lets us introduce very very very nasty bugs.

Member Author

Qup42, Aug 4, 2024

Made it private for now.
Mutable LocalVocabs are used in LocatedTriples and in the execution of normal queries (though these are currently temporary). These two need to be in sync for the retrieval of IDs while querying. The temporary IDs that are generated while querying (e.g. Service) can be discarded. Given the constraints, I decided on a single global LocalVocab. This is a point that is still open for the integration PR.

Member

Yes, I think that issues like "how do we clean up the local vocab when the inserted entries are not needed anymore" can be postponed to the future, as this is probably related to the question "how do we deal with concurrent updates" (in particular, updates that run concurrently with queries).

  const LocalVocab& localVocab() const { return localVocab_; }

  // Clear `triplesInserted_` and `triplesDeleted_` and all associated data
  // structures.
  void clear();

  // The number of delta triples inserted and deleted.
  size_t numInserted() const { return triplesInserted_.size(); }
  size_t numDeleted() const { return triplesDeleted_.size(); }

  // Insert triples.
  void insertTriples(ad_utility::SharedCancellationHandle cancellationHandle,
                     std::vector<IdTriple<0>> triples);

  // Delete triples.
  void deleteTriples(ad_utility::SharedCancellationHandle cancellationHandle,
                     std::vector<IdTriple<0>> triples);

  // Get the `LocatedTriplesPerBlock` object for the given permutation.
  LocatedTriplesPerBlock& getLocatedTriplesPerBlock(
      Permutation::Enum permutation);
Member

Again, what do we need the mutable access for (other than testing, where a friend would be a better option, and often not even that is needed)?

Member Author

Qup42, Sep 15, 2024

You're right. The mutable accessor is only used internally. To avoid switching between public and private methods I have inlined the 3 calls to it.
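
The concern in this thread (a public mutable accessor lets callers bypass the class's invariants) is a general C++ design point. The following sketch shows one common shape under stated assumptions: `Registry` and its members are hypothetical names that are unrelated to QLever, the public interface only hands out a `const` reference, and the mutable access stays private. The reply above chose to inline the internal calls instead; a private helper as shown here is merely an alternative.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical class: read access is public, write access goes through
// member functions that can maintain the class's invariants.
class Registry {
 public:
  // Public callers only get a read-only view.
  const std::vector<int>& values() const { return values_; }

  // All modifications go through the class itself.
  void add(int value) { mutableValues().push_back(value); }

 private:
  // The mutable accessor is an implementation detail.
  std::vector<int>& mutableValues() { return values_; }

  std::vector<int> values_;
};

// Even on a non-const object, the public getter yields a const reference.
static_assert(std::is_same_v<decltype(std::declval<Registry&>().values()),
                             const std::vector<int>&>);
```

Tests that need to inspect internals can be declared as friends (as `FRIEND_TEST` does in the header of this PR), so the public interface does not have to grow for their sake.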

  const LocatedTriplesPerBlock& getLocatedTriplesPerBlock(
      Permutation::Enum permutation) const;

 private:
  // Find the position of each of the given triples in each of the six
  // permutations and add it to the respective `LocatedTriplesPerBlock` map.
  // Return the iterators of where the triples were added (so that we can
  // easily delete them again from these maps later).
  std::vector<LocatedTripleHandles> locateAndAddTriples(
      ad_utility::SharedCancellationHandle cancellationHandle,
      std::span<const IdTriple<0>> idTriples, bool shouldExist);

  // Erase the `LocatedTriple` object from each `LocatedTriplesPerBlock` list.
  // The argument holds the iterators for each list, as returned by the method
  // `locateAndAddTriples` above.
  //
  // NOTE: The iterators are invalid afterward. That is OK, as long as we also
  // delete the respective entry in `triplesInserted_` or `triplesDeleted_`,
  // which stores these iterators.
  void eraseTripleInAllPermutations(LocatedTripleHandles& handles);
};

// DELTA TRIPLES AND THE CACHE
//
// For now, our approach only works when the results of index scans are not
// cached (unless there are no relevant delta triples for a particular scan).
// There are two ways how this can play out in the future:
//
// Either we generally do not cache the results of index scans anymore. This
// would have various advantages, in particular, joining with something like
// `rdf:type` would then be possible without storing the whole relation in
// RAM. However, we need a faster decompression then and maybe a smaller block
// size (currently 8 MB).
//
// Or we add the delta triples when iterating over the cached (uncompressed)
// result from the index scan. In that case, we would need to (in Step 1 above)
// store and maintain the positions in those uncompressed index scans. However,
// this would only work for the results of index scans. For the results of more
// complex subqueries, it's hard to figure out which delta triples are relevant.
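
Step 3 of the class comment (merging the relevant delta triples into the result of an index scan) is not part of this PR. As a rough illustration of the idea only, and not of how QLever implements it, the sketch below merges sorted inserted and deleted entries into one sorted block. The function `mergeDeltaIntoBlock` is hypothetical and a triple is reduced to an `int`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: all three inputs must be sorted in ascending order.
// Returns the block with `inserted` merged in and `deleted` removed.
std::vector<int> mergeDeltaIntoBlock(const std::vector<int>& block,
                                     const std::vector<int>& inserted,
                                     const std::vector<int>& deleted) {
  // Merge the inserted entries; an entry that is already contained in the
  // block is not duplicated.
  std::vector<int> withInserts;
  std::set_union(block.begin(), block.end(), inserted.begin(), inserted.end(),
                 std::back_inserter(withInserts));
  // Drop the deleted entries.
  std::vector<int> result;
  std::set_difference(withInserts.begin(), withInserts.end(), deleted.begin(),
                      deleted.end(), std::back_inserter(result));
  return result;
}
```

For the block `{1, 3, 5, 7}` with inserted entries `{2, 6}` and deleted entries `{3}`, the result is `{1, 2, 5, 6, 7}`. A scan for which there are no relevant delta triples gets the block back unchanged, which matches the exception for cached scans mentioned in the discussion above.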
6 changes: 5 additions & 1 deletion src/index/Permutation.h
@@ -31,6 +31,8 @@ class Permutation {
   static constexpr auto SOP = Enum::SOP;
   static constexpr auto OPS = Enum::OPS;
   static constexpr auto OSP = Enum::OSP;
+  static constexpr auto ALL = {Enum::PSO, Enum::POS, Enum::SPO,
+                               Enum::SOP, Enum::OPS, Enum::OSP};

   using MetaData = IndexMetaDataMmapView;
   using Allocator = ad_utility::AllocatorWithLimit<Id>;
@@ -128,6 +130,9 @@ class Permutation {
   // _______________________________________________________
   const bool& isLoaded() const { return isLoaded_; }

+  // _______________________________________________________
+  const MetaData& metaData() const { return meta_; }
+
  private:
   // for Log output, e.g. "POS"
   std::string readableName_;
@@ -137,7 +142,6 @@ class Permutation {
   // sorted, for example {1, 0, 2} for PSO.
   array<size_t, 3> keyOrder_;

-  const MetaData& metaData() const { return meta_; }
   MetaData meta_;

   // This member is `optional` because we initialize it in a deferred way in the
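
The new `ALL` member is declared with `auto` and a braced list, so its type is deduced as a `std::initializer_list` of the enum, which can be used directly in a range-based `for` loop. The sketch below demonstrates this in isolation. `MiniPermutation` and `countPermutations` are hypothetical, the order of the enumerators is an assumption, and the code requires C++17 or later (where `static constexpr` data members are implicitly inline).

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

// Hypothetical miniature of the `Permutation` class from the diff above.
struct MiniPermutation {
  // Assumption: the order of the enumerators is chosen for illustration.
  enum struct Enum { PSO, POS, SPO, SOP, OPS, OSP };
  static constexpr auto ALL = {Enum::PSO, Enum::POS, Enum::SPO,
                               Enum::SOP, Enum::OPS, Enum::OSP};
};

// `auto` with a braced list deduces `std::initializer_list<Enum>`.
static_assert(std::is_same_v<decltype(MiniPermutation::ALL),
                             const std::initializer_list<MiniPermutation::Enum>>);

// Iterate over all permutations, as done in `DeltaTriples.cpp`.
std::size_t countPermutations() {
  std::size_t count = 0;
  for (auto permutation : MiniPermutation::ALL) {
    static_cast<void>(permutation);
    ++count;
  }
  return count;
}
```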
2 changes: 2 additions & 0 deletions test/CMakeLists.txt
@@ -118,6 +118,8 @@ addLinkAndDiscoverTest(LocatedTriplesTest index)

 addLinkAndDiscoverTestSerial(IdTripleTest index)

+addLinkAndDiscoverTestSerial(DeltaTriplesTest index)
+
 addLinkAndDiscoverTest(EngineTest engine)

 addLinkAndDiscoverTest(JoinTest engine)