Proof of concept for SPARQL UPDATE #916
base: master
1. Add support for URL parameters "insert=..." and "delete=..." for inserting or deleting a single triple. 2. Stub of new class `DeltaTriples` that maintains the set of inserted and deleted triples. 3. First working implementation of a method `findTripleInPermutation` that for a given triple and a given permutation, finds the matching block in that permutation and the right position in that block.
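The locating step in item 3 can be illustrated with a minimal sketch. It assumes a simplified layout in which every block is an uncompressed, sorted vector of complete triples; the actual index stores compressed blocks without an explicit `id1` per row and needs the relation metadata for that. The names `Block`, `Location`, and `locateTriple` are hypothetical and not taken from the PR.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id = std::uint64_t;
// A triple already ordered according to the permutation, e.g. (P, S, O).
using Triple = std::array<Id, 3>;
// Assumption: a block is a non-empty, sorted vector of full triples.
using Block = std::vector<Triple>;

struct Location {
  std::size_t blockIndex;  // blocks.size() if the triple is larger than all
  std::size_t rowInBlock;  // first row in that block that is >= the triple
  bool existsInIndex;      // true if that row is exactly the triple
};

// Find the block and the row where `triple` is or would have to be.
Location locateTriple(const std::vector<Block>& blocks, const Triple& triple) {
  // First block whose last triple is >= `triple`.
  auto blockIt = std::lower_bound(
      blocks.begin(), blocks.end(), triple,
      [](const Block& block, const Triple& t) {
        assert(!block.empty());
        return block.back() < t;
      });
  if (blockIt == blocks.end()) {
    return {blocks.size(), 0, false};
  }
  // First row in that block that is >= `triple`; it exists by the choice
  // of the block.
  auto rowIt = std::lower_bound(blockIt->begin(), blockIt->end(), triple);
  return {static_cast<std::size_t>(blockIt - blocks.begin()),
          static_cast<std::size_t>(rowIt - blockIt->begin()),
          *rowIt == triple};
}
```

A triple that is not in the index is located at the position of the next larger triple, which is where an insertion would have to go.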
There is now a test that checks for all existing triples whether the found location is correct (by checking `id2` and `id3` at the found position in the found block; note that the block does not have an explicit `id1` for a given position). The `findTripleInPermutation` method is still (very) inefficient in that it goes through the complete relation metadata in order to find the sequence of `id1`s relevant for a block. This will be fixed in the next commit. Note: the previous commit lacked the new files `DeltaTriples.h`, `DeltaTriples.cpp`, and `DeltaTriplesTest.cpp`.
1. Refactored `DeltaTriples::locateTripleInAllPermutations` (the central method of this class). 2. Wrote a test that checks all triples that are contained in the index as well as a slightly modified version of each triple that is not in the index. The test checks that the triple has been located at the exact right position in all permutations. (This is harder than it seems because a lot of things can go wrong and we do not have the relation `Id`s for the blocks explicitly, but only implicitly via the relation metadata.) 3. The method `locateTripleInAllPermutations` now inserts the results into proper data structures that can then be used conveniently in an index scan (writing that code is the next step, but it should be relatively straightforward now).
1. The internal data structure for each permutation now stores a `std::set` (which is ordered) for the triples located at a certain position of a certain block. 2. For each triple that is inserted or deleted, remember the iterator into the corresponding `std::set` for each permutation. This is important for undoing an insertion or deletion (for example, when re-inserting a previously deleted triple). 3. The unit tests now test insertion and deletion directly. The following cases are covered now: deletion of triples that exist in the index, insertion of triples that did not exist in the index, deletion of previously inserted triples, and re-insertion of previously deleted triples. 4. Move the visualization code (that shows the contents of a particular block where a triple has been located) out of the implementation of the `DeltaTriples` class and into `DeltaTriplesTest`.
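Items 1 and 2 rely on the fact that iterators into a `std::set` stay valid when other elements are inserted or erased, so a remembered iterator can be used later to undo exactly one entry. The following is a minimal sketch for a single permutation, requiring C++17; the types `LocatedTripleSketch`, `PerBlock`, and `Handle` and the functions `add` and `undo` are hypothetical and only loosely modeled on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>

using Id = std::uint64_t;

// Hypothetical record of one delta triple at a position in a block.
struct LocatedTripleSketch {
  std::size_t rowInBlock;
  Id id1, id2, id3;
  bool existsInIndex;  // true: marks a deletion, false: marks an insertion
  bool operator<(const LocatedTripleSketch& other) const {
    return std::tie(rowInBlock, id1, id2, id3) <
           std::tie(other.rowInBlock, other.id1, other.id2, other.id3);
  }
};

// For each block index, the ordered set of delta triples located there.
using PerBlock = std::map<std::size_t, std::set<LocatedTripleSketch>>;

// What is remembered per delta triple so that it can be undone later.
struct Handle {
  std::size_t blockIndex;
  std::set<LocatedTripleSketch>::iterator it;
};

// Record a delta triple and return a handle to its set entry.
Handle add(PerBlock& perBlock, std::size_t blockIndex,
           const LocatedTripleSketch& lt) {
  auto [it, inserted] = perBlock[blockIndex].insert(lt);
  assert(inserted);
  (void)inserted;  // avoid an unused-variable warning with NDEBUG
  return {blockIndex, it};
}

// Undo a previously recorded delta triple via its remembered iterator.
void undo(PerBlock& perBlock, const Handle& handle) {
  auto blockIt = perBlock.find(handle.blockIndex);
  assert(blockIt != perBlock.end());
  blockIt->second.erase(handle.it);
  if (blockIt->second.empty()) {
    perBlock.erase(blockIt);  // keep only blocks that have delta triples
  }
}
```

In this sketch, re-inserting a previously deleted triple amounts to calling `undo` with the handle that was stored when the deletion was recorded.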
Codecov Report
Additional details and impacted files:
@@ Coverage Diff @@
## master #916 +/- ##
==========================================
- Coverage 74.07% 72.11% -1.97%
==========================================
Files 254 247 -7
Lines 23997 24389 +392
Branches 3021 3156 +135
==========================================
- Hits 17776 17587 -189
- Misses 5010 5449 +439
- Partials 1211 1353 +142
It complained about an assignment in the condition of an `if`.
1. The block size used to be `1 << 23` (over 8M), which is too large, since we always need to decompress at least one whole block, even when reading only a few triples. It's now 100'000, which still has a relatively small overall space consumption. 2. Add member `_col2LastId` to block data because we need it for the delta triples (#916).
1. Each permutation now has information about the delta triples per block and where exactly they are located in that permutation. 2. As a proof of concept, the LOG already shows the number of delta triples found in the blocks relevant for the scan.
1. Improved the names. In particular, `TripleWithPosition` is now called `LocatedTriple` and analogously for the related classes. 2. Put `LocatedTriple` and the associated classes in their own file `LocatedTriple.h`. Also moved a function there that used to be a helper lambda in one of the `DeltaTriples` methods. TODO: The `locatedTripleInPermutation` method should also be moved to `LocatedTriple.h` and there should be a `test/LocatedTripleTest` with all the associated tests.
1. The code for locating triples for an individual permutation is no longer in the (already too big) class `DeltaTriples` but in separate classes in the files `LocatedTriples.{h,cpp}`. The corresponding tests are also in a new file `test/LocatedTriplesTest.cpp` now. Used the opportunity to improve the code in several respects. 2. First attempt at writing a function that, for a given block, merges the relevant delta triples into it. Tried to do it in place without using an extra array, until I realized that this leads to a very hard algorithmic problem, which most likely can't be solved efficiently in practice. At least it's clear now that the best approach is to first decompress the block into a temporary array and then merge that temporary array and the relevant delta triples into the pre-allocated portion of the result `IdTable`. That's what I will implement next and it shouldn't be hard. 3. When testing the merging, it helps to output the columns of an `IdTable`. For that, values like `VocabIndex:15` are rather impractical to read, so I abbreviated these to use a one-letter prefix, like in `V:15`.
There is now a method `LocatedTriplesPerBlock::mergeTriples` that merges the delta triples into a given (possibly partial) block. There are unit tests for the following cases: full block with unrestricted `id1` and `id2`, full block with restricted `id1`, partial block with restricted `id1`, partial block with restricted `id1` and `id2`, and the latter with only a single column. Removed the previous complicated version that tried to do it in place.
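The merge of a decompressed block with its delta triples can be sketched as a single linear pass. This is only an illustration under simplifying assumptions: the block is a full, already decompressed vector of `(id2, id3)` rows, the delta triples are sorted by their row position, and the result is a new vector rather than a pre-allocated portion of an `IdTable`. Restrictions on `id1` or `id2` and partial blocks are not handled. The names `Row`, `DeltaRow`, and `mergeBlock` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id = std::uint64_t;
using Row = std::array<Id, 2>;  // the (id2, id3) columns of one block row

// Hypothetical delta entry: if existsInIndex is true, the row at
// rowInBlock is deleted; otherwise `row` is inserted before rowInBlock.
struct DeltaRow {
  std::size_t rowInBlock;
  Row row;
  bool existsInIndex;
};

// Merge a decompressed block with its delta rows. Assumes that `deltas`
// is sorted by rowInBlock and, for equal positions, by row.
std::vector<Row> mergeBlock(const std::vector<Row>& block,
                            const std::vector<DeltaRow>& deltas) {
  assert(std::is_sorted(deltas.begin(), deltas.end(),
                        [](const DeltaRow& a, const DeltaRow& b) {
                          return a.rowInBlock < b.rowInBlock;
                        }));
  std::vector<Row> result;
  result.reserve(block.size() + deltas.size());
  auto delta = deltas.begin();
  for (std::size_t i = 0; i < block.size(); ++i) {
    bool deleted = false;
    // Apply all delta rows located at position i.
    while (delta != deltas.end() && delta->rowInBlock == i) {
      if (delta->existsInIndex) {
        deleted = true;  // the original row i is dropped
      } else {
        result.push_back(delta->row);  // inserted before row i
      }
      ++delta;
    }
    if (!deleted) {
      result.push_back(block[i]);
    }
  }
  // Remaining delta rows are insertions after the last row of the block.
  for (; delta != deltas.end(); ++delta) {
    assert(!delta->existsInIndex);
    result.push_back(delta->row);
  }
  return result;
}
```

Because the output is written strictly left to right from two sorted inputs, no row ever has to be shifted, which is the difficulty the in-place variant ran into.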
Implemented it and wrote some unit tests. TODO: If the first block of the scan is incomplete, delta triples are currently ignored (with a warning). There is no problem in principle with adding this, but it first needs some refactoring of the original code to avoid code duplication.
1. The block size used to be `1 << 23` (over 8M), which is too large, since we always need to decompress at least one whole block, even when reading only a few triples. It's now 500'000, which still has a relatively small overall space consumption. 2. Add member `_col2LastId` to block data because we will need it for #916 and it's nice if our live indexes already have this information so that we can play around with this PR without having to rebuild indexes. 3. Renamed the data members in `CompressedRelation.h` such that they have `trailingUnderscores_`. 4. Unrelated fix in `IndexTestHelpers.h`: The test TTL file was not deleted after the test, now it is.
The variant of `CompressedRelation::scan` with two `Id`s fixed now also considers delta triples. This was significantly more complicated than for the variant with only one `Id` fixed and required quite a bit of refactoring. TODO: For the first incomplete block when only a single `Id` is fixed, delta triples are still not considered. That should be easy to add though.
There was still one case missing: the possibly incomplete block at the beginning when only a single `Id` is fixed. Now delta triples are also considered for these blocks. TODO: The unit test works for a permutation with a single relation, but there is still a problem when there are multiple relations.
I am not convinced though that it was a bug.
This reverts commit 3afe571.
SonarCloud Quality Gate failed: 1 bug, no coverage information.
[incomplete, intermediate commit so that I can switch branches]
One test in `LocatedTriplesTest` still fails because writing the files for a PSO permutation and then reading from it no longer works as it did. I hope that Johannes or Julian can help me.
This is the first part of a series of PRs split off from the large proof-of-concept PR #916, which realizes SPARQL 1.1 Update.
The code is based on the PRs ad-freiburg#916 and ad-freiburg#1000
Hi Hannah, thank you for the updates on the proof of concept for SPARQL UPDATE. For the OpenCitations project, the update functionality has become a priority, as we are looking to migrate our OpenCitations Meta data to QLever. We would like to inquire about the current progress on this feature. Specifically, we would like to understand whether it is possible to add triples via SPARQL using the insert and delete commands mentioned. From the PR description, it is not clear whether these operations can be performed directly via SPARQL or whether they are limited to inserting or deleting one triple at a time through the API. Could you please provide more details on this? Any updates or clarifications on the state of the SPARQL UPDATE feature would be greatly appreciated.
This is not yet meant for production but as a proof of concept. At the time of the creation of this PR, there are commands `insert` and `delete` for inserting or deleting a single triple via the API. The triple is located in each of the permutations, and the location is stored in an internal data structure (of a new class `DeltaTriples`). There are already extensive unit tests for this functionality, and it seems to work correctly.

There is no code yet that actually takes the inserted or deleted triples into account when processing queries. But that is actually relatively little work (when reading a block from a permutation, just augment the result by the delta triples stored for that block). The hard part, it seems, is the location.