Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NO_LIMIT handling for key-value stores #3341

Merged
merged 1 commit into from
Feb 17, 2023

Conversation

mad
Copy link
Contributor

@mad mad commented Nov 28, 2022

Without fix could not reindex store with more than Integer.MAX_VALUE entries because KeySelector.limit is integer

Signed-off-by: Pavel Ershov owner.mad.epa@gmail.com

Without fix could not reindex store with more than Integer.MAX_VALUE entries because `KeySelector.limit` is integer

Signed-off-by: Pavel Ershov <owner.mad.epa@gmail.com>
@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label Nov 28, 2022
Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @mad !

@FlorianHockmann FlorianHockmann added this to the Release v1.0.0 milestone Dec 7, 2022
@FlorianHockmann
Copy link
Member

Without fix could not reindex store with more than Integer.MAX_VALUE entries because KeySelector.limit is integer

If that is the problem, can't we then simply switch from integer to long?

Also, is it possible (with reasonable effort) to write a test that shows this problem?

@porunov
Copy link
Member

porunov commented Feb 7, 2023

If that is the problem, can't we then simply switch from integer to long?

I was trying to quickly switch from int to long and it seems to be a little bit more complicated because too many JanusGraph interfaces and classes relay on int specifically. In some places long doesn't even make sense (in case when we want to compare limit to collection size or resize some array because in Java arrays and standard collections may have up to int elements and no more). I would say switching to long is possible and maybe we should do that but we need to revise all places where it's used and there is a chance it will introduce a breaking change in some places. That said, I didn't spend enough time investigating the issue.

Also, is it possible (with reasonable effort) to write a test that shows this problem?

I personally don't know how to write a meaningful test to check reindexing of 2^32 elements. I guess could be possible to write unit tests for classes which are changed.
I can't say I'm sure that in all changed places the check is necessary because I didn't investigate this issue enough, but it looks to me that those checks are either fixing the problem or they are simply redundant (but harmless) checks.
Thus, I am OK with merging this PR without tests if @mad confirms it fixes the problem with re-indexing of > 2^32 elements.
I personally don't have any specific preference on how to resolve the issue. Thus, take my thoughts above with a graint of salt.

porunov added a commit to porunov/janusgraph that referenced this pull request Feb 17, 2023
Related to JanusGraph#3341 ( JanusGraph#3341 )

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this pull request Feb 17, 2023
Related to JanusGraph#3341 ( JanusGraph#3341 )

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
@porunov
Copy link
Member

porunov commented Feb 17, 2023

@FlorianHockmann @mad

I believe this PR fixes the issue. I didn't come up with a good test to verify the issue, but I was able to make a simple test which might be useful to understand the issue.
I tested the below logic in master branch (failed) and in this branch (passed).
The testing logic was the next:

  1. In every file where @mad introduced additional check on hasLimit() I changed the assuming starting point to Integer.MAX_VALUE - 1000.
  2. Made a test which adds 2000 vertices into the graph in 4 transactions (500 each).
  3. Create an index which covers the added vertices.
  4. Reindex data using the newly added index.
  5. Check that count from index and full-scan count return the same amount of elements.
  6. Check that count from index return 2000 elements.

The last step fails in master and returns only 1000 elements.

  • Index aggregation count query returns only 1000 elements which verifies that REINDEX operation didn't process all vertices.
  • Full-scan query returns 1000 elements which verifies that no full-scan queries can process more than Integer.MAX_VALUE elements.

Test which fails against latest master branch is available in this branch (commit).
Test which passes against this PR branch is available in this branch (commit)

Test is called: testReindexOfDataSetOutOfIntLimitSize().

Notice that we can't include that test into this PR because that test assumes we modified core code to have starting count as Integer.MAX_VALUE - 1000 instead of 0.

I believe we should change all limits to long, but as I noticed here it has some complexity.
Thus, I propose to merge this PR and let the fix lend into v0.6 branch.

@FlorianHockmann
Copy link
Member

Thanks for your detailed investigation and confirming that the changes here fix the problem with a test.

I really didn't want to cause so much trouble with my question whether we could add a test here 😅

But we can definitely merge this from my side.

@porunov
Copy link
Member

porunov commented Feb 17, 2023

Thanks for your detailed investigation and confirming that the changes here fix the problem with a test.

I really didn't want to cause so much trouble with my question whether we could add a test here sweat_smile

But we can definitely merge this from my side.

Your comment about a test case and long instead of int definitely make sense. I also created an issue to transition to long limit #3603

@porunov porunov merged commit 9e26fd3 into JanusGraph:master Feb 17, 2023
@github-actions
Copy link

💚 All backports created successfully

Status Branch Result
v0.6

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation and see the Github Action logs for details

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants