Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadPostingList skipping over valid keys #4905

Closed
martinmr opened this issue Mar 10, 2020 · 0 comments · Fixed by #4908
Closed

ReadPostingList skipping over valid keys #4905

martinmr opened this issue Mar 10, 2020 · 0 comments · Fixed by #4908
Assignees
Labels
kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate/work on it.

Comments

@martinmr
Copy link
Contributor

What version of Dgraph are you using?

master

Have you tried reproducing the issue with the latest release?

yes

Steps to reproduce the issue (command/config used to run Dgraph).

In ReadPostingList, there's logic to skip over the parts of a multi-list part shown below.

			// If this list is a multi-part list, advance past the keys holding the parts.
			if len(l.plist.GetSplits()) > 0 {
				lastKey, err := x.GetSplitKey(key, math.MaxUint64)
				if err != nil {
					return nil, errors.Wrapf(err,
						"while advancing past the end of multi-part list with key [%v]", key)
				}
				it.Seek(lastKey)
			}

This logic assumes there are no other lists in between. However, I don't think that's true as there's a flipped bit in the key to signal that the key is a multi-part list but that bit is not the last in the key. This means there are other valid keys in the range that are skipped, such as count and term index keys.

This bit is not the last in the key because of term indexes. We don't encode the length of the term in the key so it's not possible to put the bit that signals this key corresponds to a multi-part list at the end. The best (long term) solution is to change the format of the keys so that this is true.

The short-term solution would be to remove this code and ensure that multi-part list keys are skipped in the places where the stream framework is called (rollup, backups, etc).

Expected behaviour and actual result.

Some keys might not be rolled up or included in backups because they are skipped. All keys should be considered for rollup. This bug does not lead to data losses for rollups but could lead to performance issues since the rollup is not being done for certain keys.

@martinmr martinmr self-assigned this Mar 10, 2020
@martinmr martinmr added kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate/work on it. labels Mar 10, 2020
martinmr added a commit that referenced this issue Mar 16, 2020
Currently ReadPostingList is skipping the iterator past the last possible key for
a multi-part list. The assumption was that the only keys in this range are the
main keys and the keys for each part. This is the case when a predicate has a
term and count index.

This PR changes the format of the split keys so that the first byte is different for
the keys holding parts of a multi-part list. This allows the stream framework to
skip over those keys during a rollup, eliminating the need for skipping keys.

This PR, also changes ReadPostingList to return a nil list when an individual part
is requested and the List struct so that all the operations on a nil list are a no-op
to safely handle cases where the list is being tried to accessed from the key of one
of its parts.

Fixes #4905. See that issue for more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something is broken. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate/work on it.
Development

Successfully merging a pull request may close this issue.

1 participant