Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Change tablet size calculation to not depend on the right key. #5656

Merged
merged 2 commits into from
Jun 16, 2020

Conversation

martinmr
Copy link
Contributor

@martinmr martinmr commented Jun 15, 2020

In badger, the right key might be a badger specific key that Dgraph
cannot understand. To deal with these keys, a table is included in
the size calculation if the next table starts with the same key.

Related to DGRAPH-1358 and #5408.


This change is Reviewable

In badger, the right key might be a badger specific key that Dgraph
cannot understand. To deal with these keys, a table is included in
the size calculation if the next table starts with the same key.

Related to DGRAPH-1358 and #5408.
@martinmr martinmr requested a review from a team June 15, 2020 23:32
Copy link
Contributor

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @manishrjain, @martinmr, and @vvbalaji-dgraph)

a discussion (no related file):
See if we can simple UT.



worker/draft.go, line 1278 at r1 (raw file):

			// same predicate.
			// We could later specifically iterate over these tables to get their estimated sizes.
			previousLeft = left.Attr

Lets simplify. Something like this:

if (need to update) {
          updateSize()
}
prevLeft =   left.Attr
prevSize = int64(tinfo.EstimatedSz)

worker/draft.go, line 1289 at r1 (raw file):

	}
	// The last table has not been counted. Assign it to the predicate at the left of the table.
	updateSize(previousLeft, previousSize)

This will always be on over-estimate, which I think is probably ok.

Copy link
Contributor

@parasssh parasssh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Minor comment on simplifying it.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @manishrjain, @martinmr, and @vvbalaji-dgraph)

@jarifibrahim
Copy link
Contributor

jarifibrahim commented Jun 16, 2020

@parasssh @martinmr The tableInfo.Left has a higher chance of being an internal key then the Right key. All the level 0 tables have the head as the leftmost key. So we would have to skip these tables anyway.

For tables other than level 0, they wouldn't usually have an internal key as the right-most key. So what do we gain by this change? What is the benefit of checking for the left key of the next table instead of the right key of the current table?

An important thing to remember is that all the versions of a single key will be stored in a single file. So if you have 100,000 versions of foo. They all will be stored in the same sst file.

@parasssh
Copy link
Contributor

parasssh commented Jun 16, 2020

@parasssh @martinmr The tableInfo.Left has a higher chance of being an internal key then the Right key. All the level 0 tables have the head as the leftmost key. So we would have to skip these tables anyway.

The thought process was that Left keys are always valid (not internal) keys. Hence, we only iterate on those. But it seems that is not the case either.

So, I guess, in that case, TableInfo's Left and Right should point to 1st and last valid keys in the table, not internal keys to make Table Size calculation work.

Copy link
Contributor Author

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 3 unresolved discussions (waiting on @manishrjain, @parasssh, and @vvbalaji-dgraph)

a discussion (no related file):

Previously, parasssh wrote…

See if we can simple UT.

Hard to do a manual test because we'd have to wait five minutes. I tested this by live loading the 1 million dataset and waiting for table sizes to be calculated.



worker/draft.go, line 1278 at r1 (raw file):

Previously, parasssh wrote…

Lets simplify. Something like this:

if (need to update) {
          updateSize()
}
prevLeft =   left.Attr
prevSize = int64(tinfo.EstimatedSz)

Done.


worker/draft.go, line 1289 at r1 (raw file):

Previously, parasssh wrote…

This will always be on over-estimate, which I think is probably ok.

Yes, I thought it was ok to do.

@martinmr martinmr changed the title Change tablet size calculation to not depend on the right key. Fix: Change tablet size calculation to not depend on the right key. Jun 16, 2020
@martinmr martinmr merged commit d91018f into master Jun 16, 2020
@martinmr martinmr deleted the martinmr/right-key-table branch June 16, 2020 21:29
martinmr added a commit that referenced this pull request Jun 16, 2020
…5656)

In badger, the right key might be a badger specific key that Dgraph
cannot understand. To deal with these keys, a table is included in
the size calculation if the next table starts with the same key.

Related to DGRAPH-1358 and #5408.
martinmr added a commit that referenced this pull request Jun 18, 2020
…5656)

In badger, the right key might be a badger specific key that Dgraph
cannot understand. To deal with these keys, a table is included in
the size calculation if the next table starts with the same key.

Related to DGRAPH-1358 and #5408.
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 18, 2020
…graph-io#5656)

In badger, the right key might be a badger specific key that Dgraph
cannot understand. To deal with these keys, a table is included in
the size calculation if the next table starts with the same key.

Related to DGRAPH-1358 and dgraph-io#5408.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants