
something wrong with tablet size calculation #5408

Closed
willem520 opened this issue May 9, 2020 · 10 comments
Labels
area/operations Related to operational aspects of the DB, including signals, flags, env vars, etc.
kind/bug Something is broken.
status/accepted We accept to investigate/work on it.

Comments


willem520 commented May 9, 2020

What version of Dgraph are you using?

Dgraph v20.03.1

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

CentOS (126 GB RAM)

Steps to reproduce the issue (command/config used to run Dgraph).

I noticed the tablet size is more than the disk capacity.

My machine's disk capacity is: [screenshot]

The p directory size is: [screenshot]

When I used the /state endpoint, I got this result: [screenshot]

From Ratel, I got this result: [screenshot]

The md5id and gid tablets total about 6.1 TB, but my disk capacity is 2.9 TB.

Expected behaviour and actual result.

Tablet sizes should be calculated correctly.

Related to https://discuss.dgraph.io/t/ratel-predicate-capactiy

@jarifibrahim jarifibrahim added area/operations Related to operational aspects of the DB, including signals, flags, env vars, etc. kind/bug Something is broken. status/accepted We accept to investigate/work on it. labels May 9, 2020
martinmr (Contributor) commented Jun 11, 2020

I haven't been able to reproduce this exact issue, but it looks like something else is wrong. When calculating the sizes, the function skips all the tables with the following errors:

alpha1    | I0611 23:25:56.138946      14 draft.go:1245] Calculating tablet sizes. Found 4 tables
alpha1    | I0611 23:25:56.139087      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139145      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139560      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139703      14 draft.go:1254] Unable to parse key: Invalid size 25185 for key [33 98 97 100 103 101 114 33 104 101 97 100 255 255 255 255 255 255 255 254]
alpha1    | I0611 23:25:56.139773      14 draft.go:1276] No tablets found.

The error happens when trying to read the biggest key of the table and parse it into a Dgraph key. Note that the smallest (left) key can be read without issue. Also, the value is the same in all four tables. Maybe there's a special key at the end of a badger table?

I don't think there's an error with Dgraph itself because my cluster is working fine.

@jarifibrahim Do you have any insight into why the right keys of the tables might be different from what Dgraph expects?
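For readers following along, here is a minimal sketch of the kind of loop behind the log lines above. The tableInfo type and parsePredicate helper are illustrative stand-ins (tableInfo mirrors the Left/Right/EstimatedSz fields of badger's TableInfo, parsePredicate stands in for Dgraph's key parser); this is not the actual draft.go code.

```go
package tabletsize

import (
	"errors"
	"fmt"
)

// tableInfo mirrors the badger TableInfo fields used here: Left/Right are
// the smallest and biggest keys in a table, EstimatedSz its estimated size.
type tableInfo struct {
	Left, Right []byte
	EstimatedSz uint64
}

// parsePredicate stands in for Dgraph's key parser. It rejects keys that
// are not Dgraph data keys, such as badger's internal !badger!head! key.
func parsePredicate(key []byte) (string, error) {
	if len(key) == 0 || key[0] == '!' {
		return "", errors.New("invalid key")
	}
	// A real Dgraph key encodes a type byte, the predicate, and a uid;
	// this sketch simply treats the whole key as the predicate name.
	return string(key), nil
}

// tabletSizes sketches the loop behind the log above: a table counts toward
// a predicate only if both boundary keys parse and name the same predicate.
func tabletSizes(tables []tableInfo) map[string]uint64 {
	fmt.Printf("Calculating tablet sizes. Found %d tables\n", len(tables))
	sizes := make(map[string]uint64)
	for _, t := range tables {
		left, err := parsePredicate(t.Left)
		if err != nil {
			fmt.Printf("Unable to parse key: %v\n", err)
			continue
		}
		right, err := parsePredicate(t.Right) // fails when Right is !badger!head!
		if err != nil {
			fmt.Printf("Unable to parse key: %v\n", err)
			continue // the table is skipped and its size never counted
		}
		if left != right {
			continue // table spans more than one predicate; also skipped
		}
		sizes[left] += t.EstimatedSz
	}
	return sizes
}
```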

jarifibrahim (Contributor) commented Jun 12, 2020

@martinmr 33 98 97 100 103 101 114 33 104 101 97 is the !badger!head! key. The table contains keys inserted by Dgraph, but it also contains the internal keys inserted by badger. The biggest key could be an internal badger key. See #5026 as well.

Also, the value is the same in all four tables. Maybe there's a special key at the end of a badger table?

Each level 0 table has one !badger!head! key.
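Since every key badger writes for its own bookkeeping carries the !badger! prefix, a caller can screen such keys out cheaply. A minimal helper, continuing the sketch above (the prefix is badger's; the helper name is ours):

```go
package tabletsize

import "bytes"

// badgerPrefix marks keys that badger inserts for its own bookkeeping,
// such as the !badger!head! key at the end of every level-0 table.
var badgerPrefix = []byte("!badger!")

// isInternalKey reports whether a key was written by badger itself
// rather than by the application (here, Dgraph).
func isInternalKey(key []byte) bool {
	return bytes.HasPrefix(key, badgerPrefix)
}
```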

martinmr (Contributor)

@jarifibrahim OK. So to deal with this, should I iterate through the table backwards until I find a valid key? Can I do something like dgraph-io/badger#1309 but from the dgraph side?

jarifibrahim (Contributor)

@martinmr, the last time we spoke to @manishrjain, he suggested that it's okay to skip some tables. @parasssh would also remember this discussion.

So to deal with this should I iterate through the table backwards until I find a valid key? Can I do something like dgraph-io/badger#1309 but from the dgraph side?

The tables are not accessible outside of badger. To perform a reverse iteration you would need access to the table and the table iterator, and the tables are not exposed: the db.Tables(..) call returns TableInfo, not the actual tables. We can expose the tables from badger, and then dgraph can iterate over them however it needs to.
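If badger did expose the tables, the reverse scan martinmr describes might look like the following. Everything here is hypothetical (the table and keyIterator interfaces and the NewReverseIterator method do not exist in badger, as the comment above notes); it only illustrates the shape of the idea, reusing isInternalKey from the earlier sketch.

```go
// Hypothetical sketch only: badger does not expose these types today.
// table is an imagined handle to an SST, iterated from biggest key down.
type table interface {
	NewReverseIterator() keyIterator
}

type keyIterator interface {
	Valid() bool
	Key() []byte
	Next()
}

// rightmostValidKey walks backwards from the biggest key until it finds
// one that is not a badger-internal key (see isInternalKey above).
func rightmostValidKey(t table) ([]byte, bool) {
	for it := t.NewReverseIterator(); it.Valid(); it.Next() {
		if !isInternalKey(it.Key()) {
			return it.Key(), true
		}
	}
	return nil, false
}
```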

parasssh (Contributor) commented Jun 13, 2020

Correct. The tablet size is really just a rough estimate. Unless the entire table consists of keys from the same predicate, dgraph will skip it in the tablet size calculation.

Having said that, I think we should have TableInfo.Right point to the rightmost valid key instead of a badger-internal key, so the error is not seen on the dgraph side. After all, the Right field is exported, so applications may access it presuming it to be a valid key (and not an internal badger key).

Alternatively, or additionally, on the dgraph side we can make the tablet size calculation rely only on the Left field of each TableInfo entry: as long as two consecutive Left keys have the same predicate, we include the table in the calculation.
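A sketch of that Left-only idea, reusing the tableInfo and parsePredicate stand-ins from the earlier sketch and assuming the tables arrive sorted by key range: table i is attributed to a predicate only when table i+1's Left key carries the same predicate, so the possibly badger-internal Right key is never consulted.

```go
// tabletSizesLeftOnly attributes table i to a predicate only when the next
// table's Left key parses to the same predicate; Right keys are ignored.
func tabletSizesLeftOnly(tables []tableInfo) map[string]uint64 {
	sizes := make(map[string]uint64)
	for i := 0; i+1 < len(tables); i++ {
		cur, err := parsePredicate(tables[i].Left)
		if err != nil {
			continue
		}
		next, err := parsePredicate(tables[i+1].Left)
		if err != nil || cur != next {
			continue // neighbouring table starts a different predicate
		}
		sizes[cur] += tables[i].EstimatedSz
	}
	return sizes
}
```

Note this variant can only under-count: the last table of each predicate never has a matching successor, which is consistent with martinmr's observation below that the skipping under-reports.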

martinmr (Contributor)

@jarifibrahim I implemented what @parasssh suggested above. When I load the 1 million dataset I get a total size of 3.4 GB. However, the size of the p directory (in a cluster with only one alpha running, for simplicity) is 210 MB.

One thing I don't know is whether the estimated size knows how to deal with compaction. Is it the size of the uncompressed or the compressed data? Maybe that could explain the difference I am seeing.

Otherwise, I think there's something wrong with the values EstimatedSz is reporting. The logic on the dgraph side is fairly simple, and I haven't seen any issue other than the one mentioned above (which in any case under-reports the numbers, so it doesn't explain the situation the user is seeing).

martinmr added a commit that referenced this issue Jun 15, 2020
In badger, the right key might be a badger specific key that Dgraph
cannot understand. To deal with these keys, a table is included in
the size calculation if the next table starts with the same key.

Related to DGRAPH-1358 and #5408.
jarifibrahim (Contributor)

When I load the 1 million dataset I get a total size of 3.4 GB. However, the size of the p directory (in a cluster with only one alpha running, for simplicity) is 210 MB.

One thing I don't know is whether the estimated size knows how to deal with compaction. Is it the size of the uncompressed or the compressed data? Maybe that could explain the difference I am seeing.

@martinmr How did you test this? Do you have steps that I can follow? This could be a badger bug, maybe some issue with how we do estimates in badger. The size is the estimated size of the uncompressed data, but compression cannot make such a huge difference. This is definitely a bug. Let me know how you tested it and I can verify it in badger.
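For scale, the gap martinmr reported works out to roughly a 16x over-estimate, which is why compression alone is an unconvincing explanation. A quick check on the numbers from the thread:

```go
package main

import "fmt"

func main() {
	// Quick arithmetic on the numbers reported above: the estimated
	// tablet size exceeds the on-disk p directory by roughly 16x.
	const gb, mb = float64(1 << 30), float64(1 << 20)
	estimated := 3.4 * gb // total tablet size computed from EstimatedSz
	onDisk := 210 * mb    // measured size of the p directory
	fmt.Printf("over-estimate: %.1fx\n", estimated/onDisk) // prints ≈ 16.6x
}
```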

martinmr (Contributor) commented Jun 16, 2020

1. Use this branch: "Fix: Change tablet size calculation to not depend on the right key." (#5656)
2. Change the tablet size calculation to happen once every minute instead of every five minutes.
3. Live load the 1 million dataset.
4. Wait for the tablet sizes to be calculated.

For simplicity, I used a cluster with 1 alpha and 1 zero.

EDIT: master now contains all the changes you need.

@jarifibrahim
Copy link
Contributor

@martinmr Can you look at the badger code and figure out what's wrong? The calculations are done here: https://github.com/dgraph-io/badger/blob/dd332b04e6e7fe06e4f213e16025128b1989c491/table/builder.go#L228

minhaj-shakeel (Contributor)

GitHub issues have been deprecated.
This issue has been moved to Discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.
