Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when loading data from badger in initStorage . #5573

Closed
JimWen opened this issue Jun 4, 2020 · 8 comments · Fixed by #5939
Closed

Crash when loading data from badger in initStorage . #5573

JimWen opened this issue Jun 4, 2020 · 8 comments · Fixed by #5939
Assignees
Labels
area/crash Dgraph issues that cause an operation to fail, or the whole server to crash. status/accepted We accept to investigate/work on it.

Comments

@JimWen
Copy link

JimWen commented Jun 4, 2020

What version of Dgraph are you using?

Dgraph version : v20.03.3
Dgraph SHA-256 : 08424035910be6b6720570427948bab8352a0b5a6d59a0d20c3ec5ed29533121
Commit SHA-1 : fa3c191
Commit timestamp : 2020-06-02 16:47:25 -0700
Branch : HEAD
Go version : go1.14.1

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

128G mem & 1.8T SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

Import large dataset with liveload and then restart.

Expected behaviour and actual result.

Log is as followings:

I0604 11:27:09.829595  113620 log.go:34] 1163 tables out of 5337 opened in 9.005s
I0604 11:27:12.836907  113620 log.go:34] 1446 tables out of 5337 opened in 12.012s
I0604 11:27:15.847727  113620 log.go:34] 1657 tables out of 5337 opened in 15.023s
I0604 11:27:18.833167  113620 log.go:34] 1929 tables out of 5337 opened in 18.009s
I0604 11:27:21.827819  113620 log.go:34] 2290 tables out of 5337 opened in 21.003s
I0604 11:27:24.834283  113620 log.go:34] 2585 tables out of 5337 opened in 24.01s
I0604 11:27:27.828876  113620 log.go:34] 2855 tables out of 5337 opened in 27.004s
I0604 11:27:30.827955  113620 log.go:34] 3136 tables out of 5337 opened in 30.003s
I0604 11:27:33.856535  113620 log.go:34] 3398 tables out of 5337 opened in 33.032s
I0604 11:27:36.830251  113620 log.go:34] 3854 tables out of 5337 opened in 36.006s
I0604 11:27:39.829359  113620 log.go:34] 4362 tables out of 5337 opened in 39.005s
I0604 11:27:42.833415  113620 log.go:34] 4853 tables out of 5337 opened in 42.009s
I0604 11:27:45.832015  113620 log.go:34] 5212 tables out of 5337 opened in 45.008s
I0604 11:27:46.717500  113620 log.go:34] All 5337 tables opened in 45.893s
2020/06/04 11:27:47 Inter: Biggest(j-1)
00000000  21 62 61 64 67 65 72 21  68 65 61 64 ff ff ff ff  |!badger!head....|
00000010  ff ff ff fe                                       |....|

 vs Smallest(j):
00000000  00 00 02 62 67 00 28 bf  2e d8 27 18 c1 10 ff ff  |...bg.(...'.....|
00000010  ff ff ff 97 c2 ce                                 |......|

: level=4 j=2 numTables=341
github.com/dgraph-io/badger/v2.(*levelHandler).validate
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/util.go:55
github.com/dgraph-io/badger/v2.(*levelsController).validate
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/util.go:33
github.com/dgraph-io/badger/v2.newLevelsController
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/levels.go:191
github.com/dgraph-io/badger/v2.Open
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/db.go:348
github.com/dgraph-io/badger/v2.OpenManaged
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/managed_db.go:26
github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:167
github.com/dgraph-io/dgraph/worker.InitServerState
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:57
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:612
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:90
github.com/spf13/cobra.(*Command).execute
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373
Levels Controller
github.com/dgraph-io/badger/v2.(*levelsController).validate
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/util.go:34
github.com/dgraph-io/badger/v2.newLevelsController
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/levels.go:191
github.com/dgraph-io/badger/v2.Open
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/db.go:348
github.com/dgraph-io/badger/v2.OpenManaged
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/managed_db.go:26
github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:167
github.com/dgraph-io/dgraph/worker.InitServerState
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:57
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:612
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:90
github.com/spf13/cobra.(*Command).execute
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373
Level validation
github.com/dgraph-io/badger/v2.newLevelsController
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/levels.go:193
github.com/dgraph-io/badger/v2.Open
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/db.go:348
github.com/dgraph-io/badger/v2.OpenManaged
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/managed_db.go:26
github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:167
github.com/dgraph-io/dgraph/worker.InitServerState
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:57
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:612
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:90
github.com/spf13/cobra.(*Command).execute
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373
Error while creating badger KV posting store
github.com/dgraph-io/dgraph/x.Checkf
        /ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:51
github.com/dgraph-io/dgraph/worker.(*ServerState).initStorage
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:168
github.com/dgraph-io/dgraph/worker.InitServerState
        /ext-go/1/src/github.com/dgraph-io/dgraph/worker/server_state.go:57
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:612
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.init.2.func1
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:90
github.com/spf13/cobra.(*Command).execute
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
        /usr/local/go/src/runtime/proc.go:203
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1373
@jarifibrahim
Copy link
Contributor

jarifibrahim commented Jun 4, 2020

@JimWen Does the same data directory work with v20.03.1 after a restart? Badger levels are supposed to be sorted. The crash indicates that your data is not sorted. !badger!head!key should not be stored before the ...bg.(...'... key (which looks very suspicious to me).

@balajijinnah I suspect this is the result of the index caching that we added to badger recently. We're comparing only the smallest and the biggest keys for the tables https://github.com/dgraph-io/badger/blob/056d859606380082d8694d2ad3b3ebf4b10318d7/util.go#L54 and the biggest might be fetched from the cache.

@JimWen
Copy link
Author

JimWen commented Jun 4, 2020

@JimWen Does the same data directory work with v20.03.1 after a restart? Badger levels are supposed to be sorted. The crash indicates that your data is not sorted. !badger!head!key should not be stored before the ...bg.(...'... key (which looks very suspicious to me).

@balajijinnah I suspect this is the result of the index caching that we added to badger recently. We're comparing only the smallest and the biggest keys for the tables https://github.com/dgraph-io/badger/blob/056d859606380082d8694d2ad3b3ebf4b10318d7/util.go#L54 and the biggest might be fetched from the cache.

v20.03.1 is fine

@danielmai danielmai added area/crash Dgraph issues that cause an operation to fail, or the whole server to crash. status/accepted We accept to investigate/work on it. labels Jun 4, 2020
@jarifibrahim
Copy link
Contributor

@JimWen This is definitely a bug in Badger. Your keys are not sorted which shouldn't happen. I'll run some experiments and circle back.

@JimWen
Copy link
Author

JimWen commented Jun 5, 2020

@JimWen This is definitely a bug in Badger. Your keys are not sorted which shouldn't happen. I'll run some experiments and circle back.

OK, thank you

@JimWen
Copy link
Author

JimWen commented Jun 6, 2020

@jarifibrahim when i bulkload, something is wrong at reduce phase and crash, and i think may be this is the same problem, stack is as followings

2020/06/06 10:33:19 Assert failed
github.com/dgraph-io/badger/v2/y.AssertTrue
        /root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/y/error.go:55
github.com/dgraph-io/badger/v2.(*sortedWriter).handleRequests.func1
        /root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/stream_writer.go:309
github.com/dgraph-io/badger/v2.(*sortedWriter).handleRequests
        /root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/stream_writer.go:339
runtime.goexit
        /usr/lib/golang/src/runtime/asm_amd64.s:1357

@jarifibrahim
Copy link
Contributor

@JimWen How much data were you trying to insert? I think this shows up only when you have sufficiently large dataset.

@JimWen
Copy link
Author

JimWen commented Jun 10, 2020

@JimWen How much data were you trying to insert? I think this shows up only when you have sufficiently large dataset.

Yes, about 5 billion N-Quads

@jarifibrahim jarifibrahim self-assigned this Jun 25, 2020
@jarifibrahim
Copy link
Contributor

This has been fixed in badger via dgraph-io/badger#1380 . I'll close this issue once badger is updated in master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/crash Dgraph issues that cause an operation to fail, or the whole server to crash. status/accepted We accept to investigate/work on it.
Development

Successfully merging a pull request may close this issue.

3 participants