Bulkload OOM when loading big dataset #5574

Closed
JimWen opened this issue Jun 4, 2020 · 7 comments
Labels
area/bulk-loader Issues related to bulk loading.

Comments


JimWen commented Jun 4, 2020

What version of Dgraph are you using?

Dgraph version : v20.03.3
Dgraph SHA-256 : 08424035910be6b6720570427948bab8352a0b5a6d59a0d20c3ec5ed29533121
Commit SHA-1 : fa3c191
Commit timestamp : 2020-06-02 16:47:25 -0700
Branch : HEAD
Go version : go1.14.1

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

128 GB RAM & 1.8 TB SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

Bulkload big dataset

Expected behaviour and actual result.

The log is as follows:

[14:50:58+0800] REDUCE 02h56m55s 28.96% edge_count:2.924G edge_speed:879.5k/sec plist_count:1.837G plist_speed:552.4k/sec. Num Encoding: 0
[14:50:59+0800] REDUCE 02h56m56s 28.96% edge_count:2.924G edge_speed:879.3k/sec plist_count:1.837G plist_speed:552.3k/sec. Num Encoding: 0
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x195a44b, 0x16)
        /usr/local/go/src/runtime/panic.go:1114 +0x72
runtime.sysMap(0xddd4000000, 0xc8000000, 0x2834d78)
        /usr/local/go/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x281f840, 0xc5800000, 0x281f848, 0x62b4a)
        /usr/local/go/src/runtime/malloc.go:715 +0x1cd
runtime.(*mheap).grow(0x281f840, 0x62b4a, 0x0)
        /usr/local/go/src/runtime/mheap.go:1286 +0x11c
runtime.(*mheap).allocSpan(0x281f840, 0x62b4a, 0x0, 0x2834d88, 0xfffffffffffffade)
        /usr/local/go/src/runtime/mheap.go:1124 +0x6a0
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:871 +0x64
runtime.(*mheap).alloc(0x281f840, 0x62b4a, 0xc000000100, 0xc6a8164180)
        /usr/local/go/src/runtime/mheap.go:865 +0x81
runtime.largeAlloc(0xc5694000, 0xc6bd610001, 0x4)
        /usr/local/go/src/runtime/malloc.go:1152 +0x92
runtime.mallocgc.func1()
        /usr/local/go/src/runtime/malloc.go:1047 +0x46
runtime.systemstack(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1041

goroutine 6534 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0006e8bf8 sp=0xc0006e8bf0 pc=0x9b16b0
runtime.mallocgc(0xc5694000, 0x172afa0, 0xc6a8bdfe01, 0x9ad6f0)
        /usr/local/go/src/runtime/malloc.go:1046 +0x895 fp=0xc0006e8c98 sp=0xc0006e8bf8 pc=0x957fb5
runtime.growslice(0x172afa0, 0xdc9429a000, 0x68abeb8, 0x69492aa, 0x697152d, 0xae1a8, 0xc6555, 0xae1a8)
        /usr/local/go/src/runtime/slice.go:181 +0x1e2 fp=0xc0006e8d00 sp=0xc0006e8c98 pc=0x998c52
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce(0xcbc1d060f0, 0xc6e13a8000, 0x4ede, 0x5000, 0xc6e0b50000, 0x7e3, 0x900, 0xc6ba306040)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:364 +0x48f fp=0xc0006e8e50 sp=0xc0006e8d00 pc=0x156394f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run.func1(0xc00039e700, 0xc6a54e4000, 0x3, 0x64, 0xcbc1d060f0, 0x0, 0xd6a0c87400)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:91 +0x53e fp=0xc0006e8fa8 sp=0xc0006e8e50 pc=0x1568d4e
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0006e8fb0 sp=0xc0006e8fa8 pc=0x9b37c1
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:67 +0x240

goroutine 1 [select, 55 minutes]:
github.com/dgraph-io/badger/v2/y.(*Throttle).Do(0xc00039e700, 0x19b72c8, 0xc00039e700)
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/y/y.go:267 +0xb9
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run(0xcbc1d060f0, 0xcbc1d060f0, 0x19456b7)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:64 +0x273
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).reduceStage(0xc0007401b0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:299 +0x8e
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:241 +0xd99
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1(0xc00018e780, 0xc00001e0b0, 0x0, 0xb)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:49 +0x62
github.com/spf13/cobra.(*Command).execute(0xc00018e780, 0xc00001e000, 0xb, 0xb, 0xc00018e780, 0xc00001e000)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0x2616600, 0x0, 0x0, 0x0)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70 +0x80
main.main()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78 +0xb0

It seems to be caused by loading all the predicate keys into memory during the reduce phase.
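
For illustration only (this is a made-up Go sketch, not the actual reduce.go code; all names are invented): the pattern I suspect is that every map entry of a shard gets appended to a single in-memory slice before the posting lists are built, so a large enough shard cannot fit in RAM and the growslice call in the trace eventually fails.

    // Rough sketch, not the real Dgraph code; names and types are invented.
    package main

    type mapEntry struct{ key, value []byte }

    // reduceShard buffers every entry of a shard in one slice before
    // building posting lists, so memory use grows with the shard size.
    func reduceShard(next func() (mapEntry, bool)) [][]byte {
            var batch []mapEntry
            for e, ok := next(); ok; e, ok = next() {
                    batch = append(batch, e) // unbounded growth -> growslice -> OOM
            }
            return buildPostingLists(batch)
    }

    // buildPostingLists stands in for the real posting-list encoding.
    func buildPostingLists(batch []mapEntry) [][]byte {
            lists := make([][]byte, 0, len(batch))
            for _, e := range batch {
                    lists = append(lists, append(e.key, e.value...))
            }
            return lists
    }

    func main() {}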


JimWen commented Jun 4, 2020

Here is the heap profile:

Showing top 10 nodes out of 41
      flat  flat%   sum%        cum   cum%
 2776.80MB 27.05% 27.05%  5058.40MB 49.27%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapIterator).startBatching
 2279.60MB 22.20% 49.25%  2281.61MB 22.22%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapIterator).startBatching.func1
 1963.88MB 19.13% 68.38%  2852.92MB 27.79%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).toList.func2
  573.61MB  5.59% 73.96%   837.12MB  8.15%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
  533.14MB  5.19% 79.16%   533.14MB  5.19%  github.com/dgraph-io/ristretto/z.(*Bloom).Size
  531.53MB  5.18% 84.33%   531.53MB  5.18%  github.com/dgraph-io/dgraph/protos/pb.(*PostingList).Marshal
  430.54MB  4.19% 88.53%   430.54MB  4.19%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce
     297MB  2.89% 91.42%      297MB  2.89%  github.com/dgraph-io/badger/v2/y.Copy (inline)
  263.51MB  2.57% 93.99%   263.51MB  2.57%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
     138MB  1.34% 95.33%      138MB  1.34%  github.com/dgraph-io/badger/v2/table.NewTableBuilder
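
For anyone who wants to capture the same data: a table like this is the output of go tool pprof's top report taken against the bulk loader's heap profile. Assuming the loader's pprof HTTP endpoint is reachable at the default localhost:8080 (adjust the address to match your setup), roughly:

    go tool pprof -top http://localhost:8080/debug/pprof/heap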

@ashish-goswami
Contributor

Might be related to #5361

@ashish-goswami
Contributor

Hi @JimWen, as mentioned in #5361, can you please repeat bulk loading with v1.1.1, if possible. Let us know if bulk load is successful or not.

danielmai added the area/bulk-loader label Jun 4, 2020

JimWen commented Jun 5, 2020

Hi @JimWen, as mentioned in #5361, can you please repeat bulk loading with v1.1.1, if possible. Let us know if bulk load is successful or not.

OK, I will try it later.


JimWen commented Jun 7, 2020

@ashish-goswami v1.1.1 still OOMs, but it OOMs later than this version: the reduce phase gets much further before running out of memory.

@harshil-goel
Contributor

Can you try increasing the number of shards (both map and reduce) and check again? It would be really helpful.
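
For example, something like the following (the shard counts are only placeholders to tune for your dataset; --map_shards and --reduce_shards are the relevant bulk loader flags, and the number of reduce shards should match the number of Alpha groups you plan to run):

    dgraph bulk -f <rdf-files> -s <schema-file> --map_shards=4 --reduce_shards=2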

@minhaj-shakeel
Contributor

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

