Bulkload OOM when loading big dataset #5574

Closed
JimWen opened this issue Jun 4, 2020 · 7 comments
Labels
area/bulk-loader Issues related to bulk loading.

Comments


JimWen commented Jun 4, 2020

What version of Dgraph are you using?

Dgraph version : v20.03.3
Dgraph SHA-256 : 08424035910be6b6720570427948bab8352a0b5a6d59a0d20c3ec5ed29533121
Commit SHA-1 : fa3c191
Commit timestamp : 2020-06-02 16:47:25 -0700
Branch : HEAD
Go version : go1.14.1

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

128 GB RAM & 1.8 TB SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

Bulkload big dataset

Expected behaviour and actual result.

The log is as follows:

[14:50:58+0800] REDUCE 02h56m55s 28.96% edge_count:2.924G edge_speed:879.5k/sec plist_count:1.837G plist_speed:552.4k/sec. Num Encoding: 0
[14:50:59+0800] REDUCE 02h56m56s 28.96% edge_count:2.924G edge_speed:879.3k/sec plist_count:1.837G plist_speed:552.3k/sec. Num Encoding: 0
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x195a44b, 0x16)
        /usr/local/go/src/runtime/panic.go:1114 +0x72
runtime.sysMap(0xddd4000000, 0xc8000000, 0x2834d78)
        /usr/local/go/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x281f840, 0xc5800000, 0x281f848, 0x62b4a)
        /usr/local/go/src/runtime/malloc.go:715 +0x1cd
runtime.(*mheap).grow(0x281f840, 0x62b4a, 0x0)
        /usr/local/go/src/runtime/mheap.go:1286 +0x11c
runtime.(*mheap).allocSpan(0x281f840, 0x62b4a, 0x0, 0x2834d88, 0xfffffffffffffade)
        /usr/local/go/src/runtime/mheap.go:1124 +0x6a0
runtime.(*mheap).alloc.func1()
        /usr/local/go/src/runtime/mheap.go:871 +0x64
runtime.(*mheap).alloc(0x281f840, 0x62b4a, 0xc000000100, 0xc6a8164180)
        /usr/local/go/src/runtime/mheap.go:865 +0x81
runtime.largeAlloc(0xc5694000, 0xc6bd610001, 0x4)
        /usr/local/go/src/runtime/malloc.go:1152 +0x92
runtime.mallocgc.func1()
        /usr/local/go/src/runtime/malloc.go:1047 +0x46
runtime.systemstack(0x0)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1041

goroutine 6534 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0006e8bf8 sp=0xc0006e8bf0 pc=0x9b16b0
runtime.mallocgc(0xc5694000, 0x172afa0, 0xc6a8bdfe01, 0x9ad6f0)
        /usr/local/go/src/runtime/malloc.go:1046 +0x895 fp=0xc0006e8c98 sp=0xc0006e8bf8 pc=0x957fb5
runtime.growslice(0x172afa0, 0xdc9429a000, 0x68abeb8, 0x69492aa, 0x697152d, 0xae1a8, 0xc6555, 0xae1a8)
        /usr/local/go/src/runtime/slice.go:181 +0x1e2 fp=0xc0006e8d00 sp=0xc0006e8c98 pc=0x998c52
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce(0xcbc1d060f0, 0xc6e13a8000, 0x4ede, 0x5000, 0xc6e0b50000, 0x7e3, 0x900, 0xc6ba306040)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:364 +0x48f fp=0xc0006e8e50 sp=0xc0006e8d00 pc=0x156394f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run.func1(0xc00039e700, 0xc6a54e4000, 0x3, 0x64, 0xcbc1d060f0, 0x0, 0xd6a0c87400)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:91 +0x53e fp=0xc0006e8fa8 sp=0xc0006e8e50 pc=0x1568d4e
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0006e8fb0 sp=0xc0006e8fa8 pc=0x9b37c1
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:67 +0x240

goroutine 1 [select, 55 minutes]:
github.com/dgraph-io/badger/v2/y.(*Throttle).Do(0xc00039e700, 0x19b72c8, 0xc00039e700)
        /go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/y/y.go:267 +0xb9
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run(0xcbc1d060f0, 0xcbc1d060f0, 0x19456b7)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:64 +0x273
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).reduceStage(0xc0007401b0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:299 +0x8e
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:241 +0xd99
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1(0xc00018e780, 0xc00001e0b0, 0x0, 0xb)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:49 +0x62
github.com/spf13/cobra.(*Command).execute(0xc00018e780, 0xc00001e000, 0xb, 0xb, 0xc00018e780, 0xc00001e000)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0x2616600, 0x0, 0x0, 0x0)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70 +0x80
main.main()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78 +0xb0

It seems to be caused by loading all the predicate keys into memory during the reduce phase.
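
For illustration only (this is a made-up Go sketch, not the actual reduce.go code; all names are invented): the pattern I suspect is that every map entry of a shard gets appended to a single in-memory slice before the posting lists are built, so a large enough shard cannot fit in RAM and the growslice call in the trace eventually fails.

    // Rough sketch, not the real Dgraph code; names and types are invented.
    package main

    type mapEntry struct{ key, value []byte }

    // reduceShard buffers every entry of a shard in one slice before
    // building posting lists, so memory use grows with the shard size.
    func reduceShard(next func() (mapEntry, bool)) [][]byte {
            var batch []mapEntry
            for e, ok := next(); ok; e, ok = next() {
                    batch = append(batch, e) // unbounded growth -> growslice -> OOM
            }
            return buildPostingLists(batch)
    }

    // buildPostingLists stands in for the real posting-list encoding.
    func buildPostingLists(batch []mapEntry) [][]byte {
            lists := make([][]byte, 0, len(batch))
            for _, e := range batch {
                    lists = append(lists, append(e.key, e.value...))
            }
            return lists
    }

    func main() {}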


JimWen commented Jun 4, 2020

Here is the heap profile:

Showing top 10 nodes out of 41
      flat  flat%   sum%        cum   cum%
 2776.80MB 27.05% 27.05%  5058.40MB 49.27%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapIterator).startBatching
 2279.60MB 22.20% 49.25%  2281.61MB 22.22%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapIterator).startBatching.func1
 1963.88MB 19.13% 68.38%  2852.92MB 27.79%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).toList.func2
  573.61MB  5.59% 73.96%   837.12MB  8.15%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
  533.14MB  5.19% 79.16%   533.14MB  5.19%  github.com/dgraph-io/ristretto/z.(*Bloom).Size
  531.53MB  5.18% 84.33%   531.53MB  5.18%  github.com/dgraph-io/dgraph/protos/pb.(*PostingList).Marshal
  430.54MB  4.19% 88.53%   430.54MB  4.19%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce
     297MB  2.89% 91.42%      297MB  2.89%  github.com/dgraph-io/badger/v2/y.Copy (inline)
  263.51MB  2.57% 93.99%   263.51MB  2.57%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal
     138MB  1.34% 95.33%      138MB  1.34%  github.com/dgraph-io/badger/v2/table.NewTableBuilder
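
For anyone who wants to capture the same data: a table like this is the output of go tool pprof's top report taken against the bulk loader's heap profile. Assuming the loader's pprof HTTP endpoint is reachable at the default localhost:8080 (adjust the address to match your setup), roughly:

    go tool pprof -top http://localhost:8080/debug/pprof/heap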

@ashish-goswami
Contributor

Might be related to #5361

@ashish-goswami
Contributor

Hi @JimWen, as mentioned in #5361, can you please repeat bulk loading with v1.1.1, if possible. Let us know if bulk load is successful or not.

danielmai added the area/bulk-loader label Jun 4, 2020

JimWen commented Jun 5, 2020

Hi @JimWen, as mentioned in #5361, can you please repeat bulk loading with v1.1.1, if possible. Let us know if bulk load is successful or not.

OK, I will try it later.


JimWen commented Jun 7, 2020

@ashish-goswami v1.1.1 still OOMs, but it OOMs later than this version: the reduce phase gets much further before running out of memory.

@harshil-goel
Contributor

Can you try increasing the number of shards (both map and reduce) and check again? It would be really helpful.
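
For example, something like the following (the shard counts are only placeholders to tune for your dataset; --map_shards and --reduce_shards are the relevant bulk loader flags, and the number of reduce shards should match the number of Alpha groups you plan to run):

    dgraph bulk -f <rdf-files> -s <schema-file> --map_shards=4 --reduce_shards=2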

@minhaj-shakeel
Contributor

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

