
DB cannot be opened after node hard reset #562

Closed
@fyfyrchik

Hello! I am not sure whether this is a bug in bolt, but I think you might find this interesting:

  1. Create an ext4 filesystem with the fast_commit feature enabled (we use a 5.10 kernel).
  2. Create a DB with default settings (i.e. no unsafe settings; my only custom parameters are batch size and batch delay, which seem unrelated).
  3. Perform a server hard reset under write load.
  4. The DB cannot be opened; bbolt check reports:
panic: assertion failed: Page expected to be: 23092, but self identifies as 0

goroutine 6 [running]:
go.etcd.io/bbolt._assert(...)
        /repo/bbolt/db.go:1359
go.etcd.io/bbolt.(*page).fastCheck(0x7f0b95834000, 0x5a34)
        /repo/bbolt/page.go:57 +0x1d9
go.etcd.io/bbolt.(*Tx).page(0x7f0b909be000?, 0x4f9cc0?)
        /repo/bbolt/tx.go:534 +0x79
go.etcd.io/bbolt.(*Tx).forEachPageInternal(0x7f0b8fe11000?, {0xc000024140?, 0x3, 0xa}, 0xc00008db68)
        /repo/bbolt/tx.go:546 +0x5d
go.etcd.io/bbolt.(*Tx).forEachPageInternal(0x7f0b8fe19000?, {0xc000024140?, 0x2, 0xa}, 0xc00008db68)
        /repo/bbolt/tx.go:555 +0xc8
go.etcd.io/bbolt.(*Tx).forEachPageInternal(0x0?, {0xc000024140?, 0x1, 0xa}, 0xc00008db68)
        /repo/bbolt/tx.go:555 +0xc8
go.etcd.io/bbolt.(*Tx).forEachPage(...)
        /repo/bbolt/tx.go:542
go.etcd.io/bbolt.(*Tx).checkBucket(0xc00013e000, 0xc00002a280, 0xc00008deb0, 0xc00008dee0, {0x54b920?, 0x62ec60}, 0xc00007c0c0)
        /repo/bbolt/tx_check.go:83 +0x111
go.etcd.io/bbolt.(*Tx).checkBucket.func2({0x7f0b8fe3e0a6?, 0xc000104d28?, 0xc00013e000?})
        /repo/bbolt/tx_check.go:110 +0x90
go.etcd.io/bbolt.(*Bucket).ForEachBucket(0x0?, 0xc00008dd70)
        /repo/bbolt/bucket.go:403 +0x96
go.etcd.io/bbolt.(*Tx).checkBucket(0xc00013e000, 0xc00013e018, 0xc000104eb0, 0xc000104ee0, {0x54b920?, 0x62ec60}, 0xc00007c0c0)
        /repo/bbolt/tx_check.go:108 +0x252
go.etcd.io/bbolt.(*Tx).check(0xc00013e000, {0x54b920, 0x62ec60}, 0x0?)
        /repo/bbolt/tx_check.go:61 +0x365
created by go.etcd.io/bbolt.(*Tx).CheckWithOptions in goroutine 1
        /repo/bbolt/tx_check.go:31 +0x118

The last 64 pages of the file seem to be filled with zeroes.

When ext4 fast_commit is disabled, our tests pass and the DB can be opened.
I have reproduced this on both 1.3.6 and 1.3.7.

Here is the output of bbolt pages: https://gist.github.com/fyfyrchik/4aafec23d9dfc487fb4a4cd7f5560730

Meta pages

$ ./bbolt page ./1 0
Page ID:    0
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=26>
Freelist:   <pgid=97>
HWM:        <pgid=23157>
Txn ID:     1198
Checksum:   6397aaef7230fab5

$ ./bbolt page ./1 1
Page ID:    1
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=62>
Freelist:   <pgid=102>
HWM:        <pgid=23157>
Txn ID:     1199
Checksum:   8ec692c6ba15e06b
