stability: raft panic after process death #9037

@mberhault

rho-1 (104.196.18.63) was killed uncleanly: I was trying to replace supervisord with a newer version and reset the machine to get around a wedged systemctl.

Build sha: 1a34ca3, with the race detector enabled.

After the machine came back, cockroach now dies at startup with:

panic: store=1:1 range=1078 [/Table/55/1/2206910991923582422/"870569b6-bf23-49b0-b5a7-da8a4f8a1d2a"/1537136-/Table/55/1/2216032119155182690/"cf1a193d-401b-4c12-917a-41800f7d85b3"/122879): tocommit(20055) is out of range [lastIndex(20053)]. Was the raft log corrupted, truncated, or lost? [recovered]
        panic: store=1:1 range=1078 [/Table/55/1/2206910991923582422/"870569b6-bf23-49b0-b5a7-da8a4f8a1d2a"/1537136-/Table/55/1/2216032119155182690/"cf1a193d-401b-4c12-917a-41800f7d85b3"/122879): tocommit(20055) is out of range [lastIndex(20053)]. Was the raft log corrupted, truncated, or lost?

goroutine 308 [running]:
panic(0x1abeae0, 0xc420e09c80)
        /usr/local/go/src/runtime/panic.go:500 +0x1ae
github.com/cockroachdb/cockroach/cli.initBacktrace.func2(0x1abeae0, 0xc420e09c80)
        /go/src/github.com/cockroachdb/cockroach/cli/backtrace.go:98 +0xe8
github.com/cockroachdb/cockroach/util/stop.(*Stopper).Recover(0xc42045c510)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:173 +0xb1
panic(0x1abeae0, 0xc420e09c80)
        /usr/local/go/src/runtime/panic.go:458 +0x271
github.com/cockroachdb/cockroach/storage.(*raftLogger).Panicf(0xc420e09620, 0x1ca21e8, 0x5d, 0xc420a249c0, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/storage/raft.go:121 +0x225
github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc420d2cc40, 0x4e57)
        /go/src/github.com/coreos/etcd/raft/log.go:191 +0x208
github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc420dca960, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/raft.go:916 +0x7a
github.com/coreos/etcd/raft.stepFollower(0xc420dca960, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/raft.go:855 +0x1ed4
github.com/coreos/etcd/raft.(*raft).Step(0xc420dca960, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/raft.go:633 +0xc0
github.com/coreos/etcd/raft.(*RawNode).Step(0xc4212d3400, 0x8, 0x1e, 0x1b, 0x4bb, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/coreos/etcd/raft/rawnode.go:180 +0x17d
github.com/cockroachdb/cockroach/storage.(*Store).HandleRaftRequest.func2(0xc4212d3400, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2388 +0xb3
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroupLocked(0xc420902900, 0xc42124b7f0, 0xc420902960, 0xc)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:379 +0x172
github.com/cockroachdb/cockroach/storage.(*Replica).withRaftGroup(0xc420902900, 0xc42124b7f0, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/replica.go:388 +0x98
github.com/cockroachdb/cockroach/storage.(*Store).HandleRaftRequest(0xc42030cc80, 0x7f55bba4a280, 0xc4208eca50, 0xc420b86f00, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/store.go:2389 +0x146c
github.com/cockroachdb/cockroach/storage.(*RaftTransport).handleRaftRequest(0xc420450800, 0x7f55bba4a280, 0xc4208eca50, 0xc420b86f00, 0x0)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:236 +0x13d
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftMessage.func1.1.1(0x2804e00, 0xc420f8a2c0, 0xc420450800, 0x671700, 0xc421072f88)
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:285 +0xdf
github.com/cockroachdb/cockroach/storage.(*RaftTransport).RaftMessage.func1.1()
        /go/src/github.com/cockroachdb/cockroach/storage/raft_transport.go:292 +0x61
github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker.func1(0xc42045c510, 0xc4208ecab0)
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:187 +0x8b
created by github.com/cockroachdb/cockroach/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/util/stop/stopper.go:188 +0x74

Here is the log, including the last few attempts to start it and the initial death around I160901 16:46:15:
cockroach.stderr.txt

The process is currently down, so feel free to poke around.
