Closed
Description
cluster: cobalt (new cluster for continuous-deployment from empty clusters)
sha: b5a19ba
Cluster started from scratch at 161124-08:26:19 (well, that is the third node, but all nodes started within 30s).
The third node (40.117.230.120) crashed after 5 minutes with:
E161124 08:31:27.294717 154 vendor/github.com/coreos/etcd/raft/log.go:191 [n3,s3,r22/42:{-}] tocommit(192123) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit(192123) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost? [recovered]
panic: tocommit(192123) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
goroutine 154 [running]:
panic(0x1789040, 0xc42671c9c0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc4203ec5a0)
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:185 +0x6e
panic(0x1789040, 0xc42671c9c0)
/usr/local/go/src/runtime/panic.go:458 +0x243
github.com/cockroachdb/cockroach/pkg/storage.(*raftLogger).Panicf(0xc42671c820, 0x19c3716, 0x5d, 0xc4224441a0, 0x2, 0x2)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft.go:111 +0x107
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc42525afc0, 0x2ee7b)
/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/log.go:191 +0x166
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc42542ac30, 0x8, 0x2a, 0x1, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/raft.go:1095 +0x54
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.stepFollower(0xc42542ac30, 0x8, 0x2a, 0x1, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/raft.go:1041 +0x31f
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*raft).Step(0xc42542ac30, 0x8, 0x2a, 0x1, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/raft.go:779 +0x11ef
github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft.(*RawNode).Step(0xc420e64240, 0x8, 0x2a, 0x1, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/vendor/github.com/coreos/etcd/raft/rawnode.go:195 +0xc9
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest.func4(0xc420e64240, 0x0, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2965 +0x89
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked(0xc422e07000, 0x1b42800, 0xc425777848, 0xadb54b, 0xc422e070e8)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:535 +0x139
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroup(0xc422e07000, 0xc425777848, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:552 +0x92
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftRequest(0xc420904e00, 0x7f07f4ae9c78, 0xc424f982d0, 0xc426c2f500, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2966 +0xc88
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc420904e00, 0x16)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3203 +0x180
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc4200f7ad0, 0xc4203ec5a0)
/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:241 +0x267
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2()
/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:181 +0x33
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc4203ec5a0, 0xc4207a22a0)
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:196 +0x7d
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:197 +0x66
This occurred about 100s after starting the block-writer on that node:
$ tail logs/supervisor.log
2016-11-24 08:06:09,595 INFO supervisord started with pid 3527
2016-11-24 08:26:19,644 INFO spawned: 'cockroach' with pid 4670
2016-11-24 08:26:22,230 INFO success: cockroach entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
2016-11-24 08:26:22,233 INFO spawned: 'goroutine_profile' with pid 4699
2016-11-24 08:26:24,652 INFO success: goroutine_profile entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
2016-11-24 08:26:24,655 INFO spawned: 'node_exporter' with pid 4721
2016-11-24 08:26:26,886 INFO success: node_exporter entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
2016-11-24 08:29:38,161 INFO spawned: 'block_writer' with pid 4921
2016-11-24 08:29:40,451 INFO success: block_writer entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
2016-11-24 08:31:27,364 INFO exited: cockroach (exit status 2; not expected)
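As a cross-check of the timings quoted in this report, the intervals can be computed from the supervisor timestamps above. This is a small sketch; the helper name `parseSupervisorTime` is hypothetical, and the comma before the milliseconds is replaced with a dot so the layout parses on any Go version.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseSupervisorTime parses a supervisord-style timestamp such as
// "2016-11-24 08:26:19,644" (hypothetical helper for this sketch).
func parseSupervisorTime(ts string) time.Time {
	t, err := time.Parse("2006-01-02 15:04:05.000", strings.Replace(ts, ",", ".", 1))
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	started := parseSupervisorTime("2016-11-24 08:26:19,644") // cockroach spawned
	writer := parseSupervisorTime("2016-11-24 08:29:38,161")  // block_writer spawned
	exited := parseSupervisorTime("2016-11-24 08:31:27,364")  // cockroach exited

	fmt.Println(exited.Sub(started)) // 5m7.72s
	fmt.Println(exited.Sub(writer))  // 1m49.203s
}
```

So the node ran for a little over 5 minutes, and the crash came roughly 109s after the block-writer was spawned.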
Full stderr from this node:
cockroach.stderr.txt
All other nodes are still up and running and processing SQL requests.
I'll leave this node down for now and silence the alert. Continuous deployment has not yet been enabled.