Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: diskqueue doesn't validate lengths #471

Closed
Dieterbe opened this issue Oct 6, 2014 · 2 comments
Closed

nsqd: diskqueue doesn't validate lengths #471

Dieterbe opened this issue Oct 6, 2014 · 2 comments

Comments

@Dieterbe
Copy link
Contributor

Dieterbe commented Oct 6, 2014

so, i'm using nsqd's diskqueue with https://github.com/graphite-ng/carbon-relay-ng
i've been talking with @mreiferson and it seems it would make sense to make diskqueue a separate go project, but that's another story.
it has been running for months, i've seen some dataloss but haven't been able to track down what the cause is (and whether it's my code or diskqueue), but anyway, today i saw the first crash.
what happens in carbon-relay-ng (see https://github.com/graphite-ng/carbon-relay-ng/blob/master/routing/routing.go), when it says "skyline connected", is that it gets queue.ReadChan() and starts reading from it. the panic happened "shortly" after this happened, but it's not clear if, or how many, packets have been read from the readchan prior to the crash. however the stat timestamps of the spool files match closely with the last log output.

18:04:50.563656 routing.go:105: skyline (re)connecting to 127.0.0.1:2024
18:04:50.563794 routing.go:119: skyline connected
panic: runtime error: makeslice: len out of range

goroutine 21 [running]:
runtime.panic(0x762780, 0x9ad320)
    /usr/lib/go/src/pkg/runtime/panic.c:279 +0xf5
github.com/graphite-ng/carbon-relay-ng/nsqd.(*DiskQueue).readOne(0xc2080774a0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/nsqd/diskqueue.go:252 +0x4c4
github.com/graphite-ng/carbon-relay-ng/nsqd.(*DiskQueue).ioLoop(0xc2080774a0)
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/nsqd/diskqueue.go:553 +0x618
created by github.com/graphite-ng/carbon-relay-ng/nsqd.NewDiskQueue
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/nsqd/diskqueue.go:87 +0x338

goroutine 16 [chan receive, 16 minutes]:
github.com/rcrowley/goagain.Wait(0x7f8a881e2d18, 0xc20803c1f0, 0x7f8a8803cd88, 0x0, 0x0)
    /home/dieter/go/src/github.com/rcrowley/goagain/goagain.go:204 +0x28f
github.com/rcrowley/goagain.AwaitSignals(0x7f8a881e2d18, 0xc20803c1f0, 0x0, 0x0)
    /home/dieter/go/src/github.com/rcrowley/goagain/legacy.go:13 +0x43
main.main()
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:415 +0xb21

goroutine 19 [finalizer wait, 17 minutes]:
runtime.park(0x419f10, 0x9b5d10, 0x9b3789)
    /usr/lib/go/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x9b5d10, 0x9b3789)
    /usr/lib/go/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
    /usr/lib/go/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
    /usr/lib/go/src/pkg/runtime/proc.c:1445

goroutine 20 [syscall, 16 minutes]:
os/signal.loop()
    /usr/lib/go/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
    /usr/lib/go/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 22 [runnable]:
github.com/graphite-ng/carbon-relay-ng/routing.(*Route).relay(0xc2080723f0)
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/routing/routing.go:219 +0x413
created by github.com/graphite-ng/carbon-relay-ng/routing.(*Route).Run
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/routing/routing.go:92 +0x1a2

goroutine 23 [select]:
github.com/graphite-ng/carbon-relay-ng/routing.(*Route).relay(0xc208072480)
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/routing/routing.go:219 +0x413
created by github.com/graphite-ng/carbon-relay-ng/routing.(*Route).Run
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/routing/routing.go:92 +0x1a2

goroutine 17 [syscall, 16 minutes]:
runtime.goexit()
    /usr/lib/go/src/pkg/runtime/proc.c:1445

goroutine 27 [IO wait]:
net.runtime_pollWait(0x7f8a881e2988, 0x72, 0x0)
    /usr/lib/go/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc208028610, 0x72, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc208028610, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc2080285b0, 0x868d18, 0x0, 0x7f8a881e1418, 0xb)
    /usr/lib/go/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc20803c1f0, 0xc208038910, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/tcpsock_posix.go:234 +0x5d
main.accept(0xc20803c1f0, 0xc208038910, 0xc, 0xc2080389b0, 0xc, 0xc208038a50, 0xc, 0xc208038b50, 0x7, 0x0, ...)
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:65 +0x37
created by main.main
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:395 +0xab8

goroutine 28 [IO wait, 16 minutes]:
net.runtime_pollWait(0x7f8a881e28d8, 0x72, 0x0)
    /usr/lib/go/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc208028680, 0x72, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc208028680, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc208028620, 0x868d18, 0x0, 0x7f8a881e1418, 0xb)
    /usr/lib/go/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc20803c1f8, 0xc20803e0a8, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc20803c1f8, 0x0, 0x0, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/tcpsock_posix.go:244 +0x4b
github.com/graphite-ng/carbon-relay-ng/admin.ListenAndServe(0xc2080389b0, 0xc, 0x0, 0x0)
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/admin/admin.go:49 +0xea
main.adminListener()
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:234 +0x6ff
created by main.main
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:406 +0xad5

goroutine 29 [IO wait, 16 minutes]:
net.runtime_pollWait(0x7f8a881e2828, 0x72, 0x0)
    /usr/lib/go/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc2080286f0, 0x72, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc2080286f0, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc208028690, 0x868d18, 0x0, 0x7f8a881e1418, 0xb)
    /usr/lib/go/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc20803c228, 0x7f8a88048d58, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/tcpsock_posix.go:234 +0x5d
net/http.tcpKeepAliveListener.Accept(0xc20803c228, 0x0, 0x0, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/http/server.go:1947 +0x4b
net/http.(*Server).Serve(0xc208004de0, 0x7f8a881e2e18, 0xc20803c228, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/http/server.go:1698 +0x91
net/http.(*Server).ListenAndServe(0xc208004de0, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/http/server.go:1688 +0x14d
net/http.ListenAndServe(0xc208038a50, 0xc, 0x0, 0x0, 0x0, 0x0)
    /usr/lib/go/src/pkg/net/http/server.go:1778 +0x79
main.httpListener()
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:326 +0x1eb
created by main.main
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:410 +0xaf2

goroutine 30 [chan receive]:
main.Router()
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:107 +0x76
created by main.main
    /home/dieter/go/src/github.com/graphite-ng/carbon-relay-ng/carbon-relay-ng.go:413 +0xb01

~/g/s/g/g/c/spool ❯❯❯ stat spool_skyline.diskqueue.meta.dat
  File: ‘spool_skyline.diskqueue.meta.dat’
  Size: 19          Blocks: 8          IO Block: 4096   regular file
Device: fe02h/65026d    Inode: 2758870     Links: 1
Access: (0600/-rw-------)  Uid: ( 1011/  dieter)   Gid: ( 1000/  dieter)
Access: 2014-10-06 18:04:50.563211947 -0400
Modify: 2014-10-06 18:04:50.563211947 -0400
Change: 2014-10-06 18:04:50.563211947 -0400
 Birth: -
~/g/s/g/g/c/spool ❯❯❯ stat spool_skyline.diskqueue.000001.dat
  File: ‘spool_skyline.diskqueue.000001.dat’
  Size: 34072       Blocks: 80         IO Block: 4096   regular file
Device: fe02h/65026d    Inode: 2756393     Links: 1
Access: (0600/-rw-------)  Uid: ( 1011/  dieter)   Gid: ( 1000/  dieter)
Access: 2014-10-06 17:32:16.421470496 -0400
Modify: 2014-10-06 18:04:50.323212567 -0400
Change: 2014-10-06 18:04:50.323212567 -0400
 Birth: -

you can get the files:
http://dieter.plaetinck.be/files/spool_skyline.diskqueue.000001.dat
http://dieter.plaetinck.be/files/spool_skyline.diskqueue.meta.dat

the diskqueue version is taken from nsqd on Aug 4 has been minimally adjusted to make some needed things visible (grafana/carbon-relay-ng@4d6ebb3)

@mreiferson
Copy link
Member

@Dieterbe looks like this is a similar issue to #469 where we're not validating lengths before making buffers to read data. The easy fix is to add some validation (if you're interested in submitting a PR upstream here).

I'm inclined to believe the data was corrupted in some way (not sure how else the datafile would've otherwise contained an invalid length).

In any event, I'll see if I can validate by grabbing the files and reproducing.

Thanks for the detailed report.

@mreiferson mreiferson added the bug label Oct 7, 2014
@mreiferson mreiferson changed the title diskqueue readOne makeslice error nsqd: diskqueue doesn't validate lengths Oct 7, 2014
@mreiferson
Copy link
Member

this was fixed in #600

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants