so, i'm using nsqd's diskqueue with https://github.com/graphite-ng/carbon-relay-ng
i've been talking with @mreiferson and it seems it would make sense to make diskqueue a separate go project, but that's another story.
it has been running for months, i've seen some dataloss but haven't been able to track down what the cause is (and whether it's my code or diskqueue), but anyway, today i saw the first crash.
what happens in carbon-relay-ng (see https://github.com/graphite-ng/carbon-relay-ng/blob/master/routing/routing.go), when it says "skyline connected", is that it gets queue.ReadChan() and starts reading from it. the panic happened "shortly" after that, but it's not clear whether any packets, or how many, had been read from the readchan before the crash. however, the stat timestamps of the spool files closely match the last log output.
@Dieterbe looks like this is a similar issue to #469 where we're not validating lengths before making buffers to read data. The easy fix is to add some validation (if you're interested in submitting a PR upstream here).
I'm inclined to believe the data was corrupted in some way (not sure how else the datafile would've otherwise contained an invalid length).
In any event, I'll see if I can validate by grabbing the files and reproducing.
you can get the files:
http://dieter.plaetinck.be/files/spool_skyline.diskqueue.000001.dat
http://dieter.plaetinck.be/files/spool_skyline.diskqueue.meta.dat
the diskqueue version is taken from nsqd on Aug 4 and has been minimally adjusted to make some needed things visible (grafana/carbon-relay-ng@4d6ebb3)