Closed
Hello
Using influxdb 1.2.0 we are hitting the below panic occasionally.
Mar 2 11:32:52 localhost influxd[3030]: [I] 2017-03-02T10:32:52Z SELECT * FROM telegraf.autogen.consul LIMIT 1 service=query
Mar 2 11:32:52 localhost influxd[3030]: panic: interface conversion: tsm1.Value is tsm1.IntegerValue, not tsm1.FloatValue
Mar 2 11:32:52 localhost influxd[3030]: goroutine 4191486 [running]:
Mar 2 11:32:52 localhost influxd[3030]: panic(0xa081a0, 0xc428601a40)
Mar 2 11:32:52 localhost influxd[3030]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).peekCache(0xc42db2c7e0, 0xc429e520c8, 0xc421c43b88)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:344 +0xab
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).nextFloat(0xc42db2c7e0, 0xc421c43bc8, 0xc429e520c8)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:372 +0x2f
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).next(0xc42db2c7e0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:368 +0x2b
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*bufCursor).next(0xc43a3f5cc0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:64 +0x3b
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*bufCursor).peek(0xc43a3f5cc0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:75 +0x2f
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatIterator).Next(0xc4298ac3c0, 0xc4298ac328, 0x0, 0x0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:175 +0x8d
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatSortedMergeIterator).pop(0xc42a33f050, 0x456e30, 0xc425a765e8, 0xc425a765f0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:362 +0xfe
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatSortedMergeIterator).Next(0xc42a33f050, 0xc425a76760, 0x8a1e9d, 0xc428012120)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:351 +0x2b
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728d40, 0xb, 0x24, 0xc43ae56550)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatLimitIterator).Next(0xc423571100, 0x0, 0x0, 0xc4309a69a0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:280 +0x54
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728d80, 0x0, 0x0, 0x0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728dc0, 0x0, 0xc424e36f00, 0x0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728e00, 0x0, 0x8000000000000001, 0x7ffffffffffffffe)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatLimitIterator).Next(0xc424e91900, 0xc42b35e960, 0x180001, 0x0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:534 +0x37
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*bufFloatIterator).Next(0xc431728e40, 0x268, 0x180001, 0x0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:95 +0x3c
Mar 2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatAuxIterator).stream(0xc431728ec0)
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:874 +0x32
Mar 2 11:32:52 localhost influxd[3030]: created by github.com/influxdata/influxdb/influxql.(*floatAuxIterator).Start
Mar 2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:860 +0x3f
I could not find an open duplicate report for this. Please tell me whether this is fixed in 1.2.1 or whether it is a new bug.
Regards
Ranjith
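For readers unfamiliar with this class of panic, here is a minimal sketch of how a single-value type assertion on an interface fails at runtime in Go, next to the comma-ok form that returns an error instead. The `Value`, `FloatValue`, and `IntegerValue` types below are simplified stand-ins invented for illustration; they are not the real `tsm1` types, and this is not the InfluxDB code path.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for an interface such as tsm1.Value.
type Value interface {
	UnixNano() int64
}

// FloatValue and IntegerValue are simplified illustrative types.
type FloatValue struct {
	ts int64
	v  float64
}

type IntegerValue struct {
	ts int64
	v  int64
}

func (f FloatValue) UnixNano() int64   { return f.ts }
func (i IntegerValue) UnixNano() int64 { return i.ts }

// mustFloat uses the single-value assertion, which panics with an
// "interface conversion" error when v holds a different concrete type.
func mustFloat(v Value) float64 {
	return v.(FloatValue).v
}

// asFloat uses the comma-ok form, so a mismatch becomes an ordinary error.
func asFloat(v Value) (float64, error) {
	fv, ok := v.(FloatValue)
	if !ok {
		return 0, fmt.Errorf("expected FloatValue, got %T", v)
	}
	return fv.v, nil
}

func main() {
	// An integer value reaching code that expects floats.
	var cached Value = IntegerValue{ts: 1, v: 42}

	if _, err := asFloat(cached); err != nil {
		fmt.Println("error:", err)
	}

	// Recover only to show the panic message without crashing the demo.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	_ = mustFloat(cached)
}
```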
jwilder commented on Mar 2, 2017
@ranjithruban Does your version of telegraf include this change? influxdata/telegraf#2277
Can you show the output of SELECT * FROM telegraf.autogen.consul LIMIT 1?
jwilder commented on Mar 2, 2017
Also, can you attach the output of show shards and SHOW FIELD KEYS?
ranjithruban commented on Mar 2, 2017
@jwilder No, we are using telegraf 1.2.0. This data came from the consul telemetry plugin sending to statsd, with some statsd templates added. I also have another panic with the same traces on another metric, below.
http://pastebin.com/EWearfiB
I had to drop the shard and move the WAL to recover influxdb from this error. Once it panicked, influxdb was not able to recover properly and failed with the error below.
Mar 2 11:32:55 localhost influxd[19008]: [I] 2017-03-02T10:32:55Z Failed to open shard: 173: [shard 173] field type conflict service=store
I have the corrupted shard 173 saved if it helps in debugging.
show shards
132 telegraf autogen 132 2017-02-11T00:00:00Z 2017-02-12T00:00:00Z 2019-02-12T00:00:00Z
134 telegraf autogen 134 2017-02-12T00:00:00Z 2017-02-13T00:00:00Z 2019-02-13T00:00:00Z
136 telegraf autogen 136 2017-02-13T00:00:00Z 2017-02-14T00:00:00Z 2019-02-14T00:00:00Z
138 telegraf autogen 138 2017-02-14T00:00:00Z 2017-02-15T00:00:00Z 2019-02-15T00:00:00Z
140 telegraf autogen 140 2017-02-15T00:00:00Z 2017-02-16T00:00:00Z 2019-02-16T00:00:00Z
142 telegraf autogen 142 2017-02-16T00:00:00Z 2017-02-17T00:00:00Z 2019-02-17T00:00:00Z
144 telegraf autogen 144 2017-02-17T00:00:00Z 2017-02-18T00:00:00Z 2019-02-18T00:00:00Z
146 telegraf autogen 146 2017-02-18T00:00:00Z 2017-02-19T00:00:00Z 2019-02-19T00:00:00Z
148 telegraf autogen 148 2017-02-19T00:00:00Z 2017-02-20T00:00:00Z 2019-02-20T00:00:00Z
150 telegraf autogen 150 2017-02-20T00:00:00Z 2017-02-21T00:00:00Z 2019-02-21T00:00:00Z
152 telegraf autogen 152 2017-02-21T00:00:00Z 2017-02-22T00:00:00Z 2019-02-22T00:00:00Z
154 telegraf autogen 154 2017-02-22T00:00:00Z 2017-02-23T00:00:00Z 2019-02-23T00:00:00Z
159 telegraf autogen 159 2017-02-23T00:00:00Z 2017-02-24T00:00:00Z 2019-02-24T00:00:00Z
161 telegraf autogen 161 2017-02-24T00:00:00Z 2017-02-25T00:00:00Z 2019-02-25T00:00:00Z
163 telegraf autogen 163 2017-02-25T00:00:00Z 2017-02-26T00:00:00Z 2019-02-26T00:00:00Z
165 telegraf autogen 165 2017-02-26T00:00:00Z 2017-02-27T00:00:00Z 2019-02-27T00:00:00Z
167 telegraf autogen 167 2017-02-27T00:00:00Z 2017-02-28T00:00:00Z 2019-02-28T00:00:00Z
169 telegraf autogen 169 2017-02-28T00:00:00Z 2017-03-01T00:00:00Z 2019-03-01T00:00:00Z
171 telegraf autogen 171 2017-03-01T00:00:00Z 2017-03-02T00:00:00Z 2019-03-02T00:00:00Z
175 telegraf autogen 175 2017-03-02T00:00:00Z 2017-03-03T00:00:00Z 2019-03-03T00:00:00Z
name: test
id database retention_policy shard_group start_time end_time expiry_time owners
jwilder commented on Mar 2, 2017
Can you attach shard 173?
ranjithruban commented on Mar 2, 2017
I added the tsm files. It is a 47 MB file. Please see if you can download it.
https://www.dropbox.com/s/r7x1xmpfda4z1nu/173.tar.gz?dl=0
jwilder commented on Mar 2, 2017
@ranjithruban Got it. Thanks.
jwilder commented on Mar 2, 2017
@ranjithruban It looks like the problem is the consul measurement and value field. You have some series with it stored as a float64 and one as an int64. They are different series, so there is likely a race in the code that ensures the type is consistent within a shard. You should get a field type conflict during the write and the point would be dropped, but it looks like the writes are being allowed, which causes the panic at query time and the shard to fail to load at startup.
ranjithruban commented on Mar 2, 2017
@jwilder Thanks. I have seen "Field type conflict, dropping conflicted points: dropping" in telegraf for some of the measurements we use, but not for consul/ or for the custom application measurement in the second panic. I am not sure why the write was allowed in some cases.
jwilder commented on Mar 2, 2017
@ranjithruban I would also check your client to ensure that whatever is writing to the consul measurement and value field always uses the correct formatting for types. float64 should have a decimal and int64 needs a trailing i.
I can attach the problem series keys if that would help.
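As a rough illustration of the formatting advice above, the following Go sketch renders a field value so that floats always carry a decimal point and integers carry a trailing i. The helper name `formatFieldValue` is hypothetical and is not part of any InfluxDB or telegraf client library. Rejecting NaN and infinity is an assumption made here to keep the output well-formed.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// formatFieldValue is a hypothetical helper that renders a field value
// with an explicit type marker: floats get a decimal point, integers
// get a trailing "i".
func formatFieldValue(v interface{}) (string, error) {
	switch x := v.(type) {
	case float64:
		// Assumption: NaN and infinities are rejected rather than written.
		if math.IsNaN(x) || math.IsInf(x, 0) {
			return "", fmt.Errorf("unsupported float value %v", x)
		}
		s := strconv.FormatFloat(x, 'f', -1, 64)
		// Whole numbers format without a decimal point, so add one.
		if !strings.Contains(s, ".") {
			s += ".0"
		}
		return s, nil
	case int64:
		return strconv.FormatInt(x, 10) + "i", nil
	default:
		return "", fmt.Errorf("unsupported field type %T", v)
	}
}

func main() {
	for _, v := range []interface{}{float64(3), 2.5, int64(3)} {
		s, err := formatFieldValue(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// The measurement and field names are taken from the thread;
		// real points would normally also carry tags and a timestamp.
		fmt.Printf("consul value=%s\n", s)
	}
}
```

A client that always routes values through one such function per field cannot emit the same field as both a float and an integer by accident.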
ranjithruban commented on Mar 2, 2017
Yes, I will check that. Please attach them. To be clear, can this bug be fixed so that such writes are not allowed even if the client sends them? In our case multiple measurements are sending such values, and some are Java Spring metrics.
jwilder commented on Mar 2, 2017
@ranjithruban There is a bug in the database in that it allowed two different field types for the same measurement. We'll need to fix that to prevent the panic and the shard failing to load. Regardless, writing data with different field types is not valid. Once this is fixed, you will end up with data being dropped or write errors, as the database cannot support different field types for the same measurement. You'll need to use different field names, different measurements, or ensure they all write the same field type.
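To make the constraint concrete, here is a minimal sketch of a check that records the first type seen for each measurement and field pair and rejects later writes of a different type. Doing the lookup and the insert under one lock is one way to avoid the kind of race suspected earlier in the thread. All names here (`FieldRegistry`, `CheckOrCreate`, `FieldType`) are hypothetical. This is not the InfluxDB implementation, nor the fix that later landed.

```go
package main

import (
	"fmt"
	"sync"
)

// FieldType is a hypothetical enumeration of field types.
type FieldType int

const (
	Float FieldType = iota
	Integer
)

func (t FieldType) String() string {
	if t == Float {
		return "float"
	}
	return "integer"
}

type fieldKey struct {
	measurement, field string
}

// FieldRegistry remembers the first type written for each field.
type FieldRegistry struct {
	mu    sync.Mutex
	types map[fieldKey]FieldType
}

func NewFieldRegistry() *FieldRegistry {
	return &FieldRegistry{types: make(map[fieldKey]FieldType)}
}

// CheckOrCreate performs the lookup and the insert under one lock, so two
// concurrent writers cannot both register a different type for a new field.
func (r *FieldRegistry) CheckOrCreate(measurement, field string, t FieldType) error {
	r.mu.Lock()
	defer r.mu.Unlock()

	k := fieldKey{measurement, field}
	if existing, ok := r.types[k]; ok {
		if existing != t {
			return fmt.Errorf("field type conflict: %s.%s is %s, got %s",
				measurement, field, existing, t)
		}
		return nil
	}
	r.types[k] = t
	return nil
}

func main() {
	reg := NewFieldRegistry()
	var wg sync.WaitGroup

	// Concurrent writers racing to define the same field with two types.
	for i := 0; i < 10; i++ {
		t := FieldType(i % 2)
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := reg.CheckOrCreate("consul", "value", t); err != nil {
				fmt.Println("dropped:", err)
			}
		}()
	}
	wg.Wait()
}
```

Whichever writer registers the field first decides its type. Every write of the other type is then rejected at write time rather than surfacing later as a query-time panic.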
ranjithruban commented on Mar 3, 2017
@jwilder Great, thank you. 👍
jwilder commented on Mar 3, 2017
@ranjithruban Would you be able to test out #8092 to see if it prevents your shards from getting into an inconsistent state? We haven't been able to reproduce the issue yet.
ranjithruban commented on Mar 3, 2017
@jwilder Yes, I will test it and update the results.
jwilder commented on Mar 7, 2017
Fixed via #8092 #8104 #8085