[1.2.0] panic: interface conversion: tsm1.Value is tsm1.IntegerValue, not tsm1.FloatValue #8085

Closed
@ranjithruban

Description

@ranjithruban

Hello

Using influxdb 1.2.0 we are hitting the below panic occasionally.

Mar  2 11:32:52 localhost influxd[3030]: [I] 2017-03-02T10:32:52Z SELECT * FROM telegraf.autogen.consul LIMIT 1 service=query
Mar  2 11:32:52 localhost influxd[3030]: panic: interface conversion: tsm1.Value is tsm1.IntegerValue, not tsm1.FloatValue
Mar  2 11:32:52 localhost influxd[3030]: goroutine 4191486 [running]:
Mar  2 11:32:52 localhost influxd[3030]: panic(0xa081a0, 0xc428601a40)
Mar  2 11:32:52 localhost influxd[3030]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).peekCache(0xc42db2c7e0, 0xc429e520c8, 0xc421c43b88)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:344 +0xab
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).nextFloat(0xc42db2c7e0, 0xc421c43bc8, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:372 +0x2f
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).next(0xc42db2c7e0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:368 +0x2b
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*bufCursor).next(0xc43a3f5cc0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:64 +0x3b
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*bufCursor).peek(0xc43a3f5cc0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:75 +0x2f
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatIterator).Next(0xc4298ac3c0, 0xc4298ac328, 0x0, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:175 +0x8d
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatSortedMergeIterator).pop(0xc42a33f050, 0x456e30, 0xc425a765e8, 0xc425a765f0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:362 +0xfe
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatSortedMergeIterator).Next(0xc42a33f050, 0xc425a76760, 0x8a1e9d, 0xc428012120)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:351 +0x2b
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728d40, 0xb, 0x24, 0xc43ae56550)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatLimitIterator).Next(0xc423571100, 0x0, 0x0, 0xc4309a69a0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:280 +0x54
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728d80, 0x0, 0x0, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728dc0, 0x0, 0xc424e36f00, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728e00, 0x0, 0x8000000000000001, 0x7ffffffffffffffe)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatLimitIterator).Next(0xc424e91900, 0xc42b35e960, 0x180001, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:534 +0x37
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*bufFloatIterator).Next(0xc431728e40, 0x268, 0x180001, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:95 +0x3c
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatAuxIterator).stream(0xc431728ec0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:874 +0x32
Mar  2 11:32:52 localhost influxd[3030]: created by github.com/influxdata/influxdb/influxql.(*floatAuxIterator).Start
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:860 +0x3f

I could not find an existing open report for this. Is this fixed in 1.2.1, or is it a new bug?

Regards
Ranjith

Activity

jwilder

jwilder commented on Mar 2, 2017

@jwilder
Contributor

@ranjithruban Does your version of telegraf include this change? influxdata/telegraf#2277

Can you show the output of SELECT * FROM telegraf.autogen.consul LIMIT 1?

jwilder

jwilder commented on Mar 2, 2017

@jwilder
Contributor

Also, can you attach the output of show shards and SHOW FIELD KEYS?

ranjithruban

ranjithruban commented on Mar 2, 2017

@ranjithruban
Author

@jwilder No, we are using telegraf 1.2.0. This data comes from the consul telemetry plugin sending to statsd, with some statsd templates added. I also have another panic with the same traces on another metric; see below.

http://pastebin.com/EWearfiB

I had to drop the shard and move the WAL to recover InfluxDB from this error. Once it panicked, InfluxDB was not able to recover properly and reported the error below.

Mar 2 11:32:55 localhost influxd[19008]: [I] 2017-03-02T10:32:55Z Failed to open shard: 173: [shard 173] field type conflict service=store

I have the corrupted shard 173 saved if it helps in debugging.

show shards

132 telegraf autogen 132 2017-02-11T00:00:00Z 2017-02-12T00:00:00Z 2019-02-12T00:00:00Z
134 telegraf autogen 134 2017-02-12T00:00:00Z 2017-02-13T00:00:00Z 2019-02-13T00:00:00Z
136 telegraf autogen 136 2017-02-13T00:00:00Z 2017-02-14T00:00:00Z 2019-02-14T00:00:00Z
138 telegraf autogen 138 2017-02-14T00:00:00Z 2017-02-15T00:00:00Z 2019-02-15T00:00:00Z
140 telegraf autogen 140 2017-02-15T00:00:00Z 2017-02-16T00:00:00Z 2019-02-16T00:00:00Z
142 telegraf autogen 142 2017-02-16T00:00:00Z 2017-02-17T00:00:00Z 2019-02-17T00:00:00Z
144 telegraf autogen 144 2017-02-17T00:00:00Z 2017-02-18T00:00:00Z 2019-02-18T00:00:00Z
146 telegraf autogen 146 2017-02-18T00:00:00Z 2017-02-19T00:00:00Z 2019-02-19T00:00:00Z
148 telegraf autogen 148 2017-02-19T00:00:00Z 2017-02-20T00:00:00Z 2019-02-20T00:00:00Z
150 telegraf autogen 150 2017-02-20T00:00:00Z 2017-02-21T00:00:00Z 2019-02-21T00:00:00Z
152 telegraf autogen 152 2017-02-21T00:00:00Z 2017-02-22T00:00:00Z 2019-02-22T00:00:00Z
154 telegraf autogen 154 2017-02-22T00:00:00Z 2017-02-23T00:00:00Z 2019-02-23T00:00:00Z
159 telegraf autogen 159 2017-02-23T00:00:00Z 2017-02-24T00:00:00Z 2019-02-24T00:00:00Z
161 telegraf autogen 161 2017-02-24T00:00:00Z 2017-02-25T00:00:00Z 2019-02-25T00:00:00Z
163 telegraf autogen 163 2017-02-25T00:00:00Z 2017-02-26T00:00:00Z 2019-02-26T00:00:00Z
165 telegraf autogen 165 2017-02-26T00:00:00Z 2017-02-27T00:00:00Z 2019-02-27T00:00:00Z
167 telegraf autogen 167 2017-02-27T00:00:00Z 2017-02-28T00:00:00Z 2019-02-28T00:00:00Z
169 telegraf autogen 169 2017-02-28T00:00:00Z 2017-03-01T00:00:00Z 2019-03-01T00:00:00Z
171 telegraf autogen 171 2017-03-01T00:00:00Z 2017-03-02T00:00:00Z 2019-03-02T00:00:00Z
175 telegraf autogen 175 2017-03-02T00:00:00Z 2017-03-03T00:00:00Z 2019-03-03T00:00:00Z

name: test
id database retention_policy shard_group start_time end_time expiry_time owners


jwilder

jwilder commented on Mar 2, 2017

@jwilder
Contributor

Can you attach shard 173?

ranjithruban

ranjithruban commented on Mar 2, 2017

@ranjithruban
Author

I have uploaded the TSM files (a 47 MB archive). Please check whether you can download it.
https://www.dropbox.com/s/r7x1xmpfda4z1nu/173.tar.gz?dl=0

jwilder

jwilder commented on Mar 2, 2017

@jwilder
Contributor

@ranjithruban Got it. Thanks.

jwilder

jwilder commented on Mar 2, 2017

@jwilder
Contributor

@ranjithruban It looks like the problem is the consul measurement and its value field. You have some series with it stored as a float64 and one as an int64. They are different series, so there is likely a race in the code that ensures the type is consistent within a shard. You should get a field type conflict during the write and the point should be dropped, but it looks like the writes are being allowed, which causes the panic at query time and the shard failing to load at startup.
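
To illustrate why a mixed-type field surfaces as a panic at query time, here is a minimal Go sketch. It is not the InfluxDB code: `Value`, `FloatValue`, `IntegerValue` and `readFloat` are simplified, hypothetical stand-ins for the types named in the panic message. It shows that a single-result type assertion panics on a mismatch, while the two-result form lets the caller handle it.

```go
package main

import "fmt"

// Value is a simplified stand-in for the tsm1.Value interface named in the panic.
type Value interface {
	UnixNano() int64
}

// FloatValue and IntegerValue are hypothetical, reduced versions of the
// concrete value types; only the fields needed for the example are kept.
type FloatValue struct {
	ts int64
	v  float64
}

type IntegerValue struct {
	ts int64
	v  int64
}

func (f FloatValue) UnixNano() int64   { return f.ts }
func (i IntegerValue) UnixNano() int64 { return i.ts }

// readFloat uses the two-result assertion form, so a value of the wrong
// concrete type is reported as an error instead of causing a panic.
func readFloat(v Value) (float64, error) {
	fv, ok := v.(FloatValue)
	if !ok {
		return 0, fmt.Errorf("expected FloatValue, got %T", v)
	}
	return fv.v, nil
}

func main() {
	// One field whose values were stored with two different types.
	values := []Value{FloatValue{ts: 1, v: 0.5}, IntegerValue{ts: 2, v: 7}}
	for _, v := range values {
		f, err := readFloat(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("float:", f)
	}
	// A single-result assertion such as values[1].(FloatValue) would panic
	// with an "interface conversion" message like the one in the log above.
}
```

Whether a reader should return an error or skip the value is a design choice; the sketch only shows the mechanism behind the message.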

ranjithruban

ranjithruban commented on Mar 2, 2017

@ranjithruban
Author

@jwilder Thanks. I have seen "Field type conflict, dropping conflicted points: dropping" in telegraf for some of the measurements we use, but not for consul or for the custom application measurement in the second panic. I am not sure why the write was allowed in some cases.

jwilder

jwilder commented on Mar 2, 2017

@jwilder
Contributor

@ranjithruban I would also check your client to ensure that whatever is writing to the consul measurement and value field always uses the correct formatting for types. A float64 should have a decimal point and an int64 needs a trailing i.

I can attach the problem series keys if that would help.
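
As a rough illustration of the formatting rule above, the following Go sketch renders a field so that floats always carry a decimal point and integers always carry a trailing `i`. `formatField` is a hypothetical helper written for this example, not part of telegraf or any InfluxDB client library, and it handles only `float64` and `int64`.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// formatField renders key=value for one numeric field (hypothetical helper).
// Floats always get a decimal point; integers always get a trailing "i".
func formatField(key string, value interface{}) (string, error) {
	switch v := value.(type) {
	case float64:
		if math.IsNaN(v) || math.IsInf(v, 0) {
			return "", fmt.Errorf("field %q: non-finite float %v", key, v)
		}
		s := strconv.FormatFloat(v, 'f', -1, 64)
		if !strings.Contains(s, ".") {
			s += ".0" // make the float type explicit for whole numbers
		}
		return key + "=" + s, nil
	case int64:
		return key + "=" + strconv.FormatInt(v, 10) + "i", nil
	default:
		return "", fmt.Errorf("field %q: unsupported type %T", key, value)
	}
}

func main() {
	for _, v := range []interface{}{float64(3), 2.5, int64(3)} {
		s, err := formatField("value", v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("consul " + s)
	}
}
```

Keeping one Go type per field in the client, rather than converting at write time, is the simpler way to guarantee the same suffix on every point.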

ranjithruban

ranjithruban commented on Mar 2, 2017

@ranjithruban
Author

Yes, I will check that. Please attach them. To be clear, can this bug be fixed so that such writes are rejected even if a client sends them? In our case multiple measurements are sending such values, and some are Java Spring metrics.

jwilder

jwilder commented on Mar 2, 2017

@jwilder
Contributor

@ranjithruban There is a bug in the database in that it allowed two different field types for the same measurement. We'll need to fix that to prevent the panic and the shard failing to load. Regardless, writing data with different field types for the same field is not valid. Once this is fixed, you will end up with data being dropped or write errors, as the database cannot support different field types for the same measurement. You'll need to use different field names, different measurements, or ensure all writers use the same field type.
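
The kind of check described here can be sketched in Go as follows. This is an illustration under stated assumptions, not the InfluxDB implementation and not the change in #8092: `FieldRegistry`, `CheckOrCreate` and `FieldType` are hypothetical names. The point is that the lookup and the first registration happen under a single lock, so two concurrent writers cannot both register conflicting types for the same field.

```go
package main

import (
	"fmt"
	"sync"
)

// FieldType is a hypothetical enumeration of the two types in this issue.
type FieldType int

const (
	Float FieldType = iota
	Integer
)

func (t FieldType) String() string {
	if t == Float {
		return "float"
	}
	return "integer"
}

type fieldKey struct{ measurement, field string }

// FieldRegistry remembers the first type written for each field (hypothetical).
type FieldRegistry struct {
	mu    sync.Mutex
	types map[fieldKey]FieldType
}

func NewFieldRegistry() *FieldRegistry {
	return &FieldRegistry{types: make(map[fieldKey]FieldType)}
}

// CheckOrCreate records the type on the first write and rejects any later
// write that uses a different type. Check and insert share one critical section.
func (r *FieldRegistry) CheckOrCreate(measurement, field string, t FieldType) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := fieldKey{measurement, field}
	if existing, ok := r.types[k]; ok {
		if existing != t {
			return fmt.Errorf("field type conflict: %s.%s is %s, got %s",
				measurement, field, existing, t)
		}
		return nil
	}
	r.types[k] = t
	return nil
}

func main() {
	r := NewFieldRegistry()
	fmt.Println(r.CheckOrCreate("consul", "value", Float))   // first write sets the type
	fmt.Println(r.CheckOrCreate("consul", "value", Integer)) // conflicting write is rejected
}
```

A writer that receives the conflict error would drop the point or report a write error, which matches the behaviour described above.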

ranjithruban

ranjithruban commented on Mar 3, 2017

@ranjithruban
Author

@jwilder Great, thank you. 👍

jwilder

jwilder commented on Mar 3, 2017

@jwilder
Contributor

@ranjithruban Would you be able to test out #8092 to see if it prevents your shards from getting into an inconsistent state? We haven't been able to reproduce the issue yet.

ranjithruban

ranjithruban commented on Mar 3, 2017

@ranjithruban
Author

@jwilder Yes, I will test it and report the results.

added this to the 1.2.1 milestone on Mar 7, 2017
jwilder

jwilder commented on Mar 7, 2017

@jwilder
Contributor

Fixed via #8092 #8104 #8085
