Bug report
System info:
InfluxDB 1.5.0 or 1.5.2
Kubernetes pod deployment using Docker image 1.5.0 or 1.5.2
Steps to reproduce:
- Start InfluxDB.
- InfluxDB begins loading all TSM files.
- InfluxDB fails to start and panics with the trace shown under "Additional info".
Expected behavior: InfluxDB starts successfully.
Actual behavior: InfluxDB fails to start.
Additional info:
2018-05-28T15:16:20.023366000Z ts=2018-05-28T15:16:20.022704Z lvl=info msg="Opened file" log_id=08LpzZf0000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/telegraf_1080d/telegraf_1080d/650/000000022-000000002.tsm id=0 duration=997.997ms
2018-05-28T15:16:20.068584000Z ts=2018-05-28T15:16:20.068008Z lvl=info msg="Opened file" log_id=08LpzZf0000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/telegraf_1080d/telegraf_1080d/523/000000012-000000002.tsm id=0 duration=1024.809ms
2018-05-28T15:16:20.587784000Z ts=2018-05-28T15:16:20.587112Z lvl=info msg="Open store (end)" log_id=08LpzZf0000 service=store trace_id=08Lpz_3l000 op_name=tsdb_open op_event=end op_elapsed=5083.803ms
2018-05-28T15:16:20.590761000Z panic: unreachable
2018-05-28T15:16:20.591211000Z goroutine 1 [running]:
2018-05-28T15:16:20.591664000Z github.com/influxdata/influxdb/tsdb.(*SeriesIndex).execEntry(0xc4201ef680, 0x290225f, 0x61695f6e616d6573, 0x4016fe085, 0x0, 0x0, 0x0)
2018-05-28T15:16:20.591885000Z /go/src/github.com/influxdata/influxdb/tsdb/series_index.go:179 +0x167
2018-05-28T15:16:20.592092000Z github.com/influxdata/influxdb/tsdb.(*SeriesIndex).Recover.func1(0x7f604f2c005f, 0x61695f6e616d6573, 0x4016fe085, 0x0, 0x0, 0x0, 0x0, 0x0)
2018-05-28T15:16:20.592302000Z /go/src/github.com/influxdata/influxdb/tsdb/series_index.go:127 +0x91
2018-05-28T15:16:20.592504000Z github.com/influxdata/influxdb/tsdb.(*SeriesSegment).ForEachEntry(0xc432ea9040, 0xc4203bc860, 0x0, 0x0)
2018-05-28T15:16:20.592713000Z /go/src/github.com/influxdata/influxdb/tsdb/series_segment.go:242 +0xea
2018-05-28T15:16:20.592911000Z github.com/influxdata/influxdb/tsdb.(*SeriesIndex).Recover(0xc4201ef680, 0xc4504ee6c0, 0x5, 0x8, 0xc44ca48f70, 0x0)
2018-05-28T15:16:20.593114000Z /go/src/github.com/influxdata/influxdb/tsdb/series_index.go:123 +0x169
2018-05-28T15:16:20.593316000Z github.com/influxdata/influxdb/tsdb.(*SeriesPartition).Open.func1(0xc432dd8370, 0x2f, 0x1ff)
2018-05-28T15:16:20.593522000Z /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:87 +0x132
2018-05-28T15:16:20.593736000Z github.com/influxdata/influxdb/tsdb.(*SeriesPartition).Open(0xc432dd8370, 0xc4504ee680, 0x1)
2018-05-28T15:16:20.593935000Z /go/src/github.com/influxdata/influxdb/tsdb/series_partition.go:92 +0x11c
2018-05-28T15:16:20.594142000Z github.com/influxdata/influxdb/tsdb.(*SeriesFile).Open(0xc432ea8cd0, 0x0, 0x0)
2018-05-28T15:16:20.594361000Z /go/src/github.com/influxdata/influxdb/tsdb/series_file.go:67 +0x370
2018-05-28T15:16:20.594561000Z github.com/influxdata/influxdb/tsdb.(*Store).openSeriesFile(0xc42040c000, 0xc4203c06a7, 0xd, 0xc454228120, 0x24, 0xc4201e0600)
2018-05-28T15:16:20.594787000Z /go/src/github.com/influxdata/influxdb/tsdb/store.go:382 +0x123
2018-05-28T15:16:20.594987000Z github.com/influxdata/influxdb/tsdb.(*Store).loadShards(0xc42040c000, 0x0, 0x0)
2018-05-28T15:16:20.595186000Z /go/src/github.com/influxdata/influxdb/tsdb/store.go:228 +0x619
2018-05-28T15:16:20.595391000Z github.com/influxdata/influxdb/tsdb.(*Store).Open(0xc42040c000, 0x0, 0x0)
2018-05-28T15:16:20.595608000Z /go/src/github.com/influxdata/influxdb/tsdb/store.go:160 +0x28b
2018-05-28T15:16:20.595818000Z github.com/influxdata/influxdb/cmd/influxd/run.(*Server).Open(0xc42039a5a0, 0xc4203864c0, 0xc42039a5a0)
2018-05-28T15:16:20.596028000Z /go/src/github.com/influxdata/influxdb/cmd/influxd/run/server.go:418 +0xa56
2018-05-28T15:16:20.596220000Z github.com/influxdata/influxdb/cmd/influxd/run.(*Command).Run(0xc4202048f0, 0xc4200101a0, 0x0, 0x0, 0xc4200b82a0, 0xc4200b82a0)
2018-05-28T15:16:20.596423000Z /go/src/github.com/influxdata/influxdb/cmd/influxd/run/command.go:140 +0xd9a
2018-05-28T15:16:20.596642000Z main.(*Main).Run(0xc420095f40, 0xc4200101a0, 0x0, 0x0, 0x5b0c1d3f, 0x17da5146)
2018-05-28T15:16:20.596850000Z /go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:81 +0x1d5
2018-05-28T15:16:20.597065000Z main.main()
2018-05-28T15:16:20.597264000Z /go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:45 +0x16a
I have 5 different InfluxDB instances, and each one fails with the exception above.
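For context on the panic message itself: a `panic: unreachable` in the default branch of a switch usually means the code met a value its author considered impossible. The sketch below illustrates that general pattern with a small replay loop. It is not the InfluxDB source; the flag values and the names `applyEntry` and `replay` are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Hypothetical entry kinds; the real constants in InfluxDB may differ.
const (
	entryInsert    uint8 = 0x01
	entryTombstone uint8 = 0x02
)

// applyEntry understands exactly two entry kinds and treats anything
// else as impossible, which is the pattern behind "panic: unreachable".
func applyEntry(live map[uint64]bool, flag uint8, id uint64) {
	switch flag {
	case entryInsert:
		live[id] = true
	case entryTombstone:
		delete(live, id)
	default:
		panic("unreachable")
	}
}

// replay applies a sequence of flags and turns a panic into an error,
// so that a single unexpected entry is reported instead of crashing.
func replay(flags []uint8) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("replay aborted: %v", r)
		}
	}()
	live := make(map[uint64]bool)
	for i, f := range flags {
		applyEntry(live, f, uint64(i))
	}
	return nil
}

func main() {
	fmt.Println(replay([]uint8{entryInsert, entryTombstone}))
	fmt.Println(replay([]uint8{entryInsert, 0x7f}))
}
```

In this sketch, a flag outside the known set reaches the default branch and aborts the replay. Whether that is what happens in the trace above is an assumption, not something the logs confirm.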
My initial assumption was that the telegraf_1080d database might have been corrupted, so I moved it out so that InfluxDB would not load it. However, startup then failed while loading the TSM files of the "k8s" database:
2018-05-28T16:39:04.709773000Z ts=2018-05-28T16:39:04.709328Z lvl=info msg="Opened file" log_id=08Lui_7G000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/k8s/default/396/000000394-000000003.tsm id=0 duration=647.044ms
2018-05-28T16:39:04.887889000Z ts=2018-05-28T16:39:04.887379Z lvl=info msg="Opened file" log_id=08Lui_7G000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/k8s/default/423/000000359-000000002.tsm id=0 duration=824.466ms
2018-05-28T16:39:05.131921000Z ts=2018-05-28T16:39:05.131287Z lvl=info msg="Open store (end)" log_id=08Lui_7G000 service=store trace_id=08Lui_Wl000 op_name=tsdb_open op_event=end op_elapsed=5159.434ms
2018-05-28T16:39:05.134954000Z panic: unreachable
2018-05-28T16:39:05.135410000Z goroutine 1 [running]:
2018-05-28T16:39:05.135614000Z github.com/influxdata/influxdb/tsdb.(*SeriesIndex).execEntry(0xc420362280, 0x290225f, 0x61695f6e616d6573, 0x4016fe085, 0x0, 0x0, 0x0)
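One detail in both traces may be worth a look: the word 0x61695f6e616d6573 passed to execEntry consists entirely of printable ASCII bytes. Below is a minimal sketch for decoding it, assuming a little-endian machine and that the trace prints raw argument words. `wordToASCII` is a hypothetical helper, and which parameter of execEntry this word corresponds to is not established here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wordToASCII renders a 64-bit word as its 8 little-endian bytes,
// replacing any non-printable byte with '.'.
func wordToASCII(w uint64) string {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], w)
	for i, c := range b {
		if c < 0x20 || c > 0x7e {
			b[i] = '.'
		}
	}
	return string(b[:])
}

func main() {
	// Word copied from the execEntry frame in both traces.
	fmt.Println(wordToASCII(0x61695f6e616d6573)) // prints "seman_ia"
}
```

Text-like bytes where a numeric value would normally be expected could hint that the entry was read from a damaged or misaligned region of the series file, though that is only a guess.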
InfluxDB environment variables:
"env": [
{
"name": "INFLUXDB_DATA_MAX_SERIES_PER_DATABASE",
"value": "0"
},
{
"name": "INFLUXDB_ADMIN_ENABLED",
"value": "true"
},
{
"name": "INFLUXDB_ADMIN_BIND_ADDRESS",
"value": ":80"
},
{
"name": "INFLUXDB_DATA_CACHE_MAX_MEMORY_SIZE",
"value": "1048576000"
},
{
"name": "INFLUXDB_DATA_MAX_VALUES_PER_TAG",
"value": "0"
}
],
InfluxDB data architecture:
I have three databases.
- Telegraf writes 10-second data to telegraf_30d, which has a 30-day RP.
- A CQ aggregates data from telegraf_30d at 30-second granularity into telegraf_180d, which has a 180-day RP.
- A CQ aggregates data from telegraf_180d at 60-second granularity into telegraf_1080d, which has a 1080-day RP.
The system had been running fine for 5 months.
Even restarting InfluxDB (so that Kubernetes schedules the pod on a different node) did not help.
# du -sh data meta wal
44G data
20K meta
55M wal
/opt/influxdb#
# du -sh *
120M _internal
24G k8s
7.0G telegraf_1080d
7.0G telegraf_180d
5.8G telegraf_30d