regression: Telegraf 1.19.3 fails to start with couchbase 6.5.1 #9764

Closed
danielmotaleite opened this issue Sep 15, 2021 · 2 comments · Fixed by #11045
Labels
area/couchbase, bug (unexpected problem or unintended behavior)

Comments

danielmotaleite commented Sep 15, 2021

Relevant telegraf.conf:

# Global tags can be specified here in key="value" format.
[global_tags]
  zone = "eu-central-1a"
  id = "couchbase-staging-1"
  environment = "staging"
  couchbase = "true"

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 15000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  logfile = "/var/log//telegraf/telegraf.log"
  quiet = false
  hostname = "couchbase-staging-1"
  omit_hostname = false

[[outputs.prometheus_client]]
  listen = ":9009"

[[inputs.couchbase]]
  servers = ["http://abc:xxx@127.0.0.1:8091/"]

System info:

Debian GNU/Linux 9.13 (stretch)
telegraf 1.19.3-1

Steps to reproduce:

  • Deploy Couchbase 6.5.1 and create a monitoring user
  • Deploy Telegraf 1.19.3 with the above config

Expected behavior:

Telegraf collects metrics from Couchbase.

Actual behavior:

Telegraf enters a restart loop. With the couchbase input removed from the config, everything works.
No useful logs show up, and running the process under strace keeps it from crashing.

gdb shows this:

Starting program: /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd0d0b700 (LWP 19560)]
[New Thread 0x7fffd050a700 (LWP 19561)]
[New Thread 0x7fffcfd09700 (LWP 19562)]
[New Thread 0x7fffcf508700 (LWP 19563)]
[New Thread 0x7fffced07700 (LWP 19564)]
[New Thread 0x7fffce506700 (LWP 19565)]
[New Thread 0x7fffcdb25700 (LWP 19566)]
[New Thread 0x7fffccfff700 (LWP 19567)]
2021-09-15T19:28:18Z I! Starting Telegraf 1.19.3
panic: runtime error: index out of range [-1]

goroutine 25 [running]:
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).addBucketFieldChecked(0xc0001ea080, 0xc000ca1c50, 0x4f29c9e, 0x9, 0x7d676b8, 0x0, 0x0, 0xffffffffffffffff)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:362 +0x116
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherDetailedBucketStats(0xc0001ea080, 0xc0000dc681, 0x32, 0xc0001fa5d0, 0x15, 0xc000ca1c50, 0x16, 0x0)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:285 +0x3cba
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherServer(0xc0001ea080, 0x596ec78, 0xc000634460, 0xc0000dc681, 0x32, 0xc000cdbcf8, 0x589c501, 0xc000b98140)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:111 +0xa18
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather.func1(0xc000afe030, 0x596ec78, 0xc000634460, 0xc0001ea080, 0xc0000dc681, 0x32)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:64 +0x91
created by github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:62 +0x10d
[Thread 0x7fffce506700 (LWP 19565) exited]
[Thread 0x7fffced07700 (LWP 19564) exited]
[Thread 0x7fffcf508700 (LWP 19563) exited]
[Thread 0x7fffcfd09700 (LWP 19562) exited]
[Thread 0x7fffd050a700 (LWP 19561) exited]
[Thread 0x7fffd0d0b700 (LWP 19560) exited]
[Thread 0x7ffff7fed700 (LWP 19557) exited]
[Thread 0x7fffcdb25700 (LWP 19566) exited]
[Inferior 1 (process 19557) exited with code 02]
(gdb) 
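
For readers following the trace: the panic message `index out of range [-1]` is what Go reports when code indexes a slice at `len(s)-1` while the slice is empty. The trailing arguments to `addBucketFieldChecked` in the trace above (`0x7d676b8, 0x0, 0x0, 0xffffffffffffffff`) appear consistent with a non-nil slice of length 0, which a plain `nil` check would not catch. The sketch below is not Telegraf's code; `lastValue` and `addFieldChecked` are hypothetical names used only to illustrate a guard based on length rather than nil-ness.

```go
package main

import "fmt"

// lastValue is a hypothetical helper (not part of Telegraf). It returns the
// most recent sample of a series and reports whether one existed.
func lastValue(values []float64) (float64, bool) {
	// len(values) == 0 is true for a nil slice and for a non-nil empty slice,
	// so this single check prevents values[-1] in both cases.
	if len(values) == 0 {
		return 0, false
	}
	return values[len(values)-1], true
}

// addFieldChecked is a hypothetical stand-in for a "checked" field setter:
// it stores the latest sample under key and skips series with no samples.
func addFieldChecked(fields map[string]interface{}, key string, values []float64) {
	if v, ok := lastValue(values); ok {
		fields[key] = v
	}
}

func main() {
	fields := map[string]interface{}{}
	addFieldChecked(fields, "ops", []float64{1, 2, 3})
	addFieldChecked(fields, "empty_stat", []float64{}) // non-nil but empty: skipped
	addFieldChecked(fields, "nil_stat", nil)           // nil: skipped
	fmt.Println(fields) // prints: map[ops:3]
}
```

Checking the length instead of comparing against `nil` is the usual Go idiom here, since a server can return a stat key with an empty sample array.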

Additional info:

Possibly related to the similar issues #9416 and #9495, as 1.19.x has already had several problems with Couchbase.
After reverting to Telegraf 1.18.2, everything works.

journalctl -u telegraf -f

set 15 19:13:10 couchbase-staging-1 telegraf[18332]: 2021-09-15T19:13:10Z I! Starting Telegraf 1.19.3



set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Unit entered failed state.
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
set 15 19:13:20 couchbase-staging-1 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.
set 15 19:13:20 couchbase-staging-1 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:20 couchbase-staging-1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: time="2021-09-15T19:13:20Z" level=error msg="failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: time="2021-09-15T19:13:20Z" level=error msg="failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:20 couchbase-staging-1 telegraf[18344]: 2021-09-15T19:13:20Z I! Starting Telegraf 1.19.3




set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Unit entered failed state.
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.
set 15 19:13:30 couchbase-staging-1 systemd[1]: telegraf.service: Service hold-off time over, scheduling restart.
set 15 19:13:30 couchbase-staging-1 systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:30 couchbase-staging-1 systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: time="2021-09-15T19:13:30Z" level=error msg="failed to create cache directory. /etc/telegraf/.cache/snowflake, err: mkdir /etc/telegraf/.cache: permission denied. ignored\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: time="2021-09-15T19:13:30Z" level=error msg="failed to open. Ignored. open /etc/telegraf/.cache/snowflake/ocsp_response_cache.json: no such file or directory\n" func="gosnowflake.(*defaultLogger).Errorf" file="log.go:120"
set 15 19:13:30 couchbase-staging-1 telegraf[18354]: 2021-09-15T19:13:30Z I! Starting Telegraf 1.19.3

telegraf.log

2021-09-15T19:22:20Z D! [agent] Successfully connected to outputs.prometheus_client
2021-09-15T19:22:20Z D! [agent] Starting service inputs




2021-09-15T19:22:30Z I! Loaded inputs: couchbase
2021-09-15T19:22:30Z I! Loaded aggregators: 
2021-09-15T19:22:30Z I! Loaded processors: 
2021-09-15T19:22:30Z I! Loaded outputs: prometheus_client
2021-09-15T19:22:30Z I! Tags enabled: couchbase=true environment=staging host=couchbase-staging-1 id=couchbase-staging-1 zone=eu-central-1a
2021-09-15T19:22:30Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"couchbase-staging-1", Flush Interval:10s
2021-09-15T19:22:30Z D! [agent] Initializing plugins
2021-09-15T19:22:30Z D! [agent] Connecting outputs
2021-09-15T19:22:30Z D! [agent] Attempting connection to [outputs.prometheus_client]
2021-09-15T19:22:30Z I! [outputs.prometheus_client] Listening on http://[::]:9009/metrics
2021-09-15T19:22:30Z D! [agent] Successfully connected to outputs.prometheus_client
2021-09-15T19:22:30Z D! [agent] Starting service inputs


danielmotaleite added the bug (unexpected problem or unintended behavior) label Sep 15, 2021
reimda (Contributor) commented Sep 17, 2021

Could you retest with 1.20.0-rc0? https://github.com/influxdata/telegraf/releases/tag/v1.20.0-rc0

danielmotaleite (Author) commented

Still the same issue:

2021-09-21T14:50:06Z I! Starting Telegraf 1.20.0
[New Thread 0x7fffccfff700 (LWP 9001)]
panic: runtime error: index out of range [-1]

goroutine 42 [running]:
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).addBucketFieldChecked(0x414759, 0xc000a0a881, {0x4946045, 0xc0005cf278}, {0x7d76ed8, 0x494175a, 0x5440abda2001d8e7}, 0xc00087bb90)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:362 +0xea
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherDetailedBucketStats(0x40b0e00, {0xc000a0a881, 0x32}, {0xc0005cf278, 0x15}, 0x0)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:285 +0x23df
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).gatherServer(0xc0006f7180, {0x540b7b8, 0xc0001ed420}, {0xc000a0a881, 0x32}, 0xc0008fd4f0)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:111 +0xc4e
github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather.func1({0xc000a0a881, 0x0})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:64 +0x85
created by github.com/influxdata/telegraf/plugins/inputs/couchbase.(*Couchbase).Gather
        /go/src/github.com/influxdata/telegraf/plugins/inputs/couchbase/couchbase.go:62 +0x1c8
[Thread 0x7fffccfff700 (LWP 9001) exited]
[Thread 0x7fffcda25700 (LWP 8978) exited]
[Thread 0x7fffce366700 (LWP 8971) exited]
[Thread 0x7fffceb67700 (LWP 8970) exited]
[Thread 0x7fffcf368700 (LWP 8969) exited]
[Thread 0x7fffd03aa700 (LWP 8967) exited]
[Thread 0x7fffd0d0b700 (LWP 8966) exited]
[Thread 0x7ffff7fed700 (LWP 8959) exited]
[Inferior 1 (process 8959) exited with code 02]
(gdb)
