Telegraf stops publishing metrics to InfluxDB; All plugins take too long to collect #3629

Closed
@jgitlin-bt

Description

Bug report

After a seemingly random amount of time, Telegraf stops publishing metrics to InfluxDB over UDP. I have been experiencing this issue since Nov 2016 on three separate FreeBSD servers, with both Telegraf 1.3.x and 1.4.4. In the Telegraf log, all collectors start to fail with:

Error in plugin [inputs.$name]: took longer to collect than collection interval (10s)

I can't see anything unusual or interesting published from the Telegraf internal metrics.

This same issue has been reported in #3318, #2183, #2919, #2780 and #2870, but all of those issues were either abandoned by the requester or became confused with several separate issues. I am opening a new issue for my specific problem, but if it's a duplicate (#3318 seems to be the closest) then please feel free to close it.

Relevant telegraf.conf:

telegraf.conf

System info:

Telegraf v1.4.4 (git: unknown unknown) running on FreeBSD 10.3-RELEASE-p24

Steps to reproduce:

  1. service telegraf start
  2. wait $random time period

Expected behavior:

Telegraf publishes metrics to InfluxDB server over UDP

Actual behavior:

Telegraf stops publishing metrics at seemingly random times, and all input plugins start to fail with:

2018-01-02T17:16:50Z E! Error in plugin [inputs.processes]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.apache]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.mem]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.internal]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.system]: took longer to collect than collection interval (10s)
2018-01-02T17:16:51Z E! Error in plugin [inputs.swap]: took longer to collect than collection interval (10s)
2018-01-02T17:16:52Z E! Error in plugin [inputs.cpu]: took longer to collect than collection interval (10s)
2018-01-02T17:16:58Z E! Error: statsd message queue full. We have dropped 1 messages so far. You may want to increase allowed_pending_messages in the config
2018-01-02T17:17:00Z E! Error in plugin [inputs.phpfpm]: took longer to collect than collection interval (10s)
2018-01-02T17:17:00Z E! Error in plugin [inputs.statsd]: took longer to collect than collection interval (10s)
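
To narrow down when the stall begins and whether every plugin is affected, the log excerpt above can be summarized with a small script. This is only a sketch: the regular expression is modeled on the log lines shown in this report, and `summarize_timeouts` and `SAMPLE` are hypothetical names, not part of Telegraf.

```python
import re
from collections import Counter

# Pattern modeled on the "took longer to collect" lines quoted in this report.
# Other Telegraf versions may format these messages differently.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\S+) E! Error in plugin \[(?P<plugin>[^\]]+)\]: "
    r"took longer to collect than collection interval \((?P<interval>[^)]+)\)"
)


def summarize_timeouts(lines):
    """Return (timestamp of first timeout, Counter mapping plugin -> timeouts)."""
    first_ts = None
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match is None:
            continue  # not a collection-timeout line (e.g. the statsd queue message)
        if first_ts is None:
            first_ts = match.group("ts")
        counts[match.group("plugin")] += 1
    return first_ts, counts


# A few lines copied from the log excerpt in this report.
SAMPLE = """\
2018-01-02T17:16:50Z E! Error in plugin [inputs.processes]: took longer to collect than collection interval (10s)
2018-01-02T17:16:58Z E! Error: statsd message queue full. We have dropped 1 messages so far. You may want to increase allowed_pending_messages in the config
2018-01-02T17:17:00Z E! Error in plugin [inputs.phpfpm]: took longer to collect than collection interval (10s)
"""

first_ts, counts = summarize_timeouts(SAMPLE.splitlines())
print("first timeout:", first_ts)
for plugin, n in sorted(counts.items()):
    print(plugin, n)
```

Running the same function over the full log file (for example `summarize_timeouts(open(path))`, with `path` pointing at wherever Telegraf writes its log on your system) shows whether all inputs begin timing out within the same interval, which is the behavior described here.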

Additional info:

Full logs and stack trace

Earlier occurrence

Grafana snapshot of Internal Telegraf metrics
