telegraf just stops working #1230

Closed
@RainerW

Description

I'm testing telegraf on different systems and it seems to stop working after some time.
I guess there is some kind of network hiccup, because it stops working on multiple servers at around the same time. But some servers just continued to work, so I'm sure neither InfluxDB nor Grafana had a problem.

The systems in question are mostly Ubuntu 11, 14, or 16. But one of the Ubuntu 16 servers continued to work.
In all cases the log files simply stopped receiving new entries, but the process continued to run. My guess is that there is no safeguard around gathering the metrics, so when they start hanging for whatever reason, telegraf stops working?

The last log entries are:

2016/05/13 11:15:30 Wrote 21 metrics to output influxdb in 46.968928ms
2016/05/13 11:15:40 Gathered metrics, (10s interval), from 11 inputs in 33.493802ms

Which is the point in time where telegraf stops reporting.
It seems to be still running:

> ps aux | grep tel
telegraf  7095  0.0  0.0 129524  3372 ?        Sl   May13   0:25 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

After a service restart everything works fine again. But this has happened before, so I guess it will happen again.

I know this ticket is very broad, but I have nothing to pin it down to. Still, there should be safeguards in place to prevent telegraf from stopping completely.
