Description
I'm testing Telegraf on different systems, and it seems to stop working after some time.
I suspect some kind of network hiccup, because it stops working on multiple servers at around the same time. But some servers just continued to work, so I'm sure neither InfluxDB nor Grafana had a problem.
The systems in question are mostly Ubuntu 11, 14, or 16. But one of the Ubuntu 16 servers continues to work.
In all cases the log files simply stopped receiving new entries, but the process continues to run. My guess is that there is no safeguard around the metrics, so when they start hanging for whatever reason, Telegraf stops working?
The last log entries are:
2016/05/13 11:15:30 Wrote 21 metrics to output influxdb in 46.968928ms
2016/05/13 11:15:40 Gathered metrics, (10s interval), from 11 inputs in 33.493802ms
This is the point in time at which Telegraf stops reporting.
The process still seems to be running:
> ps aux | grep tel
telegraf 7095 0.0 0.0 129524 3372 ? Sl May13 0:25 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
After a service restart everything works fine again. But this has happened before, so I guess it will happen again.
I know this ticket is very broad, but I have nothing to pin it down to. Still, there should be safeguards in place to prevent Telegraf from stopping completely.