Data stops flowing into Influx #10

Closed
2 tasks
Terkwood opened this issue Oct 5, 2018 · 10 comments
Labels
bug Something isn't working

Comments

@Terkwood
Owner

Terkwood commented Oct 5, 2018

We have to regularly restart telegraf in order to keep data flowing. Let's try the following:

  • Upgrade telegraf to 1.9.0 and retest. They recently (Nov 18) changed the MQTT consumer lifetime, which will probably fix this
  • Otherwise, replace telegraf with a small hand-written service
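If we go the hand-written route, the Influx side of such a service could be quite small. What follows is a minimal sketch, not a tested replacement for telegraf. It assumes InfluxDB 1.x's HTTP `/write?db=...` endpoint (the same one telegraf posts to in the logs below), assumes every field value is a float, and leaves out the MQTT subscription entirely (an MQTT client library would have to feed `to_line_protocol`). The names `to_line_protocol` and `write_points` are hypothetical. Every request gets an explicit timeout, so a hung write raises an error instead of silently stalling.

```python
import urllib.parse
import urllib.request


def _escape(text: str, specials: str) -> str:
    """Backslash-escape each character of `specials` in `text` (line protocol rules)."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Build one line-protocol row; hypothetical helper that assumes float fields only."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    # Measurement names escape commas and spaces; tag keys/values and field keys also escape '='.
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tags keep the output deterministic
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + repr(float(value))
        for key, value in sorted(fields.items())
    )
    line = f"{head} {body}"
    if ts_ns is not None:
        line += f" {ts_ns}"  # nanosecond timestamp; if omitted, the server assigns one
    return line


def write_points(base_url, db, lines, timeout=5.0):
    """POST rows to InfluxDB 1.x's /write endpoint, giving up after `timeout` seconds."""
    url = f"{base_url}/write?{urllib.parse.urlencode({'db': db})}"
    req = urllib.request.Request(
        url,
        data="\n".join(lines).encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    # urlopen raises on timeouts and HTTP errors, so a stalled write surfaces as an exception.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status  # a successful write answers 204 No Content
```

For example, `write_points("http://PI_HOST:8086", "INFLUX_DB", [to_line_protocol("temperature", {"room": "den"}, {"value": 21.5})])`, where the measurement, tag, and field names are placeholders.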

From telegraf logs:

2018-10-05T02:28:56Z E! [outputs.influxdb]: when writing to [http://PI_HOST:8086]: Post http://PI_HOST:8086/write?db=INFLUX_DB: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2018-10-05T02:28:56Z E! Error writing to output [influxdb]: could not write any address
2018-10-05T02:29:05Z E! [outputs.influxdb]: when writing to [http://PI_HOST:8086]: Post http://PI_HOST:8086/write?db=INFLUX_DB: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018-10-05T02:29:05Z E! Error writing to output [influxdb]: could not write any address
2018-10-05T05:01:52Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
error: pingresp not received, disconnecting
MQTT Client will try to reconnect
2018-10-05T05:01:52Z I! MQTT Client Connected

From influx logs:

[httpd] 172.26.0.1 - - [05/Oct/2018:05:05:00 +0000] "POST /write?db=INFLUX_DB HTTP/1.1" 204 0 "-" "telegraf" 35fb89be-c85c-11e8-8a21-000000000000 51631
[httpd] 172.26.0.1 - - [05/Oct/2018:05:05:10 +0000] "POST /write?db=INFLUX_DB HTTP/1.1" 204 0 "-" "telegraf" 3bf16686-c85c-11e8-8a22-000000000000 52104
[httpd] 172.26.0.1 - - [05/Oct/2018:05:05:20 +0000] "POST /write?db=INFLUX_DB HTTP/1.1" 204 0 "-" "telegraf" 41e747f8-c85c-11e8-8a23-000000000000 64768
ts=2018-10-05T05:15:20.309787Z lvl=info msg="Cache snapshot (start)" log_id=0AxG~9eW000 engine=tsm1 trace_id=0AxfGoTG000 op_name=tsm1_cache_snapshot op_event=start
ts=2018-10-05T05:15:20.373113Z lvl=info msg="Snapshot for path written" log_id=0AxG~9eW000 engine=tsm1 trace_id=0AxfGoTG000 op_name=tsm1_cache_snapshot path=/PATH/autogen/5 duration=63.462ms
ts=2018-10-05T05:15:20.373335Z lvl=info msg="Cache snapshot (end)" log_id=0AxG~9eW000 engine=tsm1 trace_id=0AxfGoTG000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=63.597ms
ts=2018-10-05T05:23:48.630413Z lvl=info msg="Retention policy deletion check (start)" log_id=0AxG~9eW000 service=retention trace_id=0Axfkq5W000 op_name=retention_delete_check op_event=start
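
The influx log shows telegraf's writes arriving every 10 seconds, each answered with `204`. One way to tell "data stopped flowing" apart from "telegraf is merely logging errors" is to look for gaps between those successful writes. This is a minimal sketch that assumes the `[httpd]` access-log format shown above; `write_times`, `find_gaps`, and the 60-second threshold in the example are hypothetical choices, not something telegraf or InfluxDB provides.

```python
import re
from datetime import datetime, timedelta

# Captures the access-log timestamp and status of a POST /write, e.g.
# [05/Oct/2018:05:05:00 +0000] "POST /write?db=INFLUX_DB HTTP/1.1" 204 0 ...
WRITE_RE = re.compile(r'\[([^\]]+)\] "POST /write\?[^"]*" (\d{3}) ')


def write_times(lines):
    """Yield the timestamp of every successful (2xx) write in InfluxDB's [httpd] log."""
    for line in lines:
        m = WRITE_RE.search(line)
        if m and m.group(2).startswith("2"):
            yield datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")


def find_gaps(times, max_gap):
    """Return (previous, next) pairs of consecutive writes more than `max_gap` apart."""
    gaps = []
    prev = None
    for t in times:
        if prev is not None and t - prev > max_gap:
            gaps.append((prev, t))
        prev = t
    return gaps
```

With a 10-second flush interval, something like `find_gaps(write_times(log_lines), timedelta(seconds=60))` would flag any minute-long silence. A watchdog built on this could restart telegraf only when a gap appears rather than on a fixed schedule.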
@Terkwood
Owner Author

Terkwood commented Oct 9, 2018

There's an issue with the telegraf setup. This is separate from the MQTT breakdown which was observed in #13.

Here's an example of telegraf eventually not working:

2018-10-07T12:26:12Z I! Starting Telegraf v1.6.4
2018-10-07T12:26:12Z I! Loaded outputs: influxdb
2018-10-07T12:26:12Z I! Loaded inputs: inputs.mqtt_consumer
2018-10-07T12:26:12Z I! Tags enabled: host=2e633ad06088
2018-10-07T12:26:12Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"2e633ad06088", Flush Interval:10s
2018-10-07T12:26:12Z I! MQTT Client Connected
2018-10-07T12:26:30Z E! [outputs.influxdb]: when writing to [http://yoururl:yourport]: Post http://yoururl:yourport/write?db=yourdb: EOF
2018-10-07T12:26:30Z E! Error writing to output [influxdb]: could not write any address
2018-10-07T17:54:23Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
error: pingresp not received, disconnecting
MQTT Client will try to reconnect
2018-10-07T17:54:23Z I! MQTT Client Connected
2018-10-07T21:26:33Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
error: pingresp not received, disconnecting
MQTT Client will try to reconnect
2018-10-07T21:26:33Z I! MQTT Client Connected
2018-10-08T05:57:44Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
error: pingresp not received, disconnecting
MQTT Client will try to reconnect
2018-10-08T05:57:44Z I! MQTT Client Connected
2018-10-08T06:32:54Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
error: pingresp not received, disconnecting
MQTT Client will try to reconnect
2018-10-08T06:32:54Z I! MQTT Client Connected
2018-10-08T20:07:06Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
error: pingresp not received, disconnecting
MQTT Client will try to reconnect
2018-10-08T20:07:06Z I! MQTT Client Connected

@Terkwood
Owner Author

Terkwood commented Oct 9, 2018

See influxdata/telegraf#4594

@Terkwood
Owner Author

I've been working around this in an inelegant way:

watch -n 14400 "docker-compose restart telegraf"
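
The `watch` workaround re-runs the restart roughly every 14400 seconds, i.e. every 4 hours. The same loop in Python is sketched below. It assumes `docker-compose` is on the PATH and the service is named `telegraf` as in the command above; `restart_periodically` and its `run`/`sleep` parameters are hypothetical and exist only so the loop can be exercised without Docker.

```python
import subprocess
import sys
import time

RESTART_CMD = ["docker-compose", "restart", "telegraf"]
FOUR_HOURS = 14400  # seconds, the interval passed to `watch -n`


def restart_periodically(interval=FOUR_HOURS, iterations=None,
                         run=subprocess.run, sleep=time.sleep):
    """Restart telegraf, then wait `interval` seconds; repeat forever or `iterations` times."""
    done = 0
    while iterations is None or done < iterations:
        result = run(RESTART_CMD, check=False)
        if result.returncode != 0:
            # A failed restart is reported but does not stop the workaround.
            print(f"restart failed with exit code {result.returncode}", file=sys.stderr)
        done += 1
        if iterations is None or done < iterations:
            sleep(interval)
    return done
```

Calling `restart_periodically()` with no arguments mirrors the original command.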

@Terkwood
Owner Author

Terkwood commented Nov 1, 2018

We haven't been restarting telegraf lately, and the problem hasn't shown up lately either. We'd like to know why it was happening before, but we may close this issue if we see no bad behavior over the next week, whether or not we have time to dig into the underlying cause of the earlier failure mode. ONWARD!

@Terkwood Terkwood self-assigned this Nov 1, 2018
@Terkwood
Owner Author

Terkwood commented Nov 7, 2018

Spotted this a couple of times in the last few days. Is there a new version of Telegraf we can use? If not, maybe we can write a small program that performs the same function on our behalf.

@Terkwood
Owner Author

Terkwood commented Nov 8, 2018

telegraf didn't receive a pingresp:

telegraf_1   | 2018-11-07T22:37:49Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
telegraf_1   | error: pingresp not received, disconnecting
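
In MQTT 3.1.1, a client that has sent nothing for its Keep Alive period sends a PINGREQ and the broker answers with a PINGRESP; the spec says a client that gets no PINGRESP within a reasonable time should close the connection. That appears to be what telegraf's MQTT client is doing here, and the earlier logs show it reconnecting within the same second. The following is a minimal model of that client-side bookkeeping, not telegraf's or any MQTT library's actual code; `KeepAlive`, `ping_timeout`, and the returned action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepAlive:
    keepalive: float          # MQTT Keep Alive: max silence (s) before a PINGREQ is due
    ping_timeout: float       # hypothetical: how long (s) to wait for PINGRESP
    last_sent: float = 0.0    # time the client last sent any packet
    ping_sent_at: Optional[float] = None  # set while a PINGREQ is outstanding

    def on_packet_sent(self, now: float) -> None:
        self.last_sent = now  # any outgoing packet resets the keepalive clock

    def on_pingresp(self, now: float) -> None:
        self.ping_sent_at = None  # broker answered; connection is healthy

    def tick(self, now: float) -> str:
        """Decide what the client should do at time `now`."""
        if self.ping_sent_at is not None:
            if now - self.ping_sent_at >= self.ping_timeout:
                return "disconnect"  # i.e. "pingresp not received, disconnecting"
            return "wait"
        if now - self.last_sent >= self.keepalive:
            self.ping_sent_at = now
            self.last_sent = now
            return "send_pingreq"
        return "idle"
```

In this model a single slow or lost PINGRESP is enough to drop the connection, which matches the pattern of isolated disconnects followed by immediate reconnects in the logs.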

@Terkwood
Owner Author

Terkwood commented Dec 4, 2018

Recently we've observed a full week without a failure. And we haven't been restarting Telegraf regularly. 🤔

@Terkwood
Owner Author

Terkwood commented Dec 4, 2018

Make sure we're up to date with

influxdata/telegraf#4846 (comment)

@Terkwood
Owner Author

Terkwood commented Dec 4, 2018

And there is a new version of Telegraf available

https://github.com/influxdata/telegraf/releases/tag/1.9.0

@Terkwood
Owner Author

Terkwood commented Jun 7, 2019

Abandoning use of influx, entirely, for now 😈

@Terkwood Terkwood closed this as completed Jun 7, 2019

1 participant