Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQTT input plugin disconnecting from Mosquitto #4594

Closed
lazarberbakov opened this issue Aug 26, 2018 · 4 comments · Fixed by #4846
Closed

MQTT input plugin disconnecting from Mosquitto #4594

lazarberbakov opened this issue Aug 26, 2018 · 4 comments · Fixed by #4846
Labels
area/mqtt bug unexpected problem or unintended behavior
Milestone

Comments

@lazarberbakov
Copy link

We are planning to use TICK stack on a EU funded project for collection of measurements from MQTT broker. For this purpose, we setup InfluxDB, MQTT broker (Mosquitto) and Telegraf (mqtt_consumer input plugin and InfluxDB output plugin). Telegraf connects successfully to Mosquitto, and starts collecting published metrics in JSON format. However, from time to time it disconnects from the MQTT broker. After a couple of days, it stops reconnecting after losing connection and all the metrics published to MQTT broker are lost.
In the sequel, I provide you with both Telegraf and Mosquitto logs at the moment when there is a reconnection.

Telegraf log:
2018-08-26T18:07:20Z D! Output [influxdb] buffer fullness: 73 / 10000 metrics.
2018-08-26T18:07:20Z D! Output [influxdb] wrote batch of 73 metrics in 7.241577ms
2018-08-26T18:07:22Z E! Error in plugin [inputs.mqtt_consumer]: E! MQTT Connection lost
error: pingresp not received, disconnecting
MQTT Client will try to reconnect
2018-08-26T18:07:22Z I! MQTT Client Connected
2018-08-26T18:07:40Z D! Output [influxdb] buffer fullness: 76 / 10000 metrics.
2018-08-26T18:07:40Z D! Output [influxdb] wrote batch of 76 metrics in 5.022393ms
**--------------------------------------------------------------------------------------------
Mosquitto log
1535306832: Received PINGREQ from energo2
1535306832: Sending PINGRESP to energo2
.
.
.
1535306842: Sending PUBLISH to energo2 (d0, q0, r0, m0, 'ENE-E893B/data', ... (240 bytes))
1535306842: Socket error on client energo2, disconnecting.
1535306842: New connection from ::1 on port 1883.
1535306842: Client energo2 disconnected.
1535306842: New client connected from ::1 as energo2 (c0, k60, u'user1').
1535306842: Sending CONNACK to energo2 (1, 0)
**---------------------------------------------------------------------------------------------

Relevant telegraf.conf:

[[inputs.mqtt_consumer]]
servers = ["tcp://localhost:1883"]
persistent_session = true
qos = 2
connection_timeout = "120s"
topics = [
"+/data",
]
name_override = "demand"
client_id = "xxxxx"
username = "xxxx"
password = "xxxxxxxxx"
data_format = "json"
tag_keys = [
"measurementID",
"measurementIndex",
"deviceID"
]

System info:

Telegraf version 1.7.3, Ubuntu 16.04, Mosquitto ver. 1.4.14

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label Aug 27, 2018
@danielnelson
Copy link
Contributor

I believe this is an issue in the eclipse/paho.mqtt.golang library version we are using. In the next release of Telegraf (1.8) we are using an updated version of this library, could you try using one of the
nightly development builds and see if the error persists?

If the error still occurs, can you add the Telegraf logs around the moment that Telegraf loses connection and does not attempt or is unable to reconnect?

@lazarberbakov
Copy link
Author

lazarberbakov commented Aug 29, 2018

I downloaded nightly build and noticed the following behavior.

As can be seen in Telegraf log below, it outputs twice that the MQTT Client is connected, even though I have only one MQTT consumer configuration. After some time it stops collecting metrics from MQTT broker, exactly at the point where MQTT broker Mosquitto says that there was a socket error.
I would also like to point out that I compiled newer version of Mosquitto 1.5.15 with websocket support turned on.
--------- Telegraf log -----------------
2018-08-29T12:55:17Z I! Starting Telegraf
2018-08-29T12:55:17Z I! Loaded inputs: inputs.mqtt_consumer
2018-08-29T12:55:17Z I! Loaded aggregators:
2018-08-29T12:55:17Z I! Loaded processors:
2018-08-29T12:55:17Z I! Loaded outputs: influxdb
2018-08-29T12:55:17Z I! Tags enabled: host=xxxxx.rs
2018-08-29T12:55:17Z I! Agent Config: Interval:20s, Quiet:false, Hostname:"xxxxx.rs", Flush Interval:20s
2018-08-29T12:55:20Z I! MQTT Client Connected
2018-08-29T12:55:20Z I! MQTT Client Connected

2018-08-29T12:55:40Z D! Output [influxdb] buffer fullness: 84 / 10000 metrics.
2018-08-29T12:55:40Z D! Output [influxdb] wrote batch of 84 metrics in 6.111048ms
2018-08-29T12:56:00Z D! Output [influxdb] buffer fullness: 77 / 10000 metrics.
2018-08-29T12:56:00Z D! Output [influxdb] wrote batch of 77 metrics in 6.721614ms
2018-08-29T12:56:20Z D! Output [influxdb] buffer fullness: 78 / 10000 metrics.
2018-08-29T12:56:20Z D! Output [influxdb] wrote batch of 78 metrics in 6.970926ms
2018-08-29T12:56:40Z D! Output [influxdb] buffer fullness: 75 / 10000 metrics.
2018-08-29T12:56:40Z D! Output [influxdb] wrote batch of 75 metrics in 5.542047ms
2018-08-29T12:57:00Z D! Output [influxdb] buffer fullness: 38 / 10000 metrics.
2018-08-29T12:57:00Z D! Output [influxdb] wrote batch of 38 metrics in 4.349074ms
2018-08-29T12:57:20Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.
2018-08-29T12:57:40Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.
2018-08-29T12:58:00Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.

------------ Mosquitto log ------------------------
2018-08-29T12:56:49+00:00: Socket error on client respond1, disconnecting.

In case when Telegraf log states only once "MQTT Client Connected", it seems that it works properly.

In the sequel, I provide another time the same thing happened. I highlighted the time stamps where disconnection occurs.
It seems that Telegraf connects twice to MQTT broker:

---------- Telegraf log --------------------------------
2018-08-29T14:07:40Z D! Attempting connection to output: influxdb
2018-08-29T14:07:40Z D! Successfully connected to output: influxdb
2018-08-29T14:07:40Z I! Starting Telegraf
2018-08-29T14:07:40Z I! Loaded inputs: inputs.mqtt_consumer
2018-08-29T14:07:40Z I! Loaded aggregators:
2018-08-29T14:07:40Z I! Loaded processors:
2018-08-29T14:07:40Z I! Loaded outputs: influxdb
2018-08-29T14:07:40Z I! Tags enabled: host=lazar-VirtualBox
2018-08-29T14:07:40Z I! Agent Config: Interval:20s, Quiet:false, Hostname:"lazar-VirtualBox", Flush Interval:20s
2018-08-29T14:08:00Z I! MQTT Client Connected
2018-08-29T14:08:00Z I! MQTT Client Connected
2018-08-29T14:08:20Z D! Output [influxdb] buffer fullness: 74 / 10000 metrics.
2018-08-29T14:08:20Z D! Output [influxdb] wrote batch of 74 metrics in 25.087269ms
.........
2018-08-29T14:11:00Z D! Output [influxdb] buffer fullness: 65 / 10000 metrics.
2018-08-29T14:11:00Z D! Output [influxdb] wrote batch of 65 metrics in 19.585204ms
2018-08-29T14:11:20Z D! Output [influxdb] buffer fullness: 4 / 10000 metrics.
2018-08-29T14:11:20Z D! Output [influxdb] wrote batch of 4 metrics in 14.630283ms
2018-08-29T14:11:40Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.
2018-08-29T14:12:00Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.
2018-08-29T14:12:20Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.
2018-08-29T14:12:40Z D! Output [influxdb] buffer fullness: 0 / 10000 metrics.

---------- Mosquitto log -----------------------------
1535551681: New connection from x.x.x.x on port 1883.
1535551681: New client connected from x.x.x.x as Telegraf-Consumer-qOk1G (c1, k60, u'user').
1535551681: Sending CONNACK to Telegraf-Consumer-qOk1G (0, 0)
1535551681: Received SUBSCRIBE from Telegraf-Consumer-qOk1G
1535551681: +/data (QoS 2)
1535551681: Telegraf-Consumer-qOk1G 2 +/data
1535551681: Sending SUBACK to Telegraf-Consumer-qOk1G
1535551681: New connection from x.x.x.x on port 1883.
1535551681: Client Telegraf-Consumer-qOk1G already connected, closing old connection.
1535551681: Client Telegraf-Consumer-qOk1G disconnected.
1535551681: New client connected from x.x.x.x as Telegraf-Consumer-qOk1G (c1, k60, u'user').
1535551681: Sending CONNACK to Telegraf-Consumer-qOk1G (0, 0)
1535551681: Received SUBSCRIBE from Telegraf-Consumer-qOk1G
1535551681: +/data (QoS 2)
1535551681: Telegraf-Consumer-qOk1G 2 +/data
1535551681: Sending SUBACK to Telegraf-Consumer-qOk1G
......
1535551801: Received PINGREQ from Telegraf-Consumer-qOk1G
1535551801: Sending PINGRESP to Telegraf-Consumer-qOk1G
..... 60 seconds (keepalive) passed between
2018-08-29T14:11:01
1535551861: Received PINGREQ from Telegraf-Consumer-qOk1G
1535551861: Sending PINGRESP to Telegraf-Consumer-qOk1G
1535551861: Socket error on client Telegraf-Consumer-qOk1G, disconnecting.

@GillesC
Copy link

GillesC commented Sep 12, 2018

I can confirm the aforementioned issue where the nightly build connects two times to the broker.

2018-09-12T09:12:20Z I! MQTT Client Connected
2018-09-12T09:12:20Z I! MQTT Client Connected
2018-09-12T09:12:30Z D! Output [influxdb] buffer fullness: 1 / 10000 metrics. 
2018-09-12T09:12:30Z D! Output [influxdb] wrote batch of 1 metrics in 30.380449ms

It also seems that the mqtt_consumer stopped gathering metrics after some hours as stated by the reference.

@danielnelson
Copy link
Contributor

Can you test if this change helps: #4846 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/mqtt bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants