input did not complete within its interval #5796

Closed
dynek opened this issue May 2, 2019 · 5 comments · Fixed by #5813
Labels
bug unexpected problem or unintended behavior

Comments

@dynek
Contributor

dynek commented May 2, 2019

Relevant telegraf.conf:

[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://influxdb.domain.net:8086"]
  database = "fibaro"
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
[[inputs.fibaro]]
  url = "http://fibaro.domain.net:80"
  username = "admin"
  password = "<password>"

System info:

I tried both 1.8.0 and 1.10.3 running in containers on a Debian 9.8 system with docker-ce 18.09.53-0debian-stretch

Steps to reproduce:

  1. Run the container with the above config
  2. Wait

Expected behavior:

Metrics flowing smoothly from Fibaro to InfluxDB.

Actual behavior:

After a random amount of time, I start seeing the following message every ten seconds:

2019-05-02T17:52:00Z W! [agent] input "inputs.fibaro" did not complete within its interval

Additional info:

This configuration used to work as-is on another host so I'm not too sure what's wrong here.
What actually led me to open a bug report is that I ran tcpdump and netstat (in a loop) on both sides. When this happens, the Fibaro box (10.1.0.31) doesn't see a single TCP SYN or any other packet coming from the Docker container, while netstat in the Docker container running Telegraf shows:

tcp        0      0 172.17.0.5:35192        10.1.0.31:80            ESTABLISHED 1/telegraf

Once again, the Fibaro box (10.1.0.31) doesn't see an established connection from the container. I don't see how the Fibaro box, the host running the container, or my network could be at fault here.

Running Telegraf a second time (docker exec -ti sh) inside that container works, meaning metrics reach InfluxDB again. Restarting it completely also works around the problem.

Any hint to debug it further?

Thank you

@danielnelson
Contributor

It looks like Telegraf thinks it is connected but is not receiving a response. Fibaro probably hasn't received anything, and the request is stuck somewhere in Docker networking land, which I often find mysterious.

I suggest adding a timeout to the fibaro plugin; there is currently no default timeout (we should change this). Once you have the timeout, I am guessing it will run once, fail, and then work the second time. Let's check if that is indeed the case.

[[inputs.fibaro]]
  url = "http://fibaro.domain.net:80"
  username = "admin"
  password = "<password>"
  timeout = "5s"
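
To illustrate why a missing timeout produces the hang described above, here is a minimal Go sketch. It is not the fibaro plugin's code: the local test server and the helper names (fetch, isTimeout, stalledServer, demo) are assumptions made for this example. It relies only on the documented behavior that a zero http.Client.Timeout means no deadline at all.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch issues a GET with a client-wide deadline.
// A zero timeout means "no timeout" for http.Client.
func fetch(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// stalledServer starts a local server (hypothetical stand-in for an
// unresponsive peer) that accepts requests but never answers until stop is called.
func stalledServer() (url string, stop func()) {
	release := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-release:
		case <-r.Context().Done():
		}
	}))
	return srv.URL, func() {
		close(release)
		srv.Close()
	}
}

// demo returns whether the request without a timeout was still pending
// after 300ms, and whether the request with a timeout failed with a timeout error.
func demo() (stillWaiting bool, timedOut bool) {
	url, stop := stalledServer()
	defer stop()

	done := make(chan error, 1)
	go func() { done <- fetch(url, 0) }() // zero Timeout: no deadline
	select {
	case <-done:
	case <-time.After(300 * time.Millisecond):
		stillWaiting = true
	}

	timedOut = isTimeout(fetch(url, 200*time.Millisecond))
	return stillWaiting, timedOut
}

func main() {
	waiting, timedOut := demo()
	fmt.Println("no timeout, still waiting:", waiting)
	fmt.Println("with timeout, timed out:", timedOut)
}
```

With a deadline set, the request fails fast and the next collection interval can start a fresh request instead of waiting on a connection that will never answer.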

@danielnelson danielnelson added the bug unexpected problem or unintended behavior label May 2, 2019
@dynek
Contributor Author

dynek commented May 2, 2019

Thank you for your quick answer @danielnelson, giving it a try straight away

@dynek
Contributor Author

dynek commented May 3, 2019

I thought a default timeout was implemented, but didn't check the code to confirm :-)

So far it seems to work:

2019-05-03T04:45:15Z E! [inputs.fibaro]: Error in plugin: Get http://fibaro.domain.net:80/api/sections: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-03T05:43:25Z E! [inputs.fibaro]: Error in plugin: net/http: request canceled (Client.Timeout exceeded while reading body)
2019-05-03T06:31:45Z E! [inputs.fibaro]: Error in plugin: net/http: request canceled (Client.Timeout exceeded while reading body)
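
The two message variants above ("while awaiting headers" and "while reading body") correspond to different phases of the HTTP exchange, since Go's http.Client.Timeout covers the whole request including reading the response body. The following is a rough sketch of that distinction against a local test server; the helper names (timeoutPhase, newStallingServer, demoPhase) are made up for this example and nothing here is taken from the plugin itself.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// timeoutPhase returns "headers" if the request failed before response
// headers arrived, "body" if reading the body failed, and "ok" otherwise.
// In this demo the only possible failure cause is the client timeout.
func timeoutPhase(url string, timeout time.Duration) string {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return "headers"
	}
	defer resp.Body.Close()
	if _, err := io.ReadAll(resp.Body); err != nil {
		return "body"
	}
	return "ok"
}

// newStallingServer starts a local server that stalls either before
// sending anything, or right after sending the response headers.
func newStallingServer(sendHeadersFirst bool) (string, func()) {
	release := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if sendHeadersFirst {
			w.WriteHeader(http.StatusOK)
			if f, ok := w.(http.Flusher); ok {
				f.Flush() // push the headers out before stalling
			}
		}
		select {
		case <-release:
		case <-r.Context().Done():
		}
	}))
	return srv.URL, func() {
		close(release)
		srv.Close()
	}
}

// demoPhase runs one request with a 200ms timeout against a stalling server.
func demoPhase(sendHeadersFirst bool) string {
	url, stop := newStallingServer(sendHeadersFirst)
	defer stop()
	return timeoutPhase(url, 200*time.Millisecond)
}

func main() {
	fmt.Println("stall before headers:", demoPhase(false))
	fmt.Println("stall after headers: ", demoPhase(true))
}
```

Either way the request is abandoned once the deadline passes, which matches the log lines: an error is reported for that interval and collection continues.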

I'll wait a couple more hours, but so far so good. Thanks @danielnelson. I will then close the bug report.

It's weird, though, that it used to work on the other host; it's probably Docker (network) related then.

Thank you!

@dynek
Contributor Author

dynek commented May 5, 2019

So far so good ... "fixed" :-) Thank you @danielnelson

@danielnelson
Contributor

I will set this as the default timeout in 1.11. The plugin really should have one, and the commented-out timeout in the sample config indicates it should be 5s.
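
A default like this is commonly applied by falling back to a fixed value when the setting is left unset. The sketch below shows one way that could look in Go. It is an assumption for illustration, not the change that was actually merged: pluginConfig, newClient, and defaultTimeout are hypothetical names, and only the 5s value comes from this thread.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// defaultTimeout uses the 5s value mentioned in the thread.
const defaultTimeout = 5 * time.Second

// pluginConfig is a hypothetical stand-in for a plugin's settings;
// it is not Telegraf's actual type.
type pluginConfig struct {
	URL     string
	Timeout time.Duration // zero means the user did not set a timeout
}

// newClient builds an HTTP client, falling back to defaultTimeout
// when no positive timeout was configured.
func newClient(cfg pluginConfig) *http.Client {
	timeout := cfg.Timeout
	if timeout <= 0 {
		timeout = defaultTimeout
	}
	return &http.Client{Timeout: timeout}
}

func main() {
	unset := newClient(pluginConfig{URL: "http://fibaro.domain.net:80"})
	custom := newClient(pluginConfig{URL: "http://fibaro.domain.net:80", Timeout: 2 * time.Second})
	fmt.Println("unset timeout becomes:", unset.Timeout)
	fmt.Println("explicit timeout kept:", custom.Timeout)
}
```

The point of the fallback is that an omitted setting never reaches http.Client as zero, which would mean no deadline at all.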
