Description
Directions
This week, the Telegraf process on at least 6 customer servers was consuming 5GB of RAM continuously. Judging by the metrics Telegraf had collected, memory usage crept up gradually from the 1st of the month until memory used was sitting at 99%. Restarting the Telegraf service relieved the memory pressure, but I'm monitoring it at the moment in case it happens again.
Basically, I want to know what information I can provide to help diagnose this if it recurs. Can I enable some verbose debugging and catch it in the act, or is this related to the metrics I'm collecting? There should be no change in the number of series, though, so I would have expected memory usage to stay steady.
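In the meantime, here is a rough sketch of how I could log the process's memory independently of Telegraf's own metrics, so there is a second record if the growth comes back. It is only a sketch: it assumes the process image is named `telegraf.exe`, that the Windows `tasklist` command is available, and that its CSV output has the usual five columns. The function names are made up for illustration.

```python
import csv
import io
import subprocess
from datetime import datetime


def parse_tasklist_csv(output):
    """Parse `tasklist /FO CSV /NH` output into (pid, memory_kb) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(output)):
        # Expected columns: Image Name, PID, Session Name, Session#, Mem Usage
        if len(fields) != 5:
            continue  # skips blank lines and "INFO: No tasks..." messages
        # Mem Usage looks like "5,123,456 K"; keep only the digits so that
        # locale-specific thousands separators do not matter.
        digits = "".join(ch for ch in fields[4] if ch.isdigit())
        if fields[1].isdigit() and digits:
            rows.append((int(fields[1]), int(digits)))
    return rows


def sample_telegraf_memory(image_name="telegraf.exe"):
    """Run tasklist (Windows only) and return (pid, memory_kb) per match.

    The default image name is an assumption; adjust it to the real one.
    """
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)


def format_log_line(pid, memory_kb, now=None):
    """Build one CSV log line: ISO timestamp, PID, memory in KB."""
    now = now or datetime.now()
    return f"{now.isoformat(timespec='seconds')},{pid},{memory_kb}"
```

Calling `sample_telegraf_memory()` from a scheduled task every few minutes and appending `format_log_line(...)` for each result to a file should give a simple timeline showing when the growth starts and how fast it is.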
Bug report
Relevant telegraf.conf:
Just collecting a config dump now. Will attach shortly...
System info:
Mostly Windows Server 2012 R2 with one possible 2008 R2.
Steps to reproduce:
Run Telegraf for a few weeks.