Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugins/system/disk.go diskusage queries all devices first, prunes later #440

Closed
Millnert opened this issue Dec 12, 2015 · 2 comments · Fixed by #558
Closed

plugins/system/disk.go diskusage queries all devices first, prunes later #440

Millnert opened this issue Dec 12, 2015 · 2 comments · Fixed by #558
Labels
bug unexpected problem or unintended behavior

Comments

@Millnert
Copy link

On systems that have NFS mounts under heavy load, the telegraf process locks up when using the system/disk plugin. Data gathering takes too long.

I have traced the issue to the plugins/system/disk.go code performing some meta command to retrieve disk usage statistics for presumably ALL filesystems first, and then applying the pruning on the retrieved data.

That's precisely the wrong order of operations in order to provide for defensive application of telegraf on loaded systems. I'm no Golang expert, but a quick googling reveals many ways to query file system statistics for a specific mount point.

So I propose instead that the algorithm be updated a bit at https://github.com/influxdb/telegraf/blob/master/plugins/system/disk.go#L30:

  1. get all filesystems to a list
  2. for each filesystem in the list:
    2.1 if the filesystem path isn't in the mountpath list from config (if it exists)
    2.1.1 get information from this filesystem and append to a list of results with correct tags etc.
  3. for each item in the list of results:
    3.1 add the items to the accumulator object
  4. return the accumulator object
@Millnert
Copy link
Author

I now noticed https://github.com/influxdb/telegraf/blob/master/plugins/system/ps.go#L70 - so perhaps it's this function that should get an optional "mountpoints" list argument instead.

@sparrc
Copy link
Contributor

sparrc commented Dec 12, 2015

Sounds reasonable to me 👍

@sparrc sparrc added the bug unexpected problem or unintended behavior label Dec 12, 2015
sparrc added a commit that referenced this issue Jan 20, 2016
sparrc added a commit that referenced this issue Jan 20, 2016
sparrc added a commit that referenced this issue Jan 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants