Container metric collection for collectd
This is a collectd plugin to collect Docker container stats, similar to what the stats API provides, but hopefully more lightweight as data is collected reading pseudo-files from the /proc
and /sys
filesystems. The Docker API is called, but only to get the list of running containers.
The perl modules WWW::Curl
and JSON
are the only dependencies. In Debian/Ubuntu, they come packaged as libwww-curl-perl
and libjson-perl
respectively.
Let collectd know about the custom data types used by the plugin by copying the docker_stats.db
files somewhere and referencing its location in collectd's configuration, eg (in collectd.conf
):
TypesDB "/usr/share/collectd/types.db" "/path/to/docker_stats.db"
...
# Load Perl plugin if you weren't loading it already
<LoadPlugin perl>
Globals true
</LoadPlugin>
...
<Plugin perl>
IncludeDir "/path/to/collectd_plugins"
BaseName "Collectd::Plugins"
...
LoadPlugin docker_stats
<Plugin docker_stats>
# SizeStats true # optional: whether to obtain SizeRootFs and SizeRw values from containers, default false
# CgroupBase /sys/fs/cgroup # optional: where to look for cgroup pseudo-files, default /sys/fs/cgroup
# DockerUnixSocket var/run/docker.sock # optional: where to find docker API Unix socket, default /var/run/docker.sock
</Plugin>
</Plugin>
Put the actual plugin (docker_stats.pm
) inside /path/to/collectd_plugins/Collectd/Plugins
(or whatever your IncludeDir
and BaseName
above are). Note however that the plugin package name assumes you're using Collectd::Plugins as BaseName
.
Finally, restart collectd and hopefully see the values being collected.
If you don't want all the data that is reported by the plugin, you can of course use collectd's filters to exclude part of the values, with different levels of granularity.
The following example (to be put in collectd's config file) discards all network values for container mycontainer
:
LoadPlugin match_regex
...
PreCacheChain "DiscardSomeNetStats"
<Chain "DiscardSomeNetStats">
<Rule "1">
<Match "regex">
Plugin "^docker_stats_net$"
PluginInstance "^mycontainer$"
</Match>
<Target "stop">
</Target>
</Rule>
<Target "write">
</Target>
</Chain>
The plugin never invokes the /stats
API, but rather collects its data from some cgroup and /proc
pseudo-files. However, it does call the API at each iteration to get the list of running containers, which of course could change over time.
Four types of data are collected:
-
CPU stats. The percentage is calculated using the same formula that
docker stats
uses, however the calculation is performed at each collectd iteration (eg, 10 seconds) rather than each second, so the results might be less accurate than those obtained withdocker stats
. -
Memory stats. Four numbers are reported:
usage
,max_usage
,limit
are taken straight from the cgroup filesmemory.usage_in_bytes
,memory.max_usage_in_bytes
andmemory.limit_in_bytes
respectively. The fourth value,usage_stats
, is calculated similarly to howdocker stats
seems to compute the used memory, that is, by taking the usage and subtracting the cache. -
BLKIO stats. There are two types of statistics, those taken from the
blkio.throttle.*
set of files and those taken from the plainblkio.*
files (part of theCFQ
disk scheduler). Both should theoretically show the same data, however it seems like the CFQ stats are only updated when the underlying device is using the CFQ scheduler, which may not always be the case. The throttle stats seem to be more consistent. -
Network stats. These are taken straight from the
/proc/<pid>/net/dev
pseudo-file for each container's main process.
Only works on Linux (obviously) and might break if the location or contents of some pseudo-file ever changes (which is almost guaranteed to happen when cgroups v2 arrive, and possibly even within v1 at any time).
It's racy in certain parts, especially network stats, altough the corner cases are very unlikely to happen and it should work fine most of the time. The idea is to avoid doing full API queries to inspect each ocntainer at each iteration.