Skip to content

🐳 Monitoring of Docker containers (LXC/systemd Docker supported) - Zabbix template and Zabbix C module

License

Notifications You must be signed in to change notification settings

0312birdzhang/Zabbix-Docker-Monitoring

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zabbix Docker Monitoring

If you like or use this project, please provide feedback to author - Star it ★.

Monitoring of Docker container by using Zabbix. Available CPU, mem, blkio container metrics and some containers config details e.g. IP, name, ... Zabbix Docker module has native support for Docker containers (Systemd included) and it should support also a few other container type (e.g. LXC) out of the box. Please feel free to test and provide feedback/open issue. Module is focused on the performance, see section Module vs. UserParameter script.

Please donate to author, so he can continue to publish other awesome projects for free:

[Paypal donate button] (https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=8LB6J222WRUZ4)

Build

[Download latest build (RHEL 7, CentOS 7, Ubuntu 14, ...)] (https://drone.io/github.com/monitoringartist/Zabbix-Docker-Monitoring/files/zabbix24/src/modules/zabbix_module_docker/zabbix_module_docker.so) [Build Status] (https://drone.io/github.com/monitoringartist/Zabbix-Docker-Monitoring/latest)
If provided build doesn't work on your system, please see section Compilation. Or you can check [folder dockerfiles] (https://github.com/monitoringartist/Zabbix-Docker-Monitoring/tree/master/dockerfiles), where Dockerfiles for various OS/Zabbix versions are prepared.

Available metrics

Note: fci - full container ID

Key Description
docker.discovery LLD discovering:
Only running containers are discovered.
Additional Docker permissions are needed, when you want to see container name (human name) in metrics/graphs instead of short container ID.
docker.mem[fci,mmetric] Memory metrics:
mmetric - any available memory metric in the pseudo-file memory.stat, e.g.: cache, rss, mapped_file, pgpgin, pgpgout, swap, pgfault, pgmajfault, inactive_anon, active_anon, inactive_file, active_file, unevictable, hierarchical_memory_limit, hierarchical_memsw_limit, total_cache, total_rss, total_mapped_file, total_pgpgin, total_pgpgout, total_swap, total_pgfault, total_pgmajfault, total_inactive_anon, total_active_anon, total_inactive_file, total_active_file, total_unevictable
docker.cpu[fci,cmetric] CPU metrics:
cmetric - any available CPU metric in the pseudo-file cpuacct.stat, e.g.: system, user
Jiffy CPU counter is recalculated to % value by Zabbix.
docker.dev[fci,bfile,bmetric] Blk IO metrics:
bfile - container blkio pseudo-file, e.g.: blkio.io_merged, blkio.io_queued, blkio.io_service_bytes, blkio.io_serviced, blkio.io_service_time, blkio.io_wait_time, blkio.sectors, blkio.time, blkio.avg_queue_size, blkio.idle_time, blkio.dequeue, ...
bmetric - any available blkio metric in selected pseudo-file, e.g.: Total. Option for selected block device only is also available e.g. '8:0 Sync' (quotes must be used in key parameter in this case)
Note: Some pseudo blkio files are available only if kernel config CONFIG_DEBUG_BLK_CGROUP=y, see recommended docs.
docker.inspect[fci,par1,<par2>] Docker inspection:
Requested value from Docker inspect JSON object is returned.
par1 - name of 1st level JSON property
par2 - optional name of 2nd level JSON property
For example:
docker.inspect[fci,NetworkSettings,IPAddress], docker.inspect[fci,State,StartedAt], docker.inspect[fci,Name]
Note 1: Requested value must be plain text/numeric value. JSON objects/arrays are not supported.
Note 2: Additional Docker permissions are needed.
docker.info[info] Docker information:
Requested value from Docker info JSON object is returned.
info - name of requested information, e.g. Containers, Images, NCPU, ...
Note: Additional Docker permissions are needed.
docker.stats[fci,par1,<par2>,<par3>] Docker container resource usage statistics:
Docker version 1.5+ is required
Requested value from Docker stats JSON object is returned.
par1 - name of 1st level JSON property
par2 - optional name of 2nd level JSON property
par3 - optional name of 3rd level JSON property
For example:
docker.stats[fci,memory_stats,usage], docker.stats[fci,network,rx_bytes], docker.stats[fci,cpu_stats,cpu_usage,total_usage]
Note 1: Requested value must be plain text/numeric value. JSON objects/arrays are not supported.
Note 2: Additional Docker permissions are needed.
Note 3: The most accurate way to get Docker container stats, but it's also the slowest (0.3-0.7s), because data are readed from on demand container stats stream.
docker.cstatus[status] Count of Docker containers in defined status:
status - status of container, available statuses:
All - count of all containers
Up - count of running containers (Paused included)
Exited - count of exited containers
Crashed - count of crashed containers (exit code != 0)
Paused - count of paused containers
Note: Additional Docker permissions are needed.
docker.up[fci] Running state check:
1 if container is running, otherwise 0
docker.xnet[fci,interface,nmetric] Network metrics (experimental):
interface - name of interface, e.g. eth0, if name is all, then sum of selected metric across all interfaces is returned (lo included)
nmetric - any available network metric name from output of command netstat -i:
MTU, Met, RX-OK, RX-ERR, RX-DRP, RX-OVR, TX-OK, TX-ERR, TX-DRP, TX-OVR
For example:
docker.xnet[fci,eth0,TX-OK]
docker.xnet[fci,all,RX-ERR]

Note: Root permissions (AllowRoot=1) are required, because net namespaces (/var/run/netns/) are created/used

Maybe in the future:

  • docker.log - log monitoring
  • systemd.service.discovery - discovering systemd services
  • systemd.service.cpu - cpu metrics of systemd service
  • systemd.service.mem - mem metrics of systemd service
  • systemd.service.dev - blk io metrics of systemd service
  • systemd.service.net - net metrics of systemd service
  • systemd.service.log - log monitoring of systemd service

Images

Docker container CPU graph in Zabbix: Docker container CPU graph in Zabbix Docker container memory graph in Zabbix: Docker container memory graph in Zabbix Docker container state graph in Zabbix: Docker container state graph in Zabbix

Additional Docker permissions

You have two options, how to get additional Docker permissions:

  • Add zabbix user to docker group (recommended option):
usermod -aG docker zabbix
  • Or edit zabbix_agentd.conf and set AllowRoot (Zabbix agent with root permissions):
AllowRoot=1

Note: If you use Docker from RHEL/Centos repositories, then you have to use AllowRoot=1 option.

Installation

Compilation

You have to compile module, if provided binary doesn't work on your system. Basic compilation steps:

# RHEL/CentOS tools: yum -y wget install git subversion automake autoconf gcc make
# Debian tools: apt-get -y wget install git subversion automake autoconf gcc make pkg-config
cd ~
mkdir zabbix24
cd zabbix24
svn co svn://svn.zabbix.com/branches/2.4 .
./bootstrap.sh
./configure --enable-agent
mkdir src/modules/zabbix_module_docker
cd src/modules/zabbix_module_docker
wget https://raw.githubusercontent.com/monitoringartist/Zabbix-Docker-Monitoring/master/src/modules/zabbix_module_docker/zabbix_module_docker.c
wget https://raw.githubusercontent.com/monitoringartist/Zabbix-Docker-Monitoring/master/src/modules/zabbix_module_docker/Makefile
make

Output will be binary file (dynamically linked shared object library) zabbix_module_docker.so, which can be loaded by zabbix agent.

How it works

See https://blog.docker.com/2013/10/gathering-lxc-docker-containers-metrics/ Metrics for containers are read from cgroup file system. Docker API is used for discovering and some keys. However root or docker permissions are required for communication with Docker via unix socket. You can test API also in your command line:

echo -e "GET /containers/json?all=0 HTTP/1.0\r\n" | nc -U /var/run/docker.sock

Module vs. UserParameter script

Module is ~10x quicker, because it's compiled binary code. I've used my project https://github.com/monitoringartist/zabbix-agent-stress-test for performance tests.

Part of config in zabbix_agentd.conf:

UserParameter=xdocker.cpu[*],grep $2 /cgroup/cpuacct/docker/$1/cpuacct.stat | awk '{print $$2}'
LoadModule=zabbix_module_docker.so

Tests:

[root@dev zabbix-agent-stress-test]# ./zabbix-agent-stress-test.py -s 127.0.0.1 -k "xdocker.cpu[d5bf68ec1fb570d8ac3047226397edd8618eed14278ce035c98fbceef02d7730,system]" -t 20
Warning: you are starting more threads, than your system has available CPU cores (4)!
Starting 20 threads, host: 127.0.0.1:10050, key: xdocker.cpu[d5bf68ec1fb570d8ac3047226397edd8618eed14278ce035c98fbceef02d7730,system]
Success: 291    Errors: 0       Avg speed: 279.68 qps   Execution time: 1.00 sec
Success: 548    Errors: 0       Avg speed: 349.04 qps   Execution time: 2.00 sec
Success: 803    Errors: 0       Avg speed: 282.72 qps   Execution time: 3.00 sec
Success: 1060   Errors: 0       Avg speed: 209.31 qps   Execution time: 4.00 sec
Success: 1310   Errors: 0       Avg speed: 187.14 qps   Execution time: 5.00 sec
Success: 1570   Errors: 0       Avg speed: 178.80 qps   Execution time: 6.01 sec
Success: 1838   Errors: 0       Avg speed: 189.36 qps   Execution time: 7.01 sec
Success: 2106   Errors: 0       Avg speed: 225.68 qps   Execution time: 8.01 sec
Success: 2382   Errors: 0       Avg speed: 344.51 qps   Execution time: 9.01 sec
Success: 2638   Errors: 0       Avg speed: 327.88 qps   Execution time: 10.01 sec
Success: 2905   Errors: 0       Avg speed: 349.93 qps   Execution time: 11.01 sec
Success: 3181   Errors: 0       Avg speed: 352.23 qps   Execution time: 12.01 sec
Success: 3450   Errors: 0       Avg speed: 239.38 qps   Execution time: 13.01 sec
Success: 3678   Errors: 0       Avg speed: 209.88 qps   Execution time: 14.02 sec
Success: 3923   Errors: 0       Avg speed: 180.30 qps   Execution time: 15.02 sec
Success: 4178   Errors: 0       Avg speed: 201.58 qps   Execution time: 16.02 sec
Success: 4434   Errors: 0       Avg speed: 191.92 qps   Execution time: 17.02 sec
Success: 4696   Errors: 0       Avg speed: 332.06 qps   Execution time: 18.02 sec
Success: 4968   Errors: 0       Avg speed: 325.55 qps   Execution time: 19.02 sec
Success: 5237   Errors: 0       Avg speed: 325.61 qps   Execution time: 20.02 sec
^C
Success: 5358   Errors: 0       Avg rate: 192.56 qps    Execution time: 20.53 sec
Avg rate based on total execution time and success connections: 261.02 qps

[root@dev zabbix-agent-stress-test]# ./zabbix-agent-stress-test.py -s 127.0.0.1 -k "docker.cpu[d5bf68ec1fb570d8ac3047226397edd8618eed14278ce035c98fbceef02d7730,system]" -t 20
Warning: you are starting more threads, than your system has available CPU cores (4)!
Starting 20 threads, host: 127.0.0.1:10050, key: docker.cpu[d5bf68ec1fb570d8ac3047226397edd8618eed14278ce035c98fbceef02d7730,system]
Success: 2828   Errors: 0       Avg speed: 2943.98 qps  Execution time: 1.00 sec
Success: 5095   Errors: 0       Avg speed: 1975.77 qps  Execution time: 2.01 sec
Success: 7623   Errors: 0       Avg speed: 2574.55 qps  Execution time: 3.01 sec
Success: 10098  Errors: 0       Avg speed: 4720.20 qps  Execution time: 4.02 sec
Success: 12566  Errors: 0       Avg speed: 3423.56 qps  Execution time: 5.02 sec
Success: 14706  Errors: 0       Avg speed: 2397.01 qps  Execution time: 6.03 sec
Success: 17128  Errors: 0       Avg speed: 903.63 qps   Execution time: 7.05 sec
Success: 19520  Errors: 0       Avg speed: 2663.53 qps  Execution time: 8.05 sec
Success: 21899  Errors: 0       Avg speed: 1516.36 qps  Execution time: 9.07 sec
Success: 24219  Errors: 0       Avg speed: 3570.47 qps  Execution time: 10.07 sec
Success: 26676  Errors: 0       Avg speed: 1204.58 qps  Execution time: 11.08 sec
Success: 29162  Errors: 0       Avg speed: 2719.87 qps  Execution time: 12.08 sec
Success: 31671  Errors: 0       Avg speed: 2265.67 qps  Execution time: 13.08 sec
Success: 34186  Errors: 0       Avg speed: 3490.64 qps  Execution time: 14.08 sec
Success: 36749  Errors: 0       Avg speed: 2094.59 qps  Execution time: 15.09 sec
Success: 39047  Errors: 0       Avg speed: 3213.35 qps  Execution time: 16.09 sec
Success: 41361  Errors: 0       Avg speed: 3171.67 qps  Execution time: 17.09 sec
Success: 43739  Errors: 0       Avg speed: 3946.53 qps  Execution time: 18.09 sec
Success: 46100  Errors: 0       Avg speed: 1308.88 qps  Execution time: 19.09 sec
Success: 48556  Errors: 0       Avg speed: 2663.52 qps  Execution time: 20.09 sec
^C
Success: 49684  Errors: 0       Avg rate: 2673.85 qps   Execution time: 20.52 sec
Avg rate based on total execution time and success connections: 2420.70 qps

Results of 20s stress test:

StartAgent value Module qps UserParameter script qps
3 2420.70 261.02
10 2612.20 332.62
20 2487.93 348.52

Discovery test:

Part of config in zabbix_agentd.conf:

UserParameter=xdocker.discovery,/etc/zabbix/scripts/container_discover.sh
LoadModule=zabbix_module_docker.so

[container_discover.sh] (https://github.com/bsmile/zabbix-docker-lld/blob/master/usr/lib/zabbix/script/container_discover.sh):

Test with 237 running containers:

[root@dev ~]# docker info
Containers: 237
Images: 121
Storage Driver: btrfs
Execution Driver: native-0.2
Kernel Version: 3.10.0-229.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.1 (Maipo)
CPUs: 10
Total Memory: 62.76 GiB
Name: dev.local
ID: AOAM:BO3G:5MCE:5FMM:IWKP:NPM4:PRKV:ZZ34:BYFL:XGAV:SRNJ:LKDH
Username: username
Registry: [https://index.docker.io/v1/]
[root@dev ~]# time zabbix_get -s 127.0.0.1 -k docker.discovery > /dev/null

real    0m0.112s
user    0m0.000s
sys     0m0.003s
[root@dev ~]# time zabbix_get -s 127.0.0.1 -k xdocker.discovery > /dev/null

real    0m5.856s
user    0m0.000s
sys     0m0.002s
[root@dev ~]# ./zabbix-agent-stress-test.py -s 127.0.0.1 -k xdocker.discovery
Starting 1 threads, host: 127.0.0.1:10050, key: xdocker.discovery
Success: 0      Errors: 0       Avg rate: 0.00 qps      Execution time: 1.00 sec
Success: 0      Errors: 0       Avg rate: 0.00 qps      Execution time: 2.00 sec
Success: 0      Errors: 0       Avg rate: 0.00 qps      Execution time: 3.02 sec
Success: 0      Errors: 0       Avg rate: 0.00 qps      Execution time: 4.02 sec
Success: 0      Errors: 0       Avg rate: 0.00 qps      Execution time: 5.02 sec
Success: 1      Errors: 0       Avg rate: 0.10 qps      Execution time: 6.02 sec
Success: 1      Errors: 0       Avg rate: 0.10 qps      Execution time: 7.02 sec
Success: 1      Errors: 0       Avg rate: 0.10 qps      Execution time: 8.02 sec
Success: 1      Errors: 0       Avg rate: 0.10 qps      Execution time: 9.02 sec
Success: 1      Errors: 0       Avg rate: 0.10 qps      Execution time: 10.02 sec
Success: 2      Errors: 0       Avg rate: 0.14 qps      Execution time: 11.02 sec
Success: 2      Errors: 0       Avg rate: 0.14 qps      Execution time: 12.03 sec
Success: 2      Errors: 0       Avg rate: 0.14 qps      Execution time: 13.03 sec
Success: 2      Errors: 0       Avg rate: 0.14 qps      Execution time: 14.03 sec
Success: 2      Errors: 0       Avg rate: 0.14 qps      Execution time: 15.03 sec
Success: 3      Errors: 0       Avg rate: 0.16 qps      Execution time: 16.03 sec
Success: 3      Errors: 0       Avg rate: 0.16 qps      Execution time: 17.03 sec
Success: 3      Errors: 0       Avg rate: 0.16 qps      Execution time: 18.03 sec
Success: 3      Errors: 0       Avg rate: 0.16 qps      Execution time: 19.03 sec
Success: 3      Errors: 0       Avg rate: 0.16 qps      Execution time: 20.03 sec
Success: 3      Errors: 0       Avg rate: 0.16 qps      Execution time: 21.04 sec
Success: 4      Errors: 0       Avg rate: 0.17 qps      Execution time: 22.04 sec
Success: 4      Errors: 0       Avg rate: 0.17 qps      Execution time: 23.04 sec
Success: 4      Errors: 0       Avg rate: 0.17 qps      Execution time: 24.04 sec
Success: 4      Errors: 0       Avg rate: 0.17 qps      Execution time: 25.05 sec
Success: 5      Errors: 0       Avg rate: 0.20 qps      Execution time: 26.05 sec
Success: 5      Errors: 0       Avg rate: 0.20 qps      Execution time: 27.05 sec
Success: 5      Errors: 0       Avg rate: 0.20 qps      Execution time: 28.05 sec
Success: 5      Errors: 0       Avg rate: 0.20 qps      Execution time: 29.05 sec
Success: 5      Errors: 0       Avg rate: 0.20 qps      Execution time: 30.05 sec
Success: 5      Errors: 0       Avg rate: 0.20 qps      Execution time: 31.05 sec
^C
Success: 5      Errors: 0       Avg rate: 0.20 qps      Execution time: 31.35 sec
Avg rate based on total execution time and success connections: 0.16 qps
[root@dev ~]# ./zabbix-agent-stress-test.py -s 127.0.0.1 -k docker.discovery
Starting 1 threads, host: 127.0.0.1:10050, key: docker.discovery
Success: 5      Errors: 0       Avg rate: 6.26 qps      Execution time: 1.00 sec
Success: 5      Errors: 0       Avg rate: 6.26 qps      Execution time: 2.00 sec
Success: 12     Errors: 0       Avg rate: 7.45 qps      Execution time: 3.00 sec
Success: 20     Errors: 0       Avg rate: 6.77 qps      Execution time: 4.00 sec
Success: 28     Errors: 0       Avg rate: 7.82 qps      Execution time: 5.00 sec
Success: 36     Errors: 0       Avg rate: 7.21 qps      Execution time: 6.01 sec
Success: 43     Errors: 0       Avg rate: 10.22 qps     Execution time: 7.01 sec
Success: 43     Errors: 0       Avg rate: 10.22 qps     Execution time: 8.01 sec
Success: 50     Errors: 0       Avg rate: 6.79 qps      Execution time: 9.01 sec
Success: 57     Errors: 0       Avg rate: 6.11 qps      Execution time: 10.01 sec
Success: 66     Errors: 0       Avg rate: 8.50 qps      Execution time: 11.01 sec
Success: 73     Errors: 0       Avg rate: 6.51 qps      Execution time: 12.01 sec
Success: 81     Errors: 0       Avg rate: 7.18 qps      Execution time: 13.01 sec
Success: 82     Errors: 0       Avg rate: 7.85 qps      Execution time: 14.01 sec
Success: 87     Errors: 0       Avg rate: 6.54 qps      Execution time: 15.02 sec
Success: 95     Errors: 0       Avg rate: 7.84 qps      Execution time: 16.02 sec
Success: 103    Errors: 0       Avg rate: 9.24 qps      Execution time: 17.02 sec
Success: 111    Errors: 0       Avg rate: 9.94 qps      Execution time: 18.02 sec
Success: 119    Errors: 0       Avg rate: 7.63 qps      Execution time: 19.02 sec
Success: 120    Errors: 0       Avg rate: 6.70 qps      Execution time: 20.12 sec
Success: 121    Errors: 0       Avg rate: 3.61 qps      Execution time: 21.12 sec
Success: 128    Errors: 0       Avg rate: 8.46 qps      Execution time: 22.12 sec
Success: 136    Errors: 0       Avg rate: 7.63 qps      Execution time: 23.12 sec
Success: 144    Errors: 0       Avg rate: 6.21 qps      Execution time: 24.12 sec
Success: 150    Errors: 0       Avg rate: 6.89 qps      Execution time: 25.12 sec
Success: 157    Errors: 0       Avg rate: 10.87 qps     Execution time: 26.18 sec
Success: 160    Errors: 0       Avg rate: 7.52 qps      Execution time: 27.18 sec
Success: 168    Errors: 0       Avg rate: 9.81 qps      Execution time: 28.18 sec
Success: 174    Errors: 0       Avg rate: 6.69 qps      Execution time: 29.18 sec
Success: 181    Errors: 0       Avg rate: 6.35 qps      Execution time: 30.18 sec
Success: 188    Errors: 0       Avg rate: 7.64 qps      Execution time: 31.19 sec
^C
Success: 193    Errors: 0       Avg rate: 8.83 qps      Execution time: 31.79 sec
Avg rate based on total execution time and success connections: 6.07 qps

Troubleshooting

Edit your zabbix_agentd.conf and set DebugLevel:

DebugLevel=4

Debug messages from this module will be available in standard zabbix_agentd.log.

Issues and feature requests

Please use Github issue tracker.

Recommended docs

Author

Devops Monitoring zExpert, who loves monitoring systems, which start with letter Z. Those are Zabbix and Zenoss.

Professional monitoring services:

[Monitoring Artist] (http://www.monitoringartist.com)

About

🐳 Monitoring of Docker containers (LXC/systemd Docker supported) - Zabbix template and Zabbix C module

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C 99.9%
  • Makefile 0.1%