Skip to content

equelin/vsanmetrics

Repository files navigation

vsanmetrics

vsanmetrics is a tool written in Python for collecting usage and performance metrics and health status from a VMware vSAN cluster and translating them in InfluxDB's line protocol.

It can be useful to send metrics in a time-serie database like InfluxDB or Graphite with the help of Telegraf and then display metrics in Grafana.

A detailed list of all entities types and metrics is available here

Prerequisites

  • Python 3 (This script has been tested with python 3.6.7)
  • Pyvmomi python's librairy

You can install the librairies with pip -> pip install -r requirements.txt

To use the vSAN Python bindings, download the SDK and place vsanmgmtObjects.py and vsanapiutis.py on a path where your Python applications can import library or in the same folder than vsanmetrics.py.

Installation

  • Download the script vsanmetrics.py
  • On linux box, make the script executable
% chmod +x ./vsanmetrics
  • Run the script with the -h parameter to check if it works
% ./vsanmetrics -h

usage: vsanmetrics.py [-h] -s VCENTER [-o PORT] -u USER [-p PASSWORD] -c
                      CLUSTERNAME [--performance] [--capacity] [--health]
                      [--skipentitytypes SKIPENTITYTYPES]

Export vSAN cluster performance and storage usage statistics to InfluxDB line
protocol

optional arguments:
  -h, --help            show this help message and exit
  -s VCENTER, --vcenter VCENTER
                        Remote vcenter to connect to
  -o PORT, --port PORT  Port to connect on
  -u USER, --user USER  User name to use when connecting to vcenter
  -p PASSWORD, --password PASSWORD
                        Password to use when connecting to vcenter
  -c CLUSTERNAME, --cluster_name CLUSTERNAME
                        Cluster Name
  --performance         Output performance metrics
  --capacity            Output storage usage metrics
  --health              Output cluster health status
  --skipentitytypes SKIPENTITYTYPES
                        List of entity types to skip. Separated by a comma
  --cachefolder CACHEFOLDER
                        Folder where the cache files are stored
  --cacheTTL CACHETTL   TTL of the object inventory cache

Usage

Run the script against a vSAN cluster to gather the storage usage statistics.

% ./vsanmetrics.py -s vcenter.example.com -u administrator@vsphere.local -p MyAwesomePassword -c VSAN-CLUSTER --capacity

capacity_global,scope=global,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER totalCapacityB=7200999211008,freeCapacityB=1683354550260 1525422314084382976
capacity_summary,scope=summary,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER temporaryOverheadB=0,physicalUsedB=2636212338688,primaryCapacityB=2688980877312,usedB=5380734189568,reservedCapacityB=3607749040540,overReservedB=2744521850880,provisionCapacityB=6986210377728,overheadB=2828663783436 1525422314084382976
capacity_vmswap,scope=vmswap,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER temporaryOverheadB=0,physicalUsedB=8422162432,primaryCapacityB=177330978816,usedB=355240771584,reservedCapacityB=355089776640,overReservedB=346818609152,overheadB=177909792768 1525422314084382976
capacity_checksumOverhead,scope=checksumOverhead,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER temporaryOverheadB=0,physicalUsedB=0,primaryCapacityB=0,usedB=8858370048,reservedCapacityB=0,overReservedB=0,overheadB=8858370048 1525422314084382976

Run the script against a vSAN cluster to gather performance statistics.

% ./vsanmetrics.py -s vcenter.example.com -u administrator@vsphere.local -p MyAwesomePassword -c VSAN-CLUSTER --performance

cluster-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=7.0,throughputRead=40883.0,latencyAvgWrite=11218.0,latencyAvgRead=985.0,iopsRead=1.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0 1525462200000000000
cluster-domcompmgr,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=6.0,throughputRecWrite=0.0,latencyAvgRecWrite=0.0,throughputRead=45309.0,latencyAvgWrite=1335.0,tputResyncRead=0.0,latencyAvgRead=935.0,iopsRead=1.0,throughputWrite=14476.0,latAvgResyncRead=0.0,iopsResyncRead=0.0,iopsRecWrite=0.0,iopsWrite=2.0,congestion=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx01.example.com,uuid=5ae60a2b-fe13-25dd-1f19-005056a3a442 oio=1.0,throughputRead=95.0,latencyAvgWrite=0.0,latencyAvgRead=340.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx03.example.com,uuid=5ae750e2-bc6d-487b-1283-005056a38be2 oio=6.0,throughputRead=40788.0,latencyAvgWrite=11218.0,latencyAvgRead=1000.0,iopsRead=1.0,clientCacheHitRate=0.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx02.example.com,uuid=5ae7229f-771d-1091-ffe7-005056a35f01 oio=0.0,throughputRead=0.0,latencyAvgWrite=0.0,latencyAvgRead=0.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000

Run the script against a vSAN cluster to gather performance statistics and skip some entity types like virtual machines or VSCSI entities:

% ./vsanmetrics.py -s vcenter.example.com -u administrator@vsphere.local -p MyAwesomePassword -c VSAN-CLUSTER --performance --skipentitytypes virtual-machine,vscsi

cluster-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=7.0,throughputRead=40883.0,latencyAvgWrite=11218.0,latencyAvgRead=985.0,iopsRead=1.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0 1525462200000000000
cluster-domcompmgr,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=6.0,throughputRecWrite=0.0,latencyAvgRecWrite=0.0,throughputRead=45309.0,latencyAvgWrite=1335.0,tputResyncRead=0.0,latencyAvgRead=935.0,iopsRead=1.0,throughputWrite=14476.0,latAvgResyncRead=0.0,iopsResyncRead=0.0,iopsRecWrite=0.0,iopsWrite=2.0,congestion=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx01.example.com,uuid=5ae60a2b-fe13-25dd-1f19-005056a3a442 oio=1.0,throughputRead=95.0,latencyAvgWrite=0.0,latencyAvgRead=340.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx03.example.com,uuid=5ae750e2-bc6d-487b-1283-005056a38be2 oio=6.0,throughputRead=40788.0,latencyAvgWrite=11218.0,latencyAvgRead=1000.0,iopsRead=1.0,clientCacheHitRate=0.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx02.example.com,uuid=5ae7229f-771d-1091-ffe7-005056a35f01 oio=0.0,throughputRead=0.0,latencyAvgWrite=0.0,latencyAvgRead=0.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000

Cache

The script will try to maintain an inventory of the vSAN infrastructure in a cache. There are two major benefits:

  • Reducing the global execution time of the script for larger environnement
  • Avoid errors when a host is disconnected wilhe the script is executing

By default cache validity duration is 60 minutes. You can choose your own duration with the parameter --cacheTTL. Cache files are stored where the script is executed, you can modify this behavior with parameter --cachefolder.

% ./vsanmetrics.py -s vcenter.example.com -u administrator@vsphere.local -p MyAwesomePassword -c VSAN-CLUSTER --performance --cacheTTL 300 --cachefolder /tmp

List of available entities types

A more detailed list of entities and metrics is available here

Name Description
cluster-domclient Metrics about clusters in the perspective of VM consumption.
cluster-domcompmgr Metrics about clusters in the perspective of vSAN backend.
host-domclient Metrics about hosts in the perspective of VM consumption
host-domcompmgr Metrics about hosts in the perspective of vSAN backend.
cache-disk Metrics about Cache-tier disks
capacity-disk Metrics about Capacity-tier disks
disk-group Metrics about disk groups.
vscsi Metrics for Virtual SCSI of virtual machines
virtual-machine Metrics for virtual machines
virtual-disk Metrics for virtual disks.
vsan-vnic-net Metrics for vSAN VMkernel Network Adapter.
vsan-host-net Metrics for vSAN Host Network.
vsan-pnic-net Metrics for vSAN physical NIC.
vsan-iscsi-host Metrics for all vSAN iSCSI targets on this ESXi host.
vsan-iscsi-target Metrics for all LUNs on a vSAN iSCSI target.
vsan-iscsi-lun Metrics for a vSAN iSCSI LUN.

Using vsanmetrics with Telegraf

The exec input plugin of Telegraf executes the commands on every interval and parses metrics from their output in any one of the accepted Input Data Formats.

Don't forget to configure Telegraf to output data to a time series database !

vsanmetrics output the metrics in InfluxDB's line protocol. Telegraf will parse them and send them to any data format configured in the outputs plugins.

vsanmetrics and and the Python's librairies should be available by the user who run the Telegraf service. (typically root on Linux boxes...).

TIP: On Linux, install the librairies with the command sudo -H pip install -r requirements.txt to make it available to the root user.

Here is an example of a working telegraf's config file:

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

[[inputs.exec]]
  # Shell/commands array
  # Full command line to executable with parameters, or a glob pattern to run all matching files.
  commands = ["/path/to/script/vsanmetrics.py -s vcenter01.example.com -u administrator@vsphere.local -p MyAwesomePassword -c VSAN-CLUSTER --performance --capacity --health"]

  # Timeout for each command to complete.
  timeout = "60s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  interval = "300s"

If needed, you can specify more than one input plugin. It might be useful if you want to gather different statistics with different intervals or if you want to query different vSAN clusters.

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

[[inputs.exec]]
  # Shell/commands array
  # Full command line to executable with parameters, or a glob pattern to run all matching files.
  commands = ["/path/to/script/vsanmetrics.py -s vcenter01.example.com -u administrator@vsphere.local -p MyAwesomePassword -c VSAN-CLUSTER --performance --capacity --health"]

  # Timeout for each command to complete.
  timeout = "60s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  interval = "300s"

[[inputs.exec]]
  # Shell/commands array
  # Full command line to executable with parameters, or a glob pattern to run all matching files.
  commands = ["/path/to/script/vsanmetrics.py -s vcenter02.example.com -u administrator@vsphere.local -p MyAwesomePassword -c VSAN-CLUSTER --performance --capacity --health"]

  # Timeout for each command to complete.
  timeout = "60s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  interval = "300s"

Author

Erwan Quélin

License

Copyright 2018 Erwan Quelin and the community.

Licensed under the Apache License 2.0.