Ansible Prometheus


Installs and manages Prometheus server, Alertmanager, PushGateway, and numerous Prometheus exporters.

This role was designed to allow adding new exporters with ease. Regular releases ensure it always provides the latest Prometheus software.

This role can register client exporters with the Prometheus server/s automatically (see tgroup management below).

Requirements

  • Ansible >= 2.8.0
  • Facts must be gathered (gather_facts: true)

Supported Software and Operating Systems

Supported Operating Systems, Distributions, and Architectures

This module is intended to support as many distributions and architectures as possible. The following table specifies which combinations are currently tested. Most exporters will also work on ARM architectures:

| OS | Release | Architectures |
|---|---|---|
| Alpine | 3.2 through 3.11, edge | x86_64 (amd64) |
| AmazonLinux | 1 and 2 | x86_64 (amd64) |
| ArchLinux | Current | x86_64 (amd64) |
| Enterprise Linux | 6, 7, 8 | x86_64 (amd64) |
| Fedora | 20 through 31, rawhide | x86_64 (amd64) |
| Gentoo (openrc) | Current | x86_64 (amd64) |
| Gentoo (systemd) | Current | x86_64 (amd64) |
| OpenSUSE | 13.1 through tumbleweed | x86_64 (amd64) |
| Oracle Linux | 6, 7, 8 | x86_64 (amd64) |
| Ubuntu | 16.04 through 20.04 | x86_64 (amd64) |

Managed Prometheus software

The following core Prometheus software is supported in addition to the list of exporters below. This software is fully tested on all supported OS, distributions, and architectures.

| Prometheus software | Usage | Author | CI tested |
|---|---|---|---|
| prometheus | usage | prometheus | Yes |
| alertmanager | usage | prometheus | Yes |
| push_gateway | usage | prometheus | Yes |

Managed exporters

All exporters are verified to install. Currently, only select exporters are tested via CI (Continuous Integration) and InSpec.

See each exporter's usage page for more details:

| Exporter | Usage | Author | CI tested |
|---|---|---|---|
| 389ds_exporter_terrycain | usage | terrycain | Yes |
| apache_exporter_lusitaniae | usage | Lusitaniae | Yes |
| aerospike_exporter_alicebob | usage | alicebob | Yes |
| bigip_exporter_expressenab | usage | ExpressenAB | Yes |
| bind_exporter_prometheus_community | usage | prometheus-community | Partial |
| blackbox_exporter | usage | prometheus | Yes |
| ceph_exporter_digitalocean | usage | digitalocean | Partial |
| clickhouse_exporter_clickhouse | usage | clickhouse | Yes |
| cloudwatch_exporter | usage | prometheus | Partial |
| collectd_exporter | usage | prometheus | Yes |
| consul_exporter | usage | prometheus | Yes |
| couchbase_exporter_blakelead | usage | leansys-team | Yes |
| couchdb_exporter_gesellix | usage | gesellix | Yes |
| digitalocean_exporter_metalmatze | usage | metalmatze | Yes |
| elasticsearch_exporter_prometheus_community | usage | prometheus-community | Yes |
| fping_exporter_schweikert | usage | schweikert | Yes |
| gluster_exporter_ofesseler | usage | ofesseler | Yes |
| graphite_exporter | usage | prometheus | Yes |
| grok_exporter_fstab | usage | fstab | Yes |
| haproxy_exporter | usage | prometheus | Yes |
| influxdb_exporter | usage | prometheus | Yes |
| ipmi_exporter_prometheus_community | usage | prometheus-community | Yes |
| iperf3_exporter_edgard | usage | edgard | Yes |
| iptables_exporter_retailnext | usage | retailnext | Yes |
| jmx_exporter | usage | prometheus | No |
| kafka_exporter_danielqsj | usage | danielqsj | Partial |
| keepalived_exporter_gen2brain | usage | gen2brain | Yes |
| memcached_exporter | usage | prometheus | Yes |
| mongodb_exporter_percona | usage | percona | Yes |
| mysqld_exporter | usage | prometheus | Partial |
| nginx_exporter_nginxinc | usage | nginxinc | Partial |
| node_exporter | usage | prometheus | Yes |
| ntp_exporter_sapcc | usage | sapcc | Yes |
| nvidia_exporter_bugroger | usage | BugRoger | Partial |
| nvidia_gpu_exporter_mindprince | usage | mindprince | Partial |
| openldap_exporter_tomcz | usage | tomcz | Yes |
| openvpn_exporter_kumina | usage | kumina | Partial |
| phpfpm_exporter_hipages | usage | hipages | Yes |
| ping_exporter_czerwonk | usage | czerwonk | Yes |
| postgres_exporter_prometheus_community | usage | prometheus-community | Yes |
| process_exporter_ncabatoff | usage | ncabatoff | Yes |
| proxysql_exporter_percona | usage | percona | Yes |
| rabbitmq_exporter_kbudde | usage | kbudde | Yes |
| redis_exporter_oliver006 | usage | oliver006 | Yes |
| script_exporter_adhocteam | usage | adhocteam | Yes |
| smokeping_exporter_superq | usage | SuperQ | Yes |
| snmp_exporter | usage | prometheus | Yes |
| sql_exporter_free | usage | free | Yes |
| squid_exporter_boynux | usage | boynux | Yes |
| ssl_exporter_ribbybibby | usage | ribbybibby | Yes |
| statsd_exporter | usage | prometheus | Yes |
| wireguard_exporter_mdlayher | usage | mdlayher | Partial |

Managed node_exporter textfiles scripts

Numerous node_exporter textfile scripts are supported and can be installed via the following variables. These scripts are installed under '/opt/prometheus/scripts' by default:

| node_exporter textfiles script | Source | Enable variable |
|---|---|---|
| apt.sh | node_exporter examples | prometheus_script_apt: true |
| btrfs_stats.py | node_exporter examples | prometheus_script_btrfs_stats: true |
| deleted_libraries.py | node_exporter examples | prometheus_script_deleted_libraries: true |
| directory-size.sh | node_exporter examples | prometheus_script_directory_size: true |
| inotify-instances | node_exporter examples | prometheus_script_inotify_instances: true |
| ipmitool | node_exporter examples | prometheus_script_ipmitool: true |
| lvm-prom-collector | node_exporter examples | prometheus_script_lvm_prom_collector: true |
| md_info.sh | node_exporter examples | prometheus_script_md_info: true |
| md_info_detail.sh | node_exporter examples | prometheus_script_md_info_detail: true |
| mellanox_hca_temp | node_exporter examples | prometheus_script_mellanox_hca_temp: true |
| multipathd_info | node_exporter examples | prometheus_script_multipathd_info: true |
| ntpd_metrics.py | node_exporter examples | prometheus_script_ntpd_metrics: true |
| nvme_metrics.sh | node_exporter examples | prometheus_script_nvme_metrics: true |
| pacman.sh | node_exporter examples | prometheus_script_pacman: true |
| promcron.sh | mesaguy/ansible-prometheus | prometheus_script_promcron: true |
| promrun.sh | mesaguy/ansible-prometheus | prometheus_script_promrun: true |
| smartmon.py | node_exporter examples | prometheus_script_smartmon_python: true |
| smartmon.sh | node_exporter examples | prometheus_script_smartmon: true |
| sssd_check.sh | mesaguy/ansible-prometheus | prometheus_script_sssd_check: true |
| storcli.py | node_exporter examples | prometheus_script_storcli: true |
| tw_cli.py | node_exporter examples | prometheus_script_tw_cli: true |
| yum.sh | node_exporter examples | prometheus_script_yum: true |
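
As a sketch of how these variables combine, the following host or group variables install node_exporter together with a few of the textfile scripts listed above. The selection of scripts is arbitrary and purely illustrative, and enabling 'prometheus_install_sponge' alongside them is an assumption based on the recommendation in the Global variables section, not a stated requirement:

```yaml
# Illustrative variables only; enable just the scripts relevant to each host.
prometheus_components:
  - node_exporter

# Textfile scripts (installed under /opt/prometheus/scripts by default):
prometheus_script_smartmon: true
prometheus_script_md_info: true
prometheus_script_directory_size: true

# Optional: 'sponge' is recommended when writing to node_exporter's textfile directory
prometheus_install_sponge: true
```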

Role Variables

A 'prometheus_components' array variable is used to specify the Prometheus software to install. This example installs all supported prometheus_components:

# Demonstration only. Clients should only have applicable software and exporters defined:
prometheus_components:
 # Core components:
 - alertmanager
 - prometheus
 - push_gateway
 # Exporters
 - 389ds_exporter_terrycain
 - apache_exporter_lusitaniae
 - aerospike_exporter_alicebob
 - bigip_exporter_expressenab
 - bind_exporter_prometheus_community
 - blackbox_exporter
 - ceph_exporter_digitalocean
 - clickhouse_exporter_clickhouse
 - cloudwatch_exporter
 - collectd_exporter
 - consul_exporter
 - couchbase_exporter_blakelead
 - couchdb_exporter_gesellix
 - digitalocean_exporter_metalmatze
 - elasticsearch_exporter_prometheus_community
 - fping_exporter_schweikert
 - gluster_exporter_ofesseler
 - graphite_exporter
 - grok_exporter_fstab
 - haproxy_exporter
 - influxdb_exporter
 - iperf3_exporter_edgard
 - ipmi_exporter_prometheus_community
 - iptables_exporter_retailnext
 - jmx_exporter
 - kafka_exporter_danielqsj
 - keepalived_exporter_gen2brain
 - memcached_exporter
 - mongodb_exporter_percona
 - mysqld_exporter
 - nginx_exporter_nginxinc
 - node_exporter
 - ntp_exporter_sapcc
 - nvidia_exporter_bugroger
 - nvidia_gpu_exporter_mindprince
 - openldap_exporter_tomcz
 - openvpn_exporter_kumina
 - phpfpm_exporter_hipages
 - ping_exporter_czerwonk
 - postgres_exporter_prometheus_community
 - process_exporter_ncabatoff
 - proxysql_exporter_percona
 - rabbitmq_exporter_kbudde
 - redis_exporter_oliver006
 - script_exporter_adhocteam
 - smokeping_exporter_superq
 - snmp_exporter
 - sql_exporter_free
 - squid_exporter_boynux
 - ssl_exporter_ribbybibby
 - statsd_exporter
 - wireguard_exporter_mdlayher

Mesaguy script documentation

  • promcron for monitoring the execution of cron jobs
  • promrun for monitoring the execution of commands
  • sssd_check for monitoring the status of SSSD

Common variables

By default, if a Prometheus software or exporter binary fails to install, the installation fails. To fall back to building from source instead, set the global 'prometheus_fallback_to_build' boolean or a software-specific override. For example, to allow the blackbox_exporter to be built from source if no binary can be found, set:

prometheus_blackbox_exporter_fallback_to_build: true

All daemon installer tasks have a 'runas' parameter to specify which user the daemon will run as. By default, all daemons run as the 'prometheus_user' (defaults to: prometheus). For example, to have the blackbox_exporter run as user 'test', set the following variable:

prometheus_blackbox_exporter_runas: test
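
The two per-software overrides above can be combined. The following is a minimal sketch of host or group variables; the 'monitoring' account name is a hypothetical example, and the account is assumed to already exist on the target host:

```yaml
prometheus_components:
  - blackbox_exporter

# Build blackbox_exporter from source if no binary can be installed
prometheus_blackbox_exporter_fallback_to_build: true

# Run the blackbox_exporter daemon as a non-default user (hypothetical account name)
prometheus_blackbox_exporter_runas: monitoring
```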

Global variables

Link the Prometheus etc directory to '/etc/prometheus'. The Prometheus etc directory defaults to '/opt/prometheus/etc':

prometheus_link_etc: true

Attempt to force the etc directory symlink referenced above:

prometheus_link_etc_force: false

Install the 'sponge' utility. Recommended by the Prometheus project when writing to node_exporter's textfile directory. The EPEL repository is required if installing on a Red Hat Enterprise Linux derivative. CentOS 8.x requires the 'CentOS-PowerTools' yum repository, OracleLinux 7 requires the 'ol7_optional_archive' repository, and Red Hat Enterprise Linux 8 requires the 'Red Hat CodeReady Linux Builder' yum repository be enabled:

prometheus_install_sponge: false

Purge old and now orphaned versions of software:

prometheus_purge_orphans: false

Purge backups of Prometheus configuration files from the Prometheus 'etc' directory after 'prometheus_etc_backup_max_age' days (Default: 31d). Option 'prometheus_etc_purge_backups' defaults to 'false':

prometheus_etc_purge_backups: true
prometheus_etc_backup_max_age: 31d

Root directory to install Prometheus software:

prometheus_root_dir: '/opt/prometheus'

Test each service port after installing and starting each service:

prometheus_test_service_port: true

Manage the 'prometheus' service user and group:

prometheus_manage_group: true
prometheus_manage_user: true

Name of the Prometheus service user and group:

prometheus_group: prometheus
prometheus_user: prometheus

Create the Prometheus user and group as system accounts, defaults to 'false':

prometheus_group_is_system: true
prometheus_user_is_system: true

Configure ulimits for 'prometheus' user:

prometheus_configure_ulimits: false
prometheus_ulimit_hard_nofile: 8192
prometheus_ulimit_soft_nofile: 4096

If installing a Prometheus application binary fails, fall back to installing the Prometheus software via source. Installation from source generally requires installing compilers. It is also possible to enable 'fallback_to_build' on a case-by-case basis (ie: prometheus_blackbox_exporter_fallback_to_build: true):

prometheus_fallback_to_build: false

Go version to use when building Prometheus software:

prometheus_go_version: 1.13.10

The Prometheus etc directory, defaults to '/opt/prometheus/etc':

prometheus_etc_dir: "{{ prometheus_root_dir }}/etc"

The root directory in which exporters are installed, defaults to '/opt/prometheus/exporters':

prometheus_exporters_dir: "{{ prometheus_root_dir }}/exporters"

The root directory in which 'go' is installed. Go is only installed if Prometheus software is being installed from source. Defaults to '/opt/prometheus/go':

prometheus_go_dir: "{{ prometheus_root_dir }}/go"

The directory in which logs are created. Systems using systemd will generally log to the journal (viewable with journalctl) instead of files:

prometheus_log_dir: "/var/log/prometheus"

The directory to use for temporary space, principally when building Prometheus software. Defaults to '/opt/prometheus/tmp':

prometheus_tmp_dir: "{{ prometheus_root_dir }}/tmp"

The directory to use when storing persistent Prometheus data (ie: The Prometheus server's data), defaults to '/opt/prometheus/var':

prometheus_var_dir: "{{ prometheus_root_dir }}/var"

Symlink tool applications (amtool, promtool, etc.) to /usr/local/bin. Defaults to 'true'; set to 'false' to disable the symlinks:

prometheus_symlink_tools: false

Cache downloaded software on the Ansible host and push cached software to the remote hosts Ansible is configuring. Defaults to disabled via 'false':

prometheus_local_archive: true
prometheus_local_archive_dir: ../archive/prometheus

Prometheus rule management variables

Enable management of Prometheus 'rules':

prometheus_manage_rules: true

Local location to find rules files, defaults to empty (disabled):

prometheus_rules_source_dirs:
 - ../files/prometheus/rules
 - ../files/prometheus/additional_rules

Ownership and permissions of rules files, defaults:

prometheus_rules_dir_mode: 0755
prometheus_rules_file_mode: 0644
prometheus_rules_group: '{{ prometheus_group }}' # prometheus
prometheus_rules_owner: '{{ prometheus_user }}'  # prometheus

Purge backups of rules files after 'prometheus_rules_backup_max_age' days (Default: 90d). Option 'prometheus_rules_purge_backups' defaults to 'false':

prometheus_rules_purge_backups: true
prometheus_rules_backup_max_age: 90d

Purge undefined (orphaned) rules from Prometheus servers. Defaults to 'false':

prometheus_rules_purge_orphans: true
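
For reference, a rules file placed in one of the 'prometheus_rules_source_dirs' directories would normally use the standard Prometheus rule file format. The following is a minimal sketch, assuming a hypothetical file named '../files/prometheus/rules/node.yml'. The group name, alert name, duration, and labels are illustrative, and the file extensions this role picks up are not stated here:

```yaml
# ../files/prometheus/rules/node.yml (hypothetical file name)
groups:
  - name: node
    rules:
      - alert: InstanceDown
        # 'up' is 0 when Prometheus fails to scrape a target
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Scrape target has been unreachable for 5 minutes"
```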

Prometheus log rotation variables

Log rotation is disabled by default, but can be configured simply using the following variables. Log rotation is configured for all .log files in the Prometheus log directory (ie: /var/log/prometheus/*.log).

Enable installing a prometheus log rotation script. Defaults to 'false':

prometheus_logrotate: true

Number of rotated logs (days) to keep:

prometheus_logrotate_count: 31

Boolean specifying whether logs should be compressed:

prometheus_logrotate_compress: true

Log rotation configuration file directory:

prometheus_logrotate_dir: /etc/logrotate.d

Prometheus client variables

Cause all Prometheus servers defined in a 'prometheus_servers' array/list variable to verify connectivity to each of the client's exporters:

prometheus_software_server_side_connect_test: true

Configure firewalld rules to permit server IPs defined in a 'prometheus_server_ips' array/list variable to connect to each of the client's exporters. This functionality requires that the python 'netaddr' module be installed (ie: yum install -y python-netaddr or dnf install -y python-netaddr or pip install netaddr). Only enable this variable on servers that use firewalld, otherwise the task will fail:

prometheus_manage_client_firewalld: true
# Optionally set:
prometheus_firewalld_zone: public

If firewalld customization is required, one can add firewalld rules using a playbook as follows:

- name: Allow incoming prometheus server connections to node_exporter
  become: true
  firewalld:
    immediate: true
    port: 9100/tcp
    permanent: true
    source: "{{ item }}"
    state: enabled
    zone: public
  with_items: "{{ prometheus_server_ips }}"
  when: uses_firewalld is defined and 'node_exporter' in prometheus_components

Configure iptables rules to permit server IPs defined in a 'prometheus_server_ips' array/list variable to connect to each of the client's exporters. Only enable this variable on servers that use iptables, otherwise the task will fail:

prometheus_manage_client_iptables: true

If iptables_raw has been installed, you can enable the following variable:

prometheus_manage_client_iptables_raw: true
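
A minimal sketch of client variables for a host protected by iptables follows. The addresses are placeholders from the 192.0.2.0/24 documentation range, and whether CIDR ranges are also accepted in 'prometheus_server_ips' is not stated here:

```yaml
prometheus_components:
  - node_exporter

# Prometheus servers permitted to reach this client's exporters (placeholder addresses)
prometheus_server_ips:
  - 192.0.2.10
  - 192.0.2.11

# Only enable on hosts that actually use iptables, otherwise the task will fail
prometheus_manage_client_iptables: true
```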

This role can manage your Prometheus server 'target groups' (tgroups) automatically, dynamically creating tgroup files in a specified directory (/opt/prometheus/etc/tgroups by default) for each client exporter.

Automatic tgroup file management can be enabled for client side operation, server side operation, or both. In client mode, each client's exporters are registered automatically on the Prometheus servers specified in a 'prometheus_servers' array. In server mode, the inventory is parsed to determine which exporters are available on each host, and all clients are registered with the servers specified in each client's 'prometheus_servers' array.

By default, client and server tgroups use 'inventory_hostname' (fqdn) and 'inventory_hostname_short' (hostname) values for server fqdn/hostnames and ignore facts. This is done because server-side population of tgroups cannot account for clients' facts unless clients are configured to cache their facts. To use the fact-based 'ansible_fqdn' (fqdn) and 'ansible_hostname' (hostname) variables, enable 'prometheus_tgroup_use_facts'. At this time, enabling 'prometheus_tgroup_use_facts' for any client disables server side tgroup management:

prometheus_tgroup_use_facts: true

To enable automatic tgroup file generation on the client side, you must define 'prometheus_manage_client_tgroups' as true and list your Prometheus servers in a 'prometheus_servers' variable in your Ansible variables or inventory. The following will create tgroup files in /opt/prometheus/etc/ansible_tgroups:

prometheus_manage_client_tgroups: true
prometheus_servers:
 - 'prometheus1'
 - 'prometheus2'
# Optional, defaults to /opt/prometheus/etc/tgroups:
prometheus_managed_tgroup_dir: '/opt/prometheus/etc/ansible_tgroups'

If this role is managing your tgroup files, you can apply labels to your exporter/s using the 'prometheus_tgroup_labels' variable:

- hosts: prometheus_clients
  vars:
    prometheus_components:
      - node_exporter
    prometheus_tgroup_labels:
      environment: development
      site: primary
  roles:
    - mesaguy.prometheus

Using 'set_fact' to do the same:

- name: Set Prometheus labels for host
  set_fact:
    prometheus_tgroup_labels:
      environment: 'development'
      site: primary
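
For orientation, a target group in Prometheus's standard file-based service discovery format pairs a list of targets with a set of labels, roughly as sketched below. The host name is hypothetical, 9100 is the node_exporter port used in the firewalld example above, and the exact files this role generates are not shown in this README and may differ:

```yaml
# Standard Prometheus file_sd target group; not copied from this role's output
- targets:
    # Hypothetical client host on the node_exporter port
    - 'client1.example.org:9100'
  labels:
    environment: development
    site: primary
```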

Exporters that aren't managed by this role can be specified using a 'prometheus_additional_exporters' variable as follows. Any labels specified in 'prometheus_tgroup_labels' will be merged with labels defined in 'prometheus_additional_exporters'. Firewall rules will be created for additional exporters if 'prometheus_manage_client_firewalld' or 'prometheus_manage_client_iptables' is defined.

prometheus_additional_exporters:
 - name: docker
   port: 9323
   labels: {}
 - name: foo
   port: 9999
   labels:
     team: foo
     department: IT

To enable automatic tgroup file generation on the server side, you must define 'prometheus_manage_server_tgroups' as true and list your Prometheus servers in a 'prometheus_servers' variable in your Ansible variables or inventory. The following will create tgroup files in /opt/prometheus/etc/ansible_tgroups for all clients that have 'prometheus_components' and/or 'prometheus_additional_exporters' defined. Clients must also have a 'prometheus_servers' array configured:

prometheus_manage_server_tgroups: true

To only configure server tgroups and perform no role tasks, enable 'prometheus_manage_server_tgroups_only':

- hosts: prometheus_servers
  vars:
    prometheus_manage_server_tgroups_only: true
  roles:
    - mesaguy.prometheus

Purge undefined (orphaned) exporters. When run in client mode, this option only affects the client's orphaned files. When run in server mode, this affects all tgroup files:

prometheus_tgroup_dir_purge_orphans: true

Specify a FQDN for a host when the FQDN isn't in Ansible's inventory and isn't the host's official FQDN. This option should generally be avoided; fixing DNS or Ansible's inventory is a better option:

prometheus_override_fqdn: weird-hostname.example.org

Prometheus server configuration

To enable the Prometheus server, include the role task: prometheus

Prometheus configuration files are validated using 'promtool' before Prometheus is restarted.

The configuration content. The example below utilizes a file named 'prometheus_server.yml' in your Ansible root directory's 'files' directory. If no configuration content is defined, a default configuration file is utilized. You will want to customize your configuration file content:

prometheus_server_cfg: '{{ lookup("file", "../files/prometheus_server.yml") | from_yaml }}'

Or embedding the YAML directly into your playbook:

  prometheus_server_cfg:
    global:
      scrape_interval: 15s

      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'codelab-monitor'

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # Override the global default and scrape targets from this job every 5 seconds.
        scrape_interval: 5s

        static_configs:
          - targets: ['localhost:9090']
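
If the tgroup files described under the client variables are managed by this role, the Prometheus server also needs a scrape job that reads them. The following is a hedged sketch using Prometheus's standard 'file_sd_configs' mechanism. The job name and the '*.yml' glob are assumptions, since the file names this role generates are not documented here, so check the tgroup directory before relying on them:

```yaml
prometheus_server_cfg:
  global:
    scrape_interval: 15s
  scrape_configs:
    # Hypothetical job name; targets are read from the role-managed tgroup directory
    - job_name: 'ansible_tgroups'
      file_sd_configs:
        - files:
            # Default tgroup directory; the '*.yml' glob is an assumption
            - '/opt/prometheus/etc/tgroups/*.yml'
          refresh_interval: 5m
```

Prometheus re-reads matching files when they change, so targets added by later role runs should be picked up without a server restart.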

An array of additional flags to pass to the prometheus daemon:

prometheus_extra_opts: []

The version of Prometheus to install. The default version can be found in the prometheus variables file and can be overridden using the following variable:

prometheus_version: "v1.0.0"

Allow the use of prerelease versions (beta, test, development, etc versions), defaults to 'false':

prometheus_use_prerelease: true

Where to store Prometheus's database, defaults to /opt/prometheus/var/prometheus:

prometheus_storage_dir: /opt/prometheus/var/prometheus

Prometheus web console templates to utilize. The defaults suffice under most circumstances, so these variables should normally remain unset:

prometheus_web_console_libraries_dir: /opt/prometheus/prometheus/x.x.x/console_libraries
prometheus_web_console_templates_dir: /opt/prometheus/prometheus/x.x.x/consoles

Port and IP to listen on. Defaults to listening on all available IPs on port 9090:

prometheus_host: "0.0.0.0"
prometheus_port: 9090

Example Playbook

Prometheus server

The following example installs Prometheus (server), alertmanager, blackbox_exporter, and the node_exporter. The Prometheus (server) port and storage retention parameters have been changed from the defaults.

The Prometheus server should be installed only on designated Prometheus server hosts. Prometheus clients should only have select and specific exporters installed.

Classic use method:

- hosts: prometheus_servers
  vars:
    prometheus_components:
      - prometheus
      - alertmanager
      - blackbox_exporter
      - node_exporter
    prometheus_port: 10000
    prometheus_extra_opts:
     - '--storage.tsdb.retention=90d'
  roles:
    - mesaguy.prometheus

Longer 'include_role' use method:

- hosts: prometheus_servers
  vars:
    prometheus_port: 10000
    prometheus_extra_opts:
     - '--storage.tsdb.retention=90d'
  tasks:
  - name: Prometheus server
    include_role:
      name: mesaguy.prometheus
      tasks_from: '{{ prometheus_component }}'
    loop_control:
      loop_var: prometheus_component
    with_items:
      - prometheus
      - alertmanager
      - blackbox_exporter
      - node_exporter
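
Prometheus clients can be configured in the same way. The sketch below is illustrative only: the 'prometheus_clients' group and the 'prometheus1'/'prometheus2' server names are the placeholders used earlier in this README, and the label value is arbitrary:

```yaml
- hosts: prometheus_clients
  gather_facts: true  # facts must be gathered for this role
  vars:
    prometheus_components:
      - node_exporter
    # Register this client's exporters with the listed servers (placeholder names)
    prometheus_manage_client_tgroups: true
    prometheus_servers:
      - 'prometheus1'
      - 'prometheus2'
    prometheus_tgroup_labels:
      environment: development
  roles:
    - mesaguy.prometheus
```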

Additional information

Software installation methods

Installations are performed using pre-compiled binary files where possible. Where pre-compiled binaries are not available, this Ansible role:

  1. Installs the tools necessary to compile the binaries
  2. Compiles the binaries
  3. Installs the binaries in a directory specifying both the version of the Prometheus software and version of go utilized for the installation (ie: /opt/prometheus/exporters/smokeping_exporter_superq/v0.3.1__go-1.14.14/smokeping_prober)

If a binary fails to install or is unavailable despite the existence of some pre-compiled binaries, the Prometheus software can still be installed from source code, provided 'prometheus_fallback_to_build' (or the software-specific override) is enabled.

Security

This module does not manage firewall rules, employ https, or employ authentication to secure the Prometheus software. All of these security measures are worthwhile, but are currently outside the scope of this role.

We closely monitor the release of new Prometheus software as well as the Go compiler and release new versions of this Ansible role accordingly.

All daemons are run via a non-privileged 'prometheus' user by default.

When Prometheus software is installed using source code, the installation destination directory is named for both the Prometheus software version and the Go compiler version. This naming convention ensures that a newer version of the Go compiler forces a rebuild of the Prometheus software built from source. It also differentiates installations by binary from installations by compiled source. Forcing Prometheus source rebuilds each time a new version of Go is released has the drawback of introducing more work for the clients, but has the benefit of ensuring that any vulnerabilities within the Go core libraries are patched. We assume that Prometheus software that is provided in binary form is monitored for vulnerabilities by the developer and rebuilt as necessary.

Future: When source is compiled, all compile commands should be executed using an unprivileged user account. This, combined with running daemons as an unprivileged user, mitigates many security risks. This is currently complicated by limitations of Ansible's 'become'.

License

MIT. See the LICENSE file.

Author Information

Mesaguy