feat: Add alerts support #218

Merged
merged 6 commits into from
Aug 1, 2024
101 changes: 101 additions & 0 deletions README.md
@@ -29,6 +29,7 @@ An Ansible role for managing High Availability Clustering.
* Pacemaker node attributes
* Pacemaker Access Control Lists (ACLs)
* node and resource utilization
* Pacemaker Alerts

## Requirements

@@ -1352,6 +1353,70 @@ ha_cluster_cluster_properties:

You may take a look at [an example](#configuring-acls).

#### `ha_cluster_alerts`

structure, default: no alerts

```yaml
ha_cluster_alerts:
  - id: alert1
    path: /alert1/path
    description: Alert1 description
    instance_attrs:
      - attrs:
          - name: alert_attr1_name
            value: alert_attr1_value
    meta_attrs:
      - attrs:
          - name: alert_meta_attr1_name
            value: alert_meta_attr1_value
    recipients:
      - value: recipient_value
        id: recipient1
        description: Recipient1 description
        instance_attrs:
          - attrs:
              - name: recipient_attr1_name
                value: recipient_attr1_value
        meta_attrs:
          - attrs:
              - name: recipient_meta_attr1_name
                value: recipient_meta_attr1_value
```

This variable defines Pacemaker alerts.

The items of `ha_cluster_alerts` are as follows:

* `id` (mandatory) - ID of an alert.
* `path` (mandatory) - Path to the alert agent executable.
* `description` (optional) - Description of the alert.
* `instance_attrs` (optional) - List of sets of the alert's instance
  attributes. Currently, only one set is supported, so the first set is used and
  the rest are ignored.
* `meta_attrs` (optional) - List of sets of the alert's meta attributes.
  Currently, only one set is supported, so the first set is used and the rest
  are ignored.
* `recipients` (optional) - List of the alert's recipients.

The items of `recipients` are as follows:

* `value` (mandatory) - Value of a recipient.
* `id` (optional) - ID of the recipient.
* `description` (optional) - Description of the recipient.
* `instance_attrs` (optional) - List of sets of the recipient's instance
  attributes. Currently, only one set is supported, so the first set is used and
  the rest are ignored.
* `meta_attrs` (optional) - List of sets of the recipient's meta attributes.
  Currently, only one set is supported, so the first set is used and the rest
  are ignored.
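
To illustrate the "first set only" rule described above, here is a minimal
sketch. The IDs, paths and attribute names (`alert_demo`,
`/usr/local/bin/alert_demo.sh`, `debug`) are made up for illustration and are
not part of the role. Under the rule above, only the first entry of
`instance_attrs` would be applied; the second would be ignored.

```yaml
ha_cluster_alerts:
  - id: alert_demo                      # hypothetical ID
    path: /usr/local/bin/alert_demo.sh  # hypothetical agent path
    description: Demo alert
    instance_attrs:
      # First set: the one the role uses.
      - attrs:
          - name: debug
            value: "false"
      # Second set: ignored, because only one set is currently supported.
      - attrs:
          - name: debug
            value: "true"
    recipients:
      - value: /var/log/alert_demo.log  # hypothetical recipient value
        id: recipient_demo
        description: Demo recipient
```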

**Note:** The role configures the cluster to call external programs to handle
alerts. It is your responsibility to provide the programs and distribute them to
cluster nodes.
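
One possible way to distribute an alert agent is to copy it in `pre_tasks`,
which run before the role. This is only a sketch under stated assumptions: the
script `files/my_alert_agent.sh` and the destination
`/usr/local/bin/my_alert_agent.sh` are hypothetical names, and the script is
assumed to exist next to the playbook on the control node.

```yaml
- hosts: node1 node2
  pre_tasks:
    # Assumption: files/my_alert_agent.sh is a hypothetical script that you
    # provide on the control node; the role does not ship it.
    - name: Distribute the alert agent to cluster nodes
      ansible.builtin.copy:
        src: files/my_alert_agent.sh
        dest: /usr/local/bin/my_alert_agent.sh
        owner: root
        group: root
        mode: "0755"
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_alerts:
      - id: alert1
        # Must match the destination of the copy task above.
        path: /usr/local/bin/my_alert_agent.sh
        description: Alert handled by a custom agent

  roles:
    - linux-system-roles.ha_cluster
```

Keeping the `dest` of the copy task and the alert's `path` identical is the
only coupling between the two parts of this sketch.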

You may take a look at [an example](#configuring-alerts).

#### `ha_cluster_qnetd`

structure and default value:
@@ -2239,6 +2304,42 @@ Note that you cannot run a quorum device on a cluster node.
- linux-system-roles.ha_cluster
```

### Configuring Alerts

```yaml
- hosts: node1 node2
  vars:
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_alerts:
      - id: alert1
        path: /alert1/path
        description: Alert1 description
        instance_attrs:
          - attrs:
              - name: alert_attr1_name
                value: alert_attr1_value
        meta_attrs:
          - attrs:
              - name: alert_meta_attr1_name
                value: alert_meta_attr1_value
        recipients:
          - value: recipient_value
            id: recipient1
            description: Recipient1 description
            instance_attrs:
              - attrs:
                  - name: recipient_attr1_name
                    value: recipient_attr1_value
            meta_attrs:
              - attrs:
                  - name: recipient_meta_attr1_name
                    value: recipient_meta_attr1_value

  roles:
    - linux-system-roles.ha_cluster
```

### Purging all cluster configuration

```yaml
1 change: 1 addition & 0 deletions defaults/main.yml
@@ -44,6 +44,7 @@ ha_cluster_pcs_permission_list:
- write

ha_cluster_acls: {}
ha_cluster_alerts: []
ha_cluster_cluster_properties: []
ha_cluster_node_options: []
ha_cluster_resource_defaults: {}
36 changes: 36 additions & 0 deletions examples/alerts.yml
@@ -0,0 +1,36 @@
# SPDX-License-Identifier: MIT
---
- name: Example ha_cluster role invocation - alerts definition
  hosts: node1 node2
  vars:
    ha_cluster_manage_firewall: true
    ha_cluster_manage_selinux: true
    ha_cluster_cluster_name: my-new-cluster
    ha_cluster_hacluster_password: password
    ha_cluster_alerts:
      - id: alert1
        path: /alert1/path
        description: Alert1 description
        instance_attrs:
          - attrs:
              - name: alert_attr1_name
                value: alert_attr1_value
        meta_attrs:
          - attrs:
              - name: alert_meta_attr1_name
                value: alert_meta_attr1_value
        recipients:
          - value: recipient_value
            id: recipient1
            description: Recipient1 description
            instance_attrs:
              - attrs:
                  - name: recipient_attr1_name
                    value: recipient_attr1_value
            meta_attrs:
              - attrs:
                  - name: recipient_meta_attr1_name
                    value: recipient_meta_attr1_value

  roles:
    - linux-system-roles.ha_cluster
6 changes: 6 additions & 0 deletions tasks/shell_pcs/create-and-push-cib.yml
@@ -247,6 +247,12 @@
  vars:
    acls: "{{ ha_cluster_acls | d({}) }}"

- name: Configure alerts
  include_tasks: pcs-cib-alerts.yml
  loop: "{{ ha_cluster_alerts | d([]) }}"
  loop_control:
    loop_var: alert

# Push the new CIB into the cluster

- name: Create a tempfile for CIB diff
63 changes: 63 additions & 0 deletions tasks/shell_pcs/pcs-cib-alerts.yml
@@ -0,0 +1,63 @@
# SPDX-License-Identifier: MIT
---
- name: Configure alert {{ alert.id }}
  command:
    # description is documented as optional, so it is passed only when set
    cmd: >
      pcs -f {{ __ha_cluster_tempfile_cib_xml.path | quote }}
      --
      alert create
      path={{ alert.path | quote }}
      id={{ alert.id | quote }}
      {% if alert.description | d() %}
        description={{ alert.description | quote }}
      {% endif %}
      {% if alert.instance_attrs[0].attrs | d([]) %}
        options
        {% for attr in alert.instance_attrs[0].attrs %}
          {{ attr.name | quote }}={{ attr.value | quote }}
        {% endfor %}
      {% endif %}
      {% if alert.meta_attrs[0].attrs | d([]) %}
        meta
        {% for attr in alert.meta_attrs[0].attrs %}
          {{ attr.name | quote }}={{ attr.value | quote }}
        {% endfor %}
      {% endif %}
  # We always need to create CIB to see whether it's the same as what is
  # already present in the cluster. However, we don't want to report it as a
  # change since the only thing which matters is pushing the resulting CIB to
  # the cluster.
  check_mode: false
  changed_when: not ansible_check_mode

- name: Configure recipients for alert {{ alert.id }}
  command:
    # Only the first set of instance attributes and of meta attributes is
    # used for each recipient; any further sets are ignored.
    # id and description are documented as optional, so they are passed only
    # when set.
    cmd: >
      pcs -f {{ __ha_cluster_tempfile_cib_xml.path | quote }}
      --
      alert recipient add
      {{ alert.id | quote }}
      value={{ recipient.value | quote }}
      {% if recipient.id | d() %}
        id={{ recipient.id | quote }}
      {% endif %}
      {% if recipient.description | d() %}
        description={{ recipient.description | quote }}
      {% endif %}
      {% if recipient.instance_attrs[0].attrs | d([]) %}
        options
        {% for attr in recipient.instance_attrs[0].attrs %}
          {{ attr.name | quote }}={{ attr.value | quote }}
        {% endfor %}
      {% endif %}
      {% if recipient.meta_attrs[0].attrs | d([]) %}
        meta
        {% for attr in recipient.meta_attrs[0].attrs %}
          {{ attr.name | quote }}={{ attr.value | quote }}
        {% endfor %}
      {% endif %}
  loop: "{{ alert.recipients | d([]) }}"
  loop_control:
    loop_var: recipient
  # We always need to create CIB to see whether it's the same as what is
  # already present in the cluster. However, we don't want to report it as a
  # change since the only thing which matters is pushing the resulting CIB to
  # the cluster.
  check_mode: false
  changed_when: not ansible_check_mode
83 changes: 83 additions & 0 deletions tests/tests_cib_alerts.yml
@@ -0,0 +1,83 @@
# SPDX-License-Identifier: MIT
---
- name: Configure alerts
  hosts: all
  vars_files: vars/main.yml

  tasks:
    - name: Run test
      tags: tests::verify
      block:
        - name: Set up test environment
          include_role:
            name: linux-system-roles.ha_cluster
            tasks_from: test_setup.yml

        - name: Run HA Cluster role
          include_role:
            name: linux-system-roles.ha_cluster
            public: true
          vars:
            ha_cluster_cluster_name: test-cluster
            ha_cluster_manage_firewall: true
            ha_cluster_manage_selinux: true
            ha_cluster_alerts:
              - id: alert1
                description: Alert1 description
                path: /path/to/somewhere
                instance_attrs:
                  - attrs:
                      - name: debug
                        value: "false"
                meta_attrs:
                  - attrs:
                      - name: timeout
                        value: 15s
                recipients:
                  - id: recipient1
                    description: Recipient1 description
                    value: recipient-value
                    instance_attrs:
                      - attrs:
                          - name: debug
                            value: "true"
                    meta_attrs:
                      - attrs:
                          - name: timeout
                            value: 20s
        - name: Verify alerts
          vars:
            __test_expected_lines:
              - "Alerts:"
              - " Alert: alert1 (path=/path/to/somewhere)"
              - " Description: Alert1 description"
              - " Options: debug=false"
              - " Meta options: timeout=15s"
              - " Recipients:"
              - " Recipient: recipient1 (value=recipient-value)"
              - " Description: Recipient1 description"
              - " Options: debug=true"
              - " Meta options: timeout=20s"
          block:
            - name: Fetch alerts configuration from the cluster
              command:
                cmd: pcs alert
              register: _test_pcs_alerts_config
              changed_when: false

            - name: Print real alerts configuration
              debug:
                var: _test_pcs_alerts_config

            - name: Print expected alerts configuration
              debug:
                var: __test_expected_lines | list

            - name: Check alerts configuration
              assert:
                that:
                  - _test_pcs_alerts_config.stdout_lines
                    == __test_expected_lines | list

        - name: Check firewall and selinux state
          include_tasks: tasks/check_firewall_selinux.yml