Add Azure IoT Edge integration #7465

Merged 65 commits on Oct 22, 2020
Changes from 59 commits
Commits
96fad1a
Add skeleton
Aug 28, 2020
9fef3f0
Working Docker setup for 1.0.9
Aug 31, 2020
ce9bc5c
Attempt 1.0.10-rc2 setup
Aug 31, 2020
5f25b45
Finalize RC2 dev setup
Sep 1, 2020
6fd6fc2
Fix double-endpoint setup, implement scraping of Prometheus endpoints
Sep 1, 2020
1685edd
Update CI config
Sep 2, 2020
a535b5b
Add config class, add failing integration test
Sep 2, 2020
313fc8f
Successfully collect and test metrics, improve env up/down robustness
Sep 2, 2020
28d33a2
Make tests pass
Sep 2, 2020
03c0fbf
Use local mock server for CI tests
Sep 4, 2020
f8629f1
Add Edge Agent metrics
Sep 4, 2020
5d0c6b0
Update codecov config
Sep 7, 2020
b302b8b
Tweak exclude_labels
Sep 7, 2020
72cb919
Fix invalid manifest
Sep 7, 2020
18ca0ec
Add edgeHub metrics
Sep 7, 2020
b2213c6
Document mock server metrics generation
Sep 7, 2020
6423b81
Fix Python 2 tests compatibility
Sep 7, 2020
c94f10b
Assert E2E tags
Sep 7, 2020
1574b3a
Skip E2E tests if IOT_EDGE_CONNSTR is missing
Sep 7, 2020
2832868
Use Windows-compatible mock server setup
Sep 7, 2020
640032a
Add security daemon health service check
Sep 7, 2020
0fd2abe
Simplify prometheus url config, add config tests
Sep 7, 2020
e35340c
Fix style, fix Windows test compat
Sep 7, 2020
47e9d74
Verify service check in e2e
Sep 7, 2020
d9b5234
Fix check class name case
Sep 7, 2020
8f24b1d
Add config spec
Sep 8, 2020
b15cddf
Add logs to config spec and test env
Sep 8, 2020
08508b9
Use auto-discovery for log collection
Sep 8, 2020
cc1550c
Merge branch 'master' into fm/iot_edge
Sep 9, 2020
777a102
Enable log collection via Docker labels
Sep 9, 2020
ce86835
Merge branch 'master' into fm/iot_edge
Sep 10, 2020
f01f898
Set required properties in config spec
Sep 10, 2020
39d3735
Merge branch 'master' into fm/iot_edge
Sep 11, 2020
b7ccdbb
Reorganize config options order
Sep 11, 2020
6b625a7
Loosen wait conditions
Sep 14, 2020
bf95787
Merge branch 'master' into fm/iot_edge
Sep 16, 2020
34a4ffd
Merge branch 'master' into fm/iot_edge
Sep 17, 2020
8bbfddf
Merge branch 'master' into fm/iot_edge
Sep 18, 2020
e43fef6
Merge branch 'master' into fm/iot_edge
Sep 23, 2020
9e268a3
Update namespace to azure.iot_edge
Sep 23, 2020
c64698b
Merge branch 'master' into fm/iot_edge
Sep 25, 2020
3504ee3
Add version metadata collection
Sep 25, 2020
f80e4ec
Update manifest.json
Sep 25, 2020
6041ead
Check types
Sep 25, 2020
37ace98
Write up metadata.csv
Sep 25, 2020
591e5b2
Merge branch 'master' into fm/iot_edge
Sep 28, 2020
846bcac
Fill in service_checks.json
Sep 28, 2020
13d0000
Add TLS support to E2E environment
Sep 28, 2020
4aa5660
Add code comment about single-instance and composition approaches
Sep 28, 2020
c727097
Drop note about setting certs in config.yaml
Sep 28, 2020
60fca78
Merge branch 'master' into fm/iot_edge
Sep 29, 2020
33cccaf
Write up README
Sep 29, 2020
ec868e8
Lingo: security daemon -> security manager
Sep 29, 2020
7b081e4
Add recommended monitors
Sep 29, 2020
1f95257
Merge branch 'master' into fm/iot_edge
Sep 30, 2020
04b990a
Apply no-brainer suggestions
florimondmanca Oct 1, 2020
a1d86cd
Update version metadata transformer
Oct 2, 2020
38e8b19
Address feedback
Oct 2, 2020
dbb1370
Move instance config to Edge Agent labels
Oct 2, 2020
a25a15a
Merge branch 'master' into fm/iot_edge
Oct 9, 2020
0de144a
Merge branch 'master' into fm/iot_edge
Oct 13, 2020
ab0b18f
Apply suggestions from docs review
florimondmanca Oct 13, 2020
042ab4e
Merge branch 'master' into fm/iot_edge
Oct 20, 2020
926c566
Fix type of renotify_interval in monitors json
Oct 20, 2020
145a931
Merge branch 'master' into fm/iot_edge
Oct 22, 2020
2 changes: 1 addition & 1 deletion .azure-pipelines/changes.yml
@@ -27,7 +27,7 @@ jobs:
- template: './templates/test-single-windows.yml'
parameters:
job_name: Changed
- check: '--changed datadog_checks_base datadog_checks_dev active_directory aspdotnet disk dns_check dotnetclr exchange_server iis pdh_check sqlserver tcp_check win32_event_log windows_service wmi_check'
+ check: '--changed datadog_checks_base datadog_checks_dev active_directory aspdotnet azure_iot_edge disk dns_check dotnetclr exchange_server iis pdh_check sqlserver tcp_check win32_event_log windows_service wmi_check'
display: Windows
pip_cache_config:
key: 'pip | $(Agent.OS) | datadog_checks_base/datadog_checks/base/data/agent_requirements.in'
6 changes: 6 additions & 0 deletions .azure-pipelines/templates/test-all-checks.yml
@@ -49,6 +49,12 @@ jobs:
- checkName: aspdotnet
displayName: ASP.NET
os: windows
- checkName: azure_iot_edge
displayName: Azure IoT Edge
os: linux
- checkName: azure_iot_edge
displayName: Azure IoT Edge
os: windows
- checkName: btrfs
displayName: Btrfs
os: linux
9 changes: 9 additions & 0 deletions .codecov.yml
@@ -51,6 +51,10 @@ coverage:
target: 75
flags:
- apache
Azure IoT Edge:
target: 75
flags:
- azure_iot_edge
Btrfs:
target: 75
flags:
@@ -565,6 +569,11 @@ flags:
paths:
- aspdotnet/datadog_checks/aspdotnet
- aspdotnet/tests
azure_iot_edge:
carryforward: true
paths:
- azure_iot_edge/datadog_checks/azure_iot_edge
- azure_iot_edge/tests
btrfs:
carryforward: true
paths:
2 changes: 2 additions & 0 deletions azure_iot_edge/CHANGELOG.md
@@ -0,0 +1,2 @@
# CHANGELOG - Azure IoT Edge

10 changes: 10 additions & 0 deletions azure_iot_edge/MANIFEST.in
@@ -0,0 +1,10 @@
graft datadog_checks
graft tests

include MANIFEST.in
include README.md
include requirements.in
include requirements-dev.txt
include manifest.json

global-exclude *.py[cod] __pycache__
114 changes: 114 additions & 0 deletions azure_iot_edge/README.md
@@ -0,0 +1,114 @@
# Agent Check: Azure IoT Edge

## Overview

[Azure IoT Edge][1] is a fully managed service for deploying cloud workloads to Internet of Things (IoT) edge devices using standard containers.

Use the Datadog-Azure IoT Edge integration to collect metrics and health status from IoT Edge devices.

**Note**: this integration requires IoT Edge runtime version 1.0.10 or above.

## Setup

Follow the instructions below to install and configure this check for an IoT Edge device running on a device host.

### Installation

The Azure IoT Edge check is included in the [Datadog Agent][2] package.

No additional installation is needed on your device.

### Configuration

The recommended setup is to run the Agent as a custom module on the IoT Edge device. See the official Microsoft documentation on [deploying Azure IoT Edge modules][3] for general information on installing and working with custom modules for Azure IoT Edge.

Follow the steps below to configure the IoT Edge device, runtime modules, and the Datadog Agent to start collecting IoT Edge metrics.

1. Configure the **Edge Agent** runtime module as follows:
- Image version must be `1.0.10` or above.
- Under "Create Options", add the following `Labels`. Edit the `com.datadoghq.ad.instances` label as appropriate. See the [sample azure_iot_edge.d/conf.yaml][5] for all available configuration options. See the documentation on [Docker Integrations Autodiscovery][6] for more information on labels-based integration configuration.

```json
"Labels": {
"com.datadoghq.ad.check_names": "[\"azure_iot_edge\"]",
"com.datadoghq.ad.init_configs": "[{}]",
"com.datadoghq.ad.instances": "[{\"edge_hub_prometheus_url\": \"http://edgeHub:9600/metrics\", \"edge_agent_prometheus_url\": \"http://edgeAgent:9600/metrics\"}]"
}
```

- Under "Environment Variables", experimental metrics must be enabled by adding these environment variables (note the double underscores):
- `ExperimentalFeatures__Enabled`: `true`
- `ExperimentalFeatures__EnableMetrics`: `true`

1. Configure the **Edge Hub** runtime module as follows:
- Image version must be `1.0.10` or above.
- Under "Environment Variables", experimental metrics must be enabled by adding these environment variables (note the double underscores):
- `ExperimentalFeatures__Enabled`: `true`
- `ExperimentalFeatures__EnableMetrics`: `true`

1. Install and configure the Datadog Agent as a **custom module**:
- Set the module name. For example: `datadog-agent`.
- Set the Agent image URI. For example: `datadog/agent:7`.
- Under "Environment Variables", configure your `DD_API_KEY`. You may also set extra Agent configuration here (see [Agent Environment Variables][4]).
- Under "Container Create Options", enter the following configuration based on your device OS. **Note**: `NetworkId` must correspond to the network name set in the device `config.yaml` file.

- Linux:
```json
{
"HostConfig": {
"NetworkMode": "default",
"Env": ["NetworkId=azure-iot-edge"],
"Binds": ["/var/run/docker.sock:/var/run/docker.sock"]
}
}
```
- Windows:
```json
{
"HostConfig": {
"NetworkMode": "default",
"Env": ["NetworkId=nat"],
"Binds": ["//./pipe/iotedge_moby_engine:/./pipe/docker_engine"]
}
}
```

- Save the Datadog Agent custom module.

1. Save and deploy changes to your device configuration.
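
The `Labels` values in step 1 are JSON documents stored as strings, which is why the inner quotes are escaped. The following is a minimal Python sketch of generating them, assuming the two example URLs shown above. The helper name `build_ad_labels` is hypothetical and is not part of the Agent or of IoT Edge.

```python
import json


def build_ad_labels(edge_hub_url, edge_agent_url):
    """Build Autodiscovery labels for the Edge Agent module (hypothetical helper)."""
    instance = {
        "edge_hub_prometheus_url": edge_hub_url,
        "edge_agent_prometheus_url": edge_agent_url,
    }
    # Each label value is itself a JSON document serialized to a string.
    return {
        "com.datadoghq.ad.check_names": json.dumps(["azure_iot_edge"]),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
    }


labels = build_ad_labels(
    "http://edgeHub:9600/metrics",
    "http://edgeAgent:9600/metrics",
)
# Serializing the outer object escapes the inner quotes, as in the snippet above.
print(json.dumps({"Labels": labels}, indent=2))
```

Generating the labels this way avoids hand-escaping mistakes when the instance configuration grows beyond the two required URLs.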

### Validation

Once the Agent has been deployed to the device, [run the Agent's status subcommand][7] and look for `azure_iot_edge` under the Checks section.

## Data Collected

### Metrics

See [metadata.csv][8] for a list of metrics provided by this check.

### Service Checks

**azure.iot_edge.edge_agent.prometheus.health**:
Returns `CRITICAL` if the Agent is unable to reach the Edge Agent metrics Prometheus endpoint. Returns `OK` otherwise.

**azure.iot_edge.edge_hub.prometheus.health**:
Returns `CRITICAL` if the Agent is unable to reach the Edge Hub metrics Prometheus endpoint. Returns `OK` otherwise.
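
The two service checks above follow the same rule: `CRITICAL` when the Prometheus endpoint cannot be reached, `OK` otherwise. The sketch below illustrates that mapping with the Python standard library only. It is an assumption-laden illustration, not the check's actual implementation; the function name `prometheus_health`, the `opener` parameter, and the 5-second timeout are all hypothetical.

```python
import urllib.error
import urllib.request

OK, CRITICAL = "OK", "CRITICAL"


def prometheus_health(url, opener=urllib.request.urlopen, timeout=5):
    """Return OK if the endpoint answers, CRITICAL otherwise (illustrative only)."""
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, HTTP error statuses, timeouts, and malformed URLs
        # are all treated as "unable to reach the endpoint".
        return CRITICAL
    return OK
```

The `opener` argument exists only so the function can be exercised without a running IoT Edge device.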

### Events

Azure IoT Edge does not include any events.

## Troubleshooting

Need help? Contact [Datadog support][9].

[1]: https://azure.microsoft.com/en-us/services/iot-edge/
[2]: https://docs.datadoghq.com/agent/
[3]: https://docs.microsoft.com/en-us/azure/iot-edge/how-to-deploy-modules-portal
[4]: https://docs.datadoghq.com/agent/guide/environment-variables/
[5]: https://github.com/DataDog/integrations-core/blob/master/azure_iot_edge/datadog_checks/azure_iot_edge/data/conf.yaml.example
[6]: https://docs.datadoghq.com/agent/docker/integrations/
[7]: https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information
[8]: https://github.com/DataDog/integrations-core/blob/master/azure_iot_edge/metadata.csv
[9]: https://docs.datadoghq.com/help/
24 changes: 24 additions & 0 deletions azure_iot_edge/assets/configuration/spec.yaml
@@ -0,0 +1,24 @@
name: Azure IoT Edge
files:
- name: azure_iot_edge.yaml
  options:
  - template: init_config
    options:
    - template: init_config/default
  - template: instances
    options:
    - name: edge_hub_prometheus_url
      description: |
        The URL where Edge Hub metrics are exposed via Prometheus.
      required: true
      value:
        type: string
        example: http://edgeHub:9600/metrics
    - name: edge_agent_prometheus_url
      description: |
        The URL where Edge Agent metrics are exposed via Prometheus.
      required: true
      value:
        type: string
        example: http://edgeAgent:9600/metrics
    - template: instances/default
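
The spec above marks both `edge_hub_prometheus_url` and `edge_agent_prometheus_url` as required strings. As a rough sketch of what that requirement means for an instance configuration, the Python snippet below validates a plain dictionary. The function `validate_instance` is hypothetical, and the HTTP(S) scheme check is an extra assumption that the spec itself does not state.

```python
from urllib.parse import urlparse

REQUIRED_OPTIONS = ("edge_hub_prometheus_url", "edge_agent_prometheus_url")


def validate_instance(instance):
    """Raise ValueError when a required URL option is missing or malformed."""
    for option in REQUIRED_OPTIONS:
        value = instance.get(option)
        if not isinstance(value, str) or not value:
            raise ValueError("{!r} is required and must be a non-empty string".format(option))
        parsed = urlparse(value)
        # Assumption: only http and https endpoints are meaningful here.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("{!r} must be an HTTP(S) URL, got {!r}".format(option, value))
    return instance


validate_instance({
    "edge_hub_prometheus_url": "http://edgeHub:9600/metrics",
    "edge_agent_prometheus_url": "http://edgeAgent:9600/metrics",
})
```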
Empty file.
31 changes: 31 additions & 0 deletions azure_iot_edge/assets/monitors/disk_usage.json
@@ -0,0 +1,31 @@
{
"name": "[Azure IoT Edge] IoT Edge device {{host}} is running out of available disk space",
"type": "query alert",
"query": "max(last_1h):avg:azure.iot_edge.edge_agent.available_disk_space_bytes{*} by {host} / avg:azure.iot_edge.edge_agent.total_disk_space_bytes{*} by {host}.rollup(max, 60) * 100 < 10",
"message": "Please check device {{host}}, as Edge Agent reports that available disk space has dropped below {{threshold}}%.",
"tags": [
"integration:azure_iot_edge"
],
"options": {
"notify_audit": false,
"locked": false,
"timeout_h": 0,
"silenced": {},
"include_tags": true,
"no_data_timeframe": null,
"require_full_window": true,
"new_host_delay": 300,
"notify_no_data": false,
"renotify_interval": 0,
"escalation_message": "",
"thresholds": {
"critical": 10,
"warning": 25,
"critical_recovery": 11,
"warning_recovery": 26
}
},
"recommended_monitor_metadata": {
"description": "Triggers an alert when an IoT Edge device is running out of available disk space"
}
}
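
The disk usage monitor above divides available disk space by total disk space, scales the ratio to a percentage, and compares it against a critical threshold of 10 and a warning threshold of 25. The Python sketch below reproduces only that arithmetic for a single pair of values. It ignores the `last_1h` evaluation window, the rollup, and the recovery thresholds, and the function name `disk_space_status` is hypothetical.

```python
def disk_space_status(available_bytes, total_bytes, critical=10.0, warning=25.0):
    """Classify available disk space as a percentage of the total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    available_pct = available_bytes / total_bytes * 100
    # The monitor alerts when the percentage drops BELOW a threshold.
    if available_pct < critical:
        return "critical"
    if available_pct < warning:
        return "warning"
    return "ok"


# A 64 GB disk with 4 GB free is at 6.25% available.
print(disk_space_status(4 * 1024**3, 64 * 1024**3))
```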
32 changes: 32 additions & 0 deletions azure_iot_edge/assets/monitors/edgehub_retries.json
@@ -0,0 +1,32 @@
{
"name": "[Azure IoT Edge] Rate of Edge Hub operations retries is higher than usual on device {{host}}",
"type": "query alert",
"query": "avg(last_1h):anomalies(per_minute(avg:azure.iot_edge.edge_hub.operation_retry_total{*} by {host}), 'basic', 2, direction='above', alert_window='last_15m', interval=60, count_default_zero='true') >= 1",
"message": "Please check device {{host}}, as Edge Hub reports a rate of operation retries of {{value}} per minute, which is higher than usual.",
"tags": [
"integration:azure_iot_edge"
],
"options": {
"notify_audit": false,
"locked": false,
"timeout_h": 0,
"new_host_delay": 300,
"require_full_window": false,
"notify_no_data": false,
"renotify_interval": 0,
"escalation_message": "",
"no_data_timeframe": null,
"include_tags": true,
"thresholds": {
"critical": 1,
"critical_recovery": 0
},
"threshold_windows": {
"trigger_window": "last_15m",
"recovery_window": "last_15m"
}
},
"recommended_monitor_metadata": {
"description": "Notifies when rate of Edge Hub operation retries is higher than usual"
}
}
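
The retries monitor above wraps a cumulative counter (`operation_retry_total`) in `per_minute(...)` before applying anomaly detection. To make the "retries per minute" figure in the alert message concrete, the Python sketch below derives per-minute rates from successive counter samples. It is a simplified illustration, not Datadog's implementation: the function name `per_minute_rates` and the handling of counter resets are assumptions, and the `anomalies` step is not reproduced.

```python
def per_minute_rates(samples):
    """Convert (timestamp_seconds, cumulative_count) samples to per-minute rates."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            # Skip out-of-order samples and counter resets (e.g. a module restart).
            continue
        rates.append((t1, (c1 - c0) * 60 / elapsed))
    return rates


samples = [(0, 0), (60, 3), (120, 3), (180, 9)]
print(per_minute_rates(samples))
```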
32 changes: 32 additions & 0 deletions azure_iot_edge/assets/monitors/iothub_syncs.json
@@ -0,0 +1,32 @@
{
"name": "[Azure IoT Edge] Rate of unsuccessful syncs with IoT Hub is higher than usual on device {{host}}",
"type": "query alert",
"query": "avg(last_1h):anomalies(per_minute(avg:azure.iot_edge.edge_agent.unsuccessful_iothub_syncs_total{*} by {host}), 'basic', 2, direction='above', alert_window='last_15m', interval=60, count_default_zero='true') >= 1",
"message": "Number of unsuccessful syncs between Edge Agent and IoT Hub on device {{host}} is at {{value}} per minute, which is higher than usual.",
"tags": [
"integration:azure_iot_edge"
],
"options": {
"notify_audit": false,
"locked": false,
"timeout_h": 0,
"new_host_delay": 300,
"require_full_window": false,
"notify_no_data": false,
"renotify_interval": 0,
"escalation_message": "",
"no_data_timeframe": null,
"include_tags": true,
"thresholds": {
"critical": 1,
"critical_recovery": 0
},
"threshold_windows": {
"trigger_window": "last_15m",
"recovery_window": "last_15m"
}
},
"recommended_monitor_metadata": {
"description": "Notifies when unsuccessful syncs between Edge Agent and IoT Hub are higher than usual"
}
}
31 changes: 31 additions & 0 deletions azure_iot_edge/assets/monitors/memory_usage.json
@@ -0,0 +1,31 @@
{
"name": "[Azure IoT Edge] IoT Edge device {{host}} is running out of memory",
"type": "query alert",
"query": "max(last_1h):avg:azure.iot_edge.edge_agent.used_memory_bytes{*} by {host} / avg:azure.iot_edge.edge_agent.total_memory_bytes{*} by {host}.rollup(max, 60) * 100 > 80",
"message": "Please check device {{host}}, as Edge Agent reports usage of more than {{threshold}}% of available RAM for the last hour.",
"tags": [
"integration:azure_iot_edge"
],
"options": {
"notify_audit": false,
"locked": false,
"timeout_h": 0,
"silenced": {},
"include_tags": true,
"no_data_timeframe": null,
"require_full_window": true,
"new_host_delay": 300,
"notify_no_data": false,
"renotify_interval": 0,
"escalation_message": "",
"thresholds": {
"critical": 80,
"warning": 65,
"critical_recovery": 79,
"warning_recovery": 64
}
},
"recommended_monitor_metadata": {
"description": "Triggers an alert when an IoT Edge device is running out of memory"
}
}
32 changes: 32 additions & 0 deletions azure_iot_edge/assets/service_checks.json
@@ -0,0 +1,32 @@
[
{
"agent_version": "6.24.0",
"integration": "Azure IoT Edge",
"groups": [
"host",
"endpoint"
],
"check": "azure.iot_edge.edge_agent.prometheus.health",
"statuses": [
"ok",
"critical"
],
"name": "Edge Agent health",
"description": "Returns `CRITICAL` if the Agent is unable to reach the Edge Agent metrics Prometheus endpoint. Returns `OK` otherwise."
},
{
"agent_version": "6.24.0",
"integration": "Azure IoT Edge",
"groups": [
"host",
"endpoint"
],
"check": "azure.iot_edge.edge_hub.prometheus.health",
"statuses": [
"ok",
"critical"
],
"name": "Edge Hub health",
"description": "Returns `CRITICAL` if the Agent is unable to reach the Edge Hub metrics Prometheus endpoint. Returns `OK` otherwise."
}
]
4 changes: 4 additions & 0 deletions azure_iot_edge/datadog_checks/__init__.py
@@ -0,0 +1,4 @@
# (C) Datadog, Inc. 2020-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
__path__ = __import__('pkgutil').extend_path(__path__, __name__) # type: ignore
4 changes: 4 additions & 0 deletions azure_iot_edge/datadog_checks/azure_iot_edge/__about__.py
@@ -0,0 +1,4 @@
# (C) Datadog, Inc. 2020-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
__version__ = '0.0.1'
7 changes: 7 additions & 0 deletions azure_iot_edge/datadog_checks/azure_iot_edge/__init__.py
@@ -0,0 +1,7 @@
# (C) Datadog, Inc. 2020-present
# All rights reserved
# Licensed under a 3-clause BSD style license (see LICENSE)
from .__about__ import __version__
from .check import AzureIoTEdgeCheck

__all__ = ['__version__', 'AzureIoTEdgeCheck']