Add Azure IoT Edge integration #7465
Great job with the testing 👍
Co-authored-by: Florian Veaux <florian.veaux@datadoghq.com>
Drop security manager service check
Reorganize check as an OpenMetricsBaseCheck subclass
Fix E2E tests
Update docs
Fix service checks: can_connect -> prometheus.health
Great job 💯 🚢
Some docs style edit comments. Thank you!
Co-authored-by: Kari Halsted <12926135+kayayarai@users.noreply.github.com>
See my comments about renotify_interval, which needs to be an int, not a string. Thanks
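To illustrate the type issue raised in this comment, here is a minimal sketch of a check that flags a non-integer renotify_interval in a monitor definition. It assumes the value lives under an "options" object in the monitor JSON; the function name and the sample documents are hypothetical and are not taken from this PR.

```python
import json


def find_renotify_interval_problems(monitor_json):
    """Return a list of problems found with `renotify_interval` (empty if none)."""
    monitor = json.loads(monitor_json)
    # Assumption: the setting is nested under "options" in the monitor definition.
    options = monitor.get("options", {})
    problems = []
    if "renotify_interval" in options:
        value = options["renotify_interval"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(
                "renotify_interval should be an int, got %s: %r"
                % (type(value).__name__, value)
            )
    return problems


# Hypothetical sample documents: the first uses a string, the second an int.
bad_monitor = '{"name": "example", "options": {"renotify_interval": "0"}}'
good_monitor = '{"name": "example", "options": {"renotify_interval": 0}}'

print(find_renotify_interval_problems(bad_monitor))
print(find_renotify_interval_problems(good_monitor))
```

A check like this could run over each recommended monitor file in a unit test, so that a quoted number is caught before review.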
* Add skeleton
* Working Docker setup for 1.0.9
* Attempt 1.0.10-rc2 setup
* Finalize RC2 dev setup
* Fix double-endpoint setup, implement scraping of Prometheus endpoints
* Update CI config
* Add config class, add failing integration test
* Successfully collect and test metrics, improve env up/down robustness
* Make tests pass
* Use local mock server for CI tests
* Add Edge Agent metrics
* Update codecov config
* Tweak exclude_labels
* Fix invalid manifest
* Add edgeHub metrics
* Document mock server metrics generation
* Fix Python 2 tests compatibility
* Assert E2E tags
* Skip E2E tests if IOT_EDGE_CONNSTR is missing
* Use Windows-compatible mock server setup
* Add security daemon health service check
* Simplify prometheus url config, add config tests
* Fix style, fix Windows test compat
* Verify service check in e2e
* Fix check class name case
* Add config spec
* Add logs to config spec and test env
* Use auto-discovery for log collection
* Enable log collection via Docker labels
* Set required properties in config spec
* Reorganize config options order
* Loosen wait conditions
* Update namespace to azure.iot_edge
* Add version metadata collection
* Update manifest.json
* Check types
* Write up metadata.csv
* Fill in service_checks.json
* Add TLS support to E2E environment
* Add code comment about single-instance and composition approaches
* Drop note about setting certs in config.yaml
  This is already done automatically by the E2E environment
* Write up README
* Lingo: security daemon -> security manager
* Add recommended monitors
* Apply no-brainer suggestions
  Co-authored-by: Florian Veaux <florian.veaux@datadoghq.com>
* Update version metadata transformer
* Address feedback
  Drop security manager service check
  Reorganize check as an OpenMetricsBaseCheck subclass
  Fix E2E tests
  Update docs
  Fix service checks: can_connect -> prometheus.health
* Move instance config to Edge Agent labels
* Apply suggestions from docs review
  Co-authored-by: Kari Halsted <12926135+kayayarai@users.noreply.github.com>
* Fix type of renotify_interval in monitors json

Co-authored-by: Florian Veaux <florian.veaux@datadoghq.com>
Co-authored-by: Kari Halsted <12926135+kayayarai@users.noreply.github.com>
de79a68
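One of the commits above skips E2E tests when IOT_EDGE_CONNSTR is missing. The sketch below shows one way such a guard might look using only the standard library. It is not the code from this PR: the helper name, the test class, and the use of unittest rather than the repository's own test tooling are all assumptions made for illustration.

```python
import os
import unittest

# Environment variable named in the commit message.
CONNSTR_ENV_VAR = "IOT_EDGE_CONNSTR"


def has_connection_string(environ=None):
    """Return True if a non-empty device connection string is configured."""
    # Accept an explicit mapping so the helper can be exercised without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    return bool(environ.get(CONNSTR_ENV_VAR))


# Hypothetical test class: skipped entirely when the variable is not set.
@unittest.skipUnless(has_connection_string(), "IOT_EDGE_CONNSTR is not set")
class EdgeE2ETest(unittest.TestCase):
    def test_connection_string_available(self):
        self.assertTrue(has_connection_string())
```

Skipping rather than failing keeps runs green for contributors who do not have access to an IoT Hub device, while still exercising the E2E path wherever the connection string is provided.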
What does this PR do?
Add integration for monitoring Azure IoT Edge.
Motivation
Additional Notes
manifest.json
metadata.csv
service_checks.json
-tls E2E env and instructions below.
docker log integration)
Misc notes:
QA Notes
Basic verifications
tests/README.md. You'll need to connect to portal.azure.com and retrieve the device connection string. If you're unsure how to connect there, hint: I wrote up an Azure IoT Edge testing environment guide (can't link to it here, but it's in our Wiki) that contains that info. If you're still unsure, ping me!
docker exec -it <container> agent status).
ddev env check), make sure it reports sensible metrics, and that all service checks are OK.
CRITICAL, and only the associated metrics go missing.
Linux VM testing
Motivation: make sure the integration works well against a standard host-based IoT Edge security manager. (The one we use in E2E is Docker-based, but that's not how users will typically run their security manager.)
datadog-agent container).
TLS-enabled devices
Motivation: make sure the integration works well when the IoT Edge security manager uses custom certs (aka acts as a "transparent gateway"). Note: by default, the security manager will generate throw-away certs and let IoT Hub know, so in practice TLS is always used. I just went through this setup to make sure there really wasn't anything else we needed to do on the integration side, so here are the steps if you want to verify that yourself.
tests/tls/README.md to set up test certificates for yourself. This will require you to run a script, upload a root CA cert to IoT Hub, generate a verification code, create a verification cert, and finally upload this verification cert to IoT Hub. Make sure you don't change any of the generated filenames, as the -tls E2E environments depend on them.
py27-tls or py38-tls. Make sure to inspect iot-edge-device logs to verify that certs were correctly taken into account (in particular, the manager shouldn't start in "quickstart mode"):
Review checklist (to be filled by reviewers)
changelog/ and integration/ labels attached