cloudwatch_metrics_agent is a simple agent for sending custom CPU and memory metrics to Cloudwatch.
cargo install cloudwatch_metrics_agent
To launch agent and send metrics to CloudWatch each minute:
cloudwatch_metrics_agent --namespace TestNamespace --service FooService --period 60
To preview metrics without actual sending them to CloudWatch:
RUST_LOG=info cloudwatch_metrics_agent --namespace TestNamespace --service FooService --period 60 --dry-run
To deploy agent in ECS with Fargate or EC2 just add a container with the agent to a task definition with a monitored service. Agent's container shares resources and namespace with other containers in a task definition so collected metrics are valid.
If compute environment does not allow multi-container configurations (such as AWS Batch) it is possible to launch agent in background while keeping main process in foreground. To properly handle termination a supervisor or a init multiplexer like tini or dumb-init must be used so the backround agent process receive SIGTERM
signal and flushes remaining metrics at shutdown. Because entrypoint shell script is used as a entrypoint, the whole process group must receive a signal. To achieve this the tini
must be used a with a process group -g
flag while the dumb-init
does it by default.
The entrypoint script must also send metrics on succesfull foreground process termination when no signals sent.
Example entrypoint script:
#!/bin/bash
# launch agent in the background and save its pid
cloudwatch_metrics_agent --namespace TestNamespace --service FooService &
child=$!
# launch main process in the foreground
"$@"
exitcode=$?
echo "Terminating agent after foreground process exit" >&2
kill -TERM "$child"
wait "$child"
echo "Exiting with $exitcode" >&2
exit $exitcode
Example ENTRYPOINT
directive for Dockerfile
:
ENTRYPOINT ["tini", "-g", "--"]
or
ENTRYPOINT ["dumb-init"]
The agent collects system metrics (CPU and memory utilization) and periodically sends them to a publisher. Publisher can be a AWS CloudWatch service or console (for debugging).
Period (--period
parameter) specifies how often logs are emitted to publisher, one minute by default. Metrics are collected during this period with much higher resolution (0.9 seconds) and aggregated by median and max values. So it relatively safe to set a publishing period to five minutes (non-detailed CloudWatch metrics default).
If the cloud service is unavailable while sending a bunch of metrics after two retries, this bunch is skipped. When the agent is stopped (SIGTERM or SIGINT signal received), all remaining metrics are flushed to the publisher.
Emitted metrics:
-
CPUUtilization
- median CPU utilization across all CPU cores, in percents. -
MemoryUtilization
- median memory utilization, in percents. Calculated as used memory divided by total memory in percents where used memory is total memory without free, buffers, page cache and slabs.
Metrics are published to the specified namespace and service name. AWS credentials are configured via the environment variables, config files or IAM role.
Logs are configured via env_logger so the RUST_LOG
environment variable controls logging
with errors only by default. To show only emitted metrics while omitting other logs set the environment variable RUST_LOG="cloudwatch_metrics_agent::publisher=info"
or RUST_LOG="cloudwatch_metrics_agent::cloudwatch=info"
.
The agent uses Tokio framework for asynchronous work, AWS SDK for Rust for communicating with AWS and sysinfo
for gathering metrics.
Works on Linux and macOS.
To build a static binary for usage in container use a static CRT linkage:
cargo build --release --target x86_64-unknown-linux-musl
BSD 3-Clause