This repository was archived by the owner on May 7, 2024. It is now read-only.

A monitoring service that shines some light on your AWS stack


CodeArcsInc/candlestack


Overview

Candlestack is an open source tool for monitoring dynamic infrastructure deployed via AWS. It is capable of automatically detecting infrastructure changes while utilizing a combination of open source tools to collect, visualize, monitor, and alert on various metrics.

How does it work?

At the heart of the system sits the Candlestack server, which runs the Candlestack Java application and Nagios Core. The Candlestack Java application performs two main roles:

  1. Monitor the supported AWS services for any changes that require it to either start or stop monitoring pieces of infrastructure. When something has changed, it updates the Nagios configuration as applicable.
  2. Collect certain metric data from the AWS CloudWatch service and/or AWS service API and feed it into Elasticsearch via Filebeat and Logstash so that it can be used for monitoring and visualization.
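
The first role above boils down to comparing what is currently monitored with what AWS currently reports. The following is only a minimal sketch of that comparison, not Candlestack's actual code: the class `InfrastructureDiff`, its method names, and the instance IDs are all hypothetical, and the real application would obtain the discovered set from the AWS APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not part of Candlestack): diffs two snapshots of resource IDs. */
final class InfrastructureDiff {
    final Set<String> toStartMonitoring;
    final Set<String> toStopMonitoring;

    private InfrastructureDiff(Set<String> start, Set<String> stop) {
        this.toStartMonitoring = start;
        this.toStopMonitoring = stop;
    }

    /** Compares the resources already monitored with those just discovered. */
    static InfrastructureDiff between(Set<String> monitored, Set<String> discovered) {
        // Newly discovered resources that are not monitored yet.
        Set<String> start = new TreeSet<>(discovered);
        start.removeAll(monitored);
        // Monitored resources that no longer exist.
        Set<String> stop = new TreeSet<>(monitored);
        stop.removeAll(discovered);
        return new InfrastructureDiff(start, stop);
    }

    /** True when the Nagios configuration would need to be regenerated. */
    boolean requiresNagiosUpdate() {
        return !toStartMonitoring.isEmpty() || !toStopMonitoring.isEmpty();
    }

    public static void main(String[] args) {
        // Made-up instance IDs, for illustration only.
        Set<String> monitored = new HashSet<>(Arrays.asList("i-aaa", "i-bbb"));
        Set<String> discovered = new HashSet<>(Arrays.asList("i-bbb", "i-ccc"));
        InfrastructureDiff diff = between(monitored, discovered);
        System.out.println("start monitoring: " + diff.toStartMonitoring);
        System.out.println("stop monitoring:  " + diff.toStopMonitoring);
        System.out.println("update Nagios:    " + diff.requiresNagiosUpdate());
    }
}
```

Only when the diff is non-empty would the Nagios configuration need to be rewritten and Nagios restarted, which keeps restarts to a minimum.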

Meanwhile, Nagios Core provides the monitoring and alerting via a well-known industry tool. To send alert emails it utilizes AWS SES, but it could just as easily use SMTP or any other means supported by Nagios Core.

Looking beyond the Candlestack server we have an Elasticsearch server or cluster that aggregates and stores the various metrics being collected. These metrics can then be accessed via Kibana to visualize things such as CPU utilization over time. It is also accessed by Nagios as part of its monitoring routine.

Beyond that we have the applications themselves, which in some cases will be running tools such as Metricbeat to collect other metric data about the EC2 instance resources such as CPU, disk, and memory.

AWS Support

AWS offers a large number of services, some of which do not make sense to monitor from a systems perspective. That said, Candlestack currently supports monitoring the following AWS services, with more to be supported down the road.

EC2

EC2 is the backbone of most dynamic infrastructure in the AWS ecosystem since it provides a variety of on-demand hardware for running various applications. Currently Candlestack is able to monitor CPU utilization, network in, and network out for any EC2 instance out of the box thanks to CloudWatch. It is also possible to monitor disk utilization and memory utilization if other services are installed on the EC2 instance, such as the munin-node agent.

Elastic Beanstalk

Elastic Beanstalk provides an easy way to deploy applications to AWS EC2 instances along with load balancing and system scaling. Currently Candlestack is able to perform monitoring of the Elastic Beanstalk environment health along with the EC2 instance monitoring mentioned above.

SQS

SQS provides a message queueing service that is often used in place of other JMS providers such as ActiveMQ when deploying infrastructure to AWS. Currently Candlestack is able to monitor, on a per-queue basis, the approximate number of messages, approximate age of the oldest message, number of messages received, number of messages sent, and last-modified time.

RDS

RDS provides a variety of relational database technologies that can be easily spun up without you needing to know the intricacies of that database's hardware setup. Since each of the database technologies in RDS has different CloudWatch metrics, Candlestack currently only supports MariaDB and AuroraDB databases. For a MariaDB database it is able to monitor CPU utilization, number of database connections, and free storage space. For an AuroraDB database it is able to monitor CPU utilization, number of database connections, volume bytes used, replica lag, and number of active transactions.

S3

S3 provides virtually unlimited storage potential for any files that need to be persisted or accessed via different systems. Since a common use of S3 is to store backups of databases or servers, Candlestack currently supports monitoring the last-modified time of specific S3 files to detect staleness.
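
A staleness check of this kind reduces to comparing a file's last-modified timestamp against a maximum allowed age. The sketch below illustrates only that comparison and makes several assumptions: the class `S3StalenessCheck`, the method `isStale`, and the 24-hour threshold are hypothetical, and in practice the timestamp would come from the S3 API rather than being hard-coded.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper (not part of Candlestack): decides whether a backup file is stale. */
final class S3StalenessCheck {

    /** Returns true when the file was last modified more than maxAge before now. */
    static boolean isStale(Instant lastModified, Instant now, Duration maxAge) {
        return Duration.between(lastModified, now).compareTo(maxAge) > 0;
    }

    public static void main(String[] args) {
        // Assumed values for illustration; a real check would read lastModified from S3.
        Instant lastModified = Instant.parse("2024-01-01T00:00:00Z");
        Instant now = Instant.parse("2024-01-03T00:00:00Z");
        Duration maxAge = Duration.ofHours(24);
        System.out.println("stale: " + isStale(lastModified, now, maxAge));
    }
}
```

Passing `now` in as a parameter, rather than calling `Instant.now()` inside the method, keeps the check deterministic and easy to test.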

Installation

To help with the installation of the Candlestack server we have provided an example Docker image file and Elastic Beanstalk template. The Docker image file example located here includes all of the dependencies needed by the Candlestack server to properly function, which includes:

  • Nagios Core & Plugins
    • Nagios handles the monitoring and alerting logic
  • Apache2 Web Server
    • Web server for serving the Nagios web pages
  • Java 8
    • The Candlestack Java application was built using Java 8 and thus requires a Java 8 JVM to run
  • Filebeat
    • Sends the Candlestack logs and metric data to Elasticsearch
  • Metricbeat
    • Collects information about the server instance Candlestack is running on so that Candlestack can monitor itself to a degree. It also acts as an example of how Metricbeat can be set up on other applications.
  • AWS Command Line Interface (CLI)
  • Supervisor
    • Used to keep the Filebeat and Candlestack Java application running

The Elastic Beanstalk template example located here demonstrates the various files that need to be provided and how to start the various components. Throughout the files you will come across variables prefixed with $TODO_, which mark places where you should substitute a value applicable to your system. You can find a complete list of these variables below along with an explanation of each one.

| Variable | File(s) | Description |
| --- | --- | --- |
| $TODO_DOCKER_IMAGE_LOCATION | Dockerfile | The URL to retrieve the base Docker image created by the Docker image file mentioned above |
| $TODO_NAGIOS_ADMIN | Dockerfile | The desired Nagios admin username |
| $TODO_NAGIOS_PASSWORD | Dockerfile | The desired Nagios admin password |
| $TODO_LOGSTASH_HOST | filebeat.yml and 00-munin.config | The URL of the Logstash server to be used by Candlestack for sending logs or metric data (Note: you may need to alter the ports depending on your Logstash configuration) |
| $TODO_FROM_EMAIL_ADDRESS | notify-host-by-email.sh and notify-service-by-email.sh | The from email address Nagios should use when sending notification emails |
| $TODO_REPOSITORY | pom.xml | The Maven repository URL to be used when building the deployment |
| $TODO_SSL_CERTIFICATE | elb.config | The SSL certificate ARN to be used by the Candlestack server |
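
If you would rather fill in these variables with a script than by hand, the substitution can be sketched as below. This is an illustrative helper under stated assumptions, not something shipped with Candlestack: the class `TodoPlaceholders`, the method `fill`, and the sample template line are hypothetical. It fails loudly when a $TODO_ variable has no value, so that no placeholder is silently left in a deployed file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Candlestack): substitutes $TODO_ variables in template text. */
final class TodoPlaceholders {
    // Matches variables such as $TODO_NAGIOS_ADMIN.
    private static final Pattern TODO = Pattern.compile("\\$TODO_[A-Z_]+");

    /** Replaces every $TODO_ variable using the given map; throws if a value is missing. */
    static String fill(String template, Map<String, String> values) {
        Matcher matcher = TODO.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group());
            if (value == null) {
                throw new IllegalArgumentException("No value supplied for " + matcher.group());
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("$TODO_FROM_EMAIL_ADDRESS", "alerts@example.com"); // example value
        // The template line is made up for illustration.
        System.out.println(fill("from=$TODO_FROM_EMAIL_ADDRESS", values));
    }
}
```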

Configuration

Candlestack has been built to be highly configurable since everyone's infrastructure is different and thus so are their monitoring requirements.

Candlestack Java Application

The configuration properties for the Java application are explained below, and a sample configuration file has been provided here to get you started.
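
As a rough illustration of how required and optional properties with defaults behave, here is a minimal sketch that reads a properties file using only the JDK. It is not Candlestack's own loader: the class `ConfigSketch`, its helper methods, and the sample metrics directory are assumptions, while the property names and default values are taken from the table below.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical sketch (not Candlestack's loader): required vs. optional property lookup. */
final class ConfigSketch {

    /** Returns the trimmed value, or throws if the property is missing or blank. */
    static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value.trim();
    }

    /** Returns the trimmed value, or the given default if the property is missing or blank. */
    static String optional(Properties props, String key, String defaultValue) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? defaultValue : value.trim();
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for a properties file; the metrics directory is an example value.
        String sample = "metrics.writer.dir=/var/log/candlestack/metrics/\n"
                + "nagios.updater.restart.cmd=/etc/init.d/nagios restart\n";
        Properties props = new Properties();
        props.load(new StringReader(sample));

        String metricsDir = required(props, "metrics.writer.dir");
        String restartCmd = required(props, "nagios.updater.restart.cmd");
        String scriptsDir = optional(props, "scripts.dir", "/opt/candlestack/scripts/");
        int sleepMinutes = Integer.parseInt(optional(props, "nagios.updater.sleep.interval.min", "10"));

        System.out.println(metricsDir + " | " + restartCmd + " | " + scriptsDir + " | " + sleepMinutes);
    }
}
```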

Property Required Default Description
metrics.writer.dir true N/A The directory where metrics data collected by Candlestack should be written so that it can be picked up by Filebeat. Metric files always have the filename format candlestack_metrics_yyyy-MM-ddTHH:mm:ss.SSSZ.log and contain lines of JSON with the metric data.
scripts.dir false /opt/candlestack/scripts/ The directory where the script files for Nagios checks are located. Please see a table below that outlines the various script files that are expected.
nagios.updater.sleep.interval.min false 10 The amount of time in minutes Candlestack should wait between checking the various AWS infrastructure for changes that require alteration of the Nagios configuration files.
nagios.updater.restart.cmd true N/A The system command Candlestack should execute to restart the Nagios process for it to pick up changes to the Nagios configuration files. In most cases this is going to be something like /etc/init.d/nagios restart
nagios.object.definition.dir false /var/tmp/nagios/objects/ The directory where Candlestack should output the Nagios configuration files it generates based off the detected AWS infrastructure. Be sure to have Nagios configured to look at this directory as well.
nagios.object.definition.user.timeperiods false N/A An optional setting that allows a user to define custom time periods that can be used alongside the standard ones provided by Candlestack. If provided, the value must point to a valid Nagios object definition file; otherwise Nagios will fail to start.
nagios.object.definition.user.checks false N/A An optional setting that allows a user to define custom Nagios checks. This is useful when you want Nagios to monitor things that don't fall under Candlestack's radar. If provided, the value must point to a valid Nagios object definition file; otherwise Nagios will fail to start.
nagios.command.commandname false N/A In some instances you may want to define static Nagios command objects. Simply replace the commandname in this variable with a meaningful name and provide the command line to be used as the value. A good example of when this would be used is to define the host and service notification commands.
nagios.contact.default.host.notifications.enabled true N/A Defines a default value for the host_notifications_enabled directive for any contact definitions created by Candlestack for Nagios.
nagios.contact.default.host.notification.options true N/A Defines a default value for the host_notification_options directive for any contact definitions created by Candlestack for Nagios.
nagios.contact.default.host.notification.commands true N/A Defines a default value for the host_notification_commands directive for any contact definitions created by Candlestack for Nagios.
nagios.contact.default.host.notification.period true N/A Defines a default value for the host_notification_period directive for any contact definitions created by Candlestack for Nagios. Currently only the value 24x7 is supported.
nagios.contact.default.service.notifications.enabled true N/A Defines a default value for the service_notifications_enabled directive for any contact definitions created by Candlestack for Nagios.
nagios.contact.default.service.notification.options true N/A Defines a default value for the service_notification_options directive for any contact definitions created by Candlestack for Nagios.
nagios.contact.default.service.notification.commands true N/A Defines a default value for the service_notification_commands directive for any contact definitions created by Candlestack for Nagios.
nagios.contact.default.service.notification.period true N/A Defines a default value for the service_notification_period directive for any contact definitions created by Candlestack for Nagios. Currently only the value 24x7 is supported.
nagios.contact.contactname.alias false "" Defines the value for the alias directive for this specific contact definition. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.email true N/A Defines the value for the email directive for this specific contact definition. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.host.notifications.enabled false nagios.contact.default.host.notifications.enabled Defines the value for the host_notifications_enabled directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.host.notification.options false nagios.contact.default.host.notification.options Defines the value for the host_notification_options directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.host.notification.commands false nagios.contact.default.host.notification.commands Defines the value for the host_notification_commands directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.host.notification.period false nagios.contact.default.host.notification.period Defines the value for the host_notification_period directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact. Currently only the value 24x7 is supported.
nagios.contact.contactname.service.notifications.enabled false nagios.contact.default.service.notifications.enabled Defines the value for the service_notifications_enabled directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.service.notification.options false nagios.contact.default.service.notification.options Defines the value for the service_notification_options directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.service.notification.commands false nagios.contact.default.service.notification.commands Defines the value for the service_notification_commands directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact.
nagios.contact.contactname.service.notification.period false nagios.contact.default.service.notification.period Defines the value for the service_notification_period directive for this specific contact definition that should be used instead of the default. Simply replace the contactname in this variable with the value of the contact_name directive that should be used when defining the contact. Currently only the value 24x7 is supported.
nagios.contactgroup.contactgroupname.alias false N/A Defines the value for the alias directive for this specific contact group definition. Simply replace the contactgroupname in this variable with the value of the contactgroup_name directive that should be used when defining the contact group.
nagios.contactgroup.contactgroupname.members false N/A Defines the value for the members directive for this specific contact group definition. Simply replace the contactgroupname in this variable with the value of the contactgroup_name directive that should be used when defining the contact group.
aws.region true N/A The AWS region Candlestack should monitor.
aws.logs.host true N/A The host to use when accessing Elasticsearch for metric data. This value will be used by the Nagios check scripts and is what will be provided as the host property for those scripts (see below for check script properties).
aws.logs.authtoken true N/A The authtoken to use when accessing Elasticsearch for metric data. This value will be used by the Nagios check scripts and is what will be provided as the authtoken property for those scripts (see below for check script properties).
aws.cloudwatch.detailed.monitoring.enabled false false This flag tells Candlestack whether or not your infrastructure is utilizing detailed CloudWatch monitoring. This allows Candlestack to more accurately target the CloudWatch request period, since detailed monitoring results in a data point each minute as opposed to every 5 minutes.
aws.ec2.enabled false false This flag tells Candlestack whether or not it should monitor your EC2 infrastructure.
aws.ec2.name.prefix false "" Currently Candlestack identifies the EC2 instances to monitor via the Name tag associated with the instance. If this property is provided, an EC2 instance Name must start with the provided prefix string to be monitored. Note that EC2 instances created by Elastic Beanstalk will be ignored; to monitor those, enable Elastic Beanstalk monitoring.
aws.ec2.name.regex false "" Currently Candlestack identifies the EC2 instances to monitor via the Name tag associated with the instance. If this property is provided, an EC2 instance Name must match the provided regex pattern to be monitored. Note that EC2 instances created by Elastic Beanstalk will be ignored; to monitor those, enable Elastic Beanstalk monitoring.
aws.ec2.metrics.fetcher.sleep.min false 5 The amount of time Candlestack should wait between fetching new metric data for the EC2 instances being monitored. The default value should only be lowered if you have detailed monitoring enabled; otherwise you will incur unnecessary CloudWatch API requests.
aws.ec2.metricbeat.metrics.monitor false "" If you have installed the Metricbeat agent on your EC2 instance, you can enable the following metrics to be monitored by Nagios for each EC2 instance. Possible values are CPUUtilization, DiskUtilization, FreeMemory, NetworkIn, and NetworkOut; provide a comma-separated list of these values to enable more than one.
aws.ec2.metricbeat.metric.warning.default.CPUUtilization true N/A If you have enabled the CPUUtilization metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.default.CPUUtilization true N/A If you have enabled the CPUUtilization metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.default.DiskUtilization true N/A If you have enabled the DiskUtilization metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.default.DiskUtilization true N/A If you have enabled the DiskUtilization metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.default.FreeMemory true N/A If you have enabled the FreeMemory metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.default.FreeMemory true N/A If you have enabled the FreeMemory metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.default.NetworkIn true N/A If you have enabled the NetworkIn metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.default.NetworkIn true N/A If you have enabled the NetworkIn metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.default.NetworkOut true N/A If you have enabled the NetworkOut metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.default.NetworkOut true N/A If you have enabled the NetworkOut metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.instanceid.CPUUtilization false aws.ec2.metricbeat.metric.warning.default.CPUUtilization If you have enabled the CPUUtilization metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.instanceid.CPUUtilization false aws.ec2.metricbeat.metric.critical.default.CPUUtilization If you have enabled the CPUUtilization metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.instanceid.DiskUtilization false aws.ec2.metricbeat.metric.warning.default.DiskUtilization If you have enabled the DiskUtilization metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.instanceid.DiskUtilization false aws.ec2.metricbeat.metric.critical.default.DiskUtilization If you have enabled the DiskUtilization metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.instanceid.FreeMemory false aws.ec2.metricbeat.metric.warning.default.FreeMemory If you have enabled the FreeMemory metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.instanceid.FreeMemory false aws.ec2.metricbeat.metric.critical.default.FreeMemory If you have enabled the FreeMemory metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.instanceid.NetworkIn false aws.ec2.metricbeat.metric.warning.default.NetworkIn If you have enabled the NetworkIn metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.instanceid.NetworkIn false aws.ec2.metricbeat.metric.critical.default.NetworkIn If you have enabled the NetworkIn metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.metricbeat.metric.warning.instanceid.NetworkOut false aws.ec2.metricbeat.metric.warning.default.NetworkOut If you have enabled the NetworkOut metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.metricbeat.metric.critical.instanceid.NetworkOut false aws.ec2.metricbeat.metric.critical.default.NetworkOut If you have enabled the NetworkOut metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.cloudwatch.metrics.fetch false "" Enables certain CloudWatch metrics to be fetched by Candlestack but not necessarily monitored by Nagios. Possible values are CPUUtilization, NetworkIn, and NetworkOut; provide a comma-separated list of these values to enable more than one.
aws.ec2.cloudwatch.metrics.monitor false "" Enables certain CloudWatch metrics to be monitored by Nagios. Any metric enabled for monitoring must also be enabled for fetching; otherwise Nagios alerts will fail because no data is available for the checks. Possible values are CPUUtilization, NetworkIn, and NetworkOut; provide a comma-separated list of these values to enable more than one.
aws.ec2.cloudwatch.metric.warning.default.CPUUtilization true N/A If you have enabled the CPUUtilization metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.cloudwatch.metric.critical.default.CPUUtilization true N/A If you have enabled the CPUUtilization metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.cloudwatch.metric.warning.default.NetworkIn true N/A If you have enabled the NetworkIn metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.cloudwatch.metric.critical.default.NetworkIn true N/A If you have enabled the NetworkIn metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.cloudwatch.metric.warning.default.NetworkOut true N/A If you have enabled the NetworkOut metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.cloudwatch.metric.critical.default.NetworkOut true N/A If you have enabled the NetworkOut metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.cloudwatch.metric.warning.instanceid.CPUUtilization false aws.ec2.cloudwatch.metric.warning.default.CPUUtilization If you have enabled the CPUUtilization metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.cloudwatch.metric.critical.instanceid.CPUUtilization false aws.ec2.cloudwatch.metric.critical.default.CPUUtilization If you have enabled the CPUUtilization metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.cloudwatch.metric.warning.instanceid.NetworkIn false aws.ec2.cloudwatch.metric.warning.default.NetworkIn If you have enabled the NetworkIn metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.cloudwatch.metric.critical.instanceid.NetworkIn false aws.ec2.cloudwatch.metric.critical.default.NetworkIn If you have enabled the NetworkIn metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.cloudwatch.metric.warning.instanceid.NetworkOut false aws.ec2.cloudwatch.metric.warning.default.NetworkOut If you have enabled the NetworkOut metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.ec2.cloudwatch.metric.critical.instanceid.NetworkOut false aws.ec2.cloudwatch.metric.critical.default.NetworkOut If you have enabled the NetworkOut metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified EC2 instanceid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.ec2.service.notification.period.instanceid false nagios.contact.default.service.notification.period Allows you to override the notification period for a particular EC2 instance.
aws.eb.enabled false false This flag tells Candlestack whether or not it should monitor your Elastic Beanstalk infrastructure.
aws.eb.environment.name.prefix false "" Currently Candlestack identifies the Elastic Beanstalk environments to monitor via the environment's name. If this property is provided an Elastic Beanstalk environment name must start with the provided prefix string to be monitored.
aws.eb.environment.name.regex false "" Currently Candlestack identifies the Elastic Beanstalk environments to monitor via the environment's name. If this property is provided an Elastic Beanstalk environment name must match the provided regex to be monitored.
aws.eb.metrics.fetcher.sleep.min false 5 The amount of time Candlestack should wait between fetching new metric data for the Elastic Beanstalk environments being monitored. The default value should only be lowered if you have detailed monitoring enabled; otherwise you will incur unnecessary CloudWatch API requests.
aws.eb.cloudwatch.metrics.fetch false "" Enables certain CloudWatch metrics to be fetched by Candlestack but not necessarily monitored by Nagios. The only possible value is EnvironmentHealth at this time.
aws.eb.cloudwatch.metrics.monitor false "" Enables certain CloudWatch metrics to be monitored by Nagios. Any metric enabled for monitoring must also be enabled for fetching; otherwise Nagios alerts will fail because no data is available for the checks. The only possible value is EnvironmentHealth at this time.
aws.eb.cloudwatch.metric.warning.default.EnvironmentHealth true N/A If you have enabled the EnvironmentHealth metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.eb.cloudwatch.metric.critical.default.EnvironmentHealth true N/A If you have enabled the EnvironmentHealth metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.eb.cloudwatch.metric.warning.environmentname.EnvironmentHealth false aws.eb.cloudwatch.metric.warning.default.EnvironmentHealth If you have enabled the EnvironmentHealth metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified Elastic Beanstalk environmentname. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.eb.cloudwatch.metric.critical.environmentname.EnvironmentHealth false aws.eb.cloudwatch.metric.critical.default.EnvironmentHealth If you have enabled the EnvironmentHealth metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified Elastic Beanstalk environmentname. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.eb.service.notification.period.environmentname false nagios.contact.default.service.notification.period Allows you to override the notification period for a particular Elastic Beanstalk environment.
aws.sqs.enabled false false This flag tells Candlestack whether or not it should monitor your SQS infrastructure.
aws.sqs.queue.name.prefix false "" Currently Candlestack identifies the SQS queues to monitor via the queue's name. If this property is provided an SQS queue name must start with the provided prefix string to be monitored.
aws.sqs.queue.name.regex false "" Currently Candlestack identifies the SQS queues to monitor via the queue's name. If this property is provided an SQS queue name must match the provided regex string to be monitored.
aws.sqs.monitor.deadletter false true Flag for whether or not the SQS dead letter queue should be monitored regardless of the queue name prefix or regex properties.
aws.sqs.metrics.fetcher.sleep.min false 5 The amount of time Candlestack should wait between fetching new metric data for the SQS queues being monitored. The default value should only be lowered if you have detailed monitoring enabled otherwise you will incur unnecessary CloudWatch API requests.
aws.sqs.queue.attributes.fetch false "" Enables certain SQS queue attributes to be fetched by Candlestack but not necessarily monitored by Nagios. Possible values are ApproximateNumberOfMessages and LastModifiedTimestamp; to enable more than one, provide a comma-separated list of these values.
aws.sqs.queue.attributes.monitor false "" Enables certain SQS queue attributes to be monitored by Nagios. It is important that any queue attributes enabled for monitoring are also enabled for fetching otherwise Nagios alerts will fail due to no data being available for the checks. Possible values are ApproximateNumberOfMessages and LastModifiedTimestamp; to enable more than one, provide a comma-separated list of these values.
aws.sqs.queue.attribute.warning.default.ApproximateNumberOfMessage true N/A If you have enabled the ApproximateNumberOfMessages attribute you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.queue.attribute.critical.default.ApproximateNumberOfMessage true N/A If you have enabled the ApproximateNumberOfMessages attribute you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.queue.attribute.warning.default.LastModifiedTimestamp true N/A If you have enabled the LastModifiedTimestamp attribute you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.queue.attribute.critical.default.LastModifiedTimestamp true N/A If you have enabled the LastModifiedTimestamp attribute you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.queue.attribute.warning.queuename.ApproximateNumberOfMessage false aws.sqs.queue.attribute.warning.default.ApproximateNumberOfMessage If you have enabled the ApproximateNumberOfMessages attribute you can override the default warning alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.queue.attribute.critical.queuename.ApproximateNumberOfMessage false aws.sqs.queue.attribute.critical.default.ApproximateNumberOfMessage If you have enabled the ApproximateNumberOfMessages attribute you can override the default critical alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.queue.attribute.warning.queuename.LastModifiedTimestamp false aws.sqs.queue.attribute.warning.default.LastModifiedTimestamp If you have enabled the LastModifiedTimestamp attribute you can override the default warning alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.queue.attribute.critical.queuename.LastModifiedTimestamp false aws.sqs.queue.attribute.critical.default.LastModifiedTimestamp If you have enabled the LastModifiedTimestamp attribute you can override the default critical alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.cloudwatch.metrics.fetch false "" Enables certain CloudWatch metrics to be fetched by Candlestack but not necessarily monitored by Nagios. Possible values are ApproximateAgeOfOldestMessage, NumberOfMessagesReceived, and NumberOfMessagesSent; to enable more than one, provide a comma-separated list of these values.
aws.sqs.cloudwatch.metrics.monitor false "" Enables certain CloudWatch metrics to be monitored by Nagios. It is important that any metrics enabled for monitoring are also enabled for fetching otherwise Nagios alerts will fail due to no data being available for the checks. Possible values are ApproximateAgeOfOldestMessage, NumberOfMessagesReceived, and NumberOfMessagesSent; to enable more than one, provide a comma-separated list of these values.
aws.sqs.cloudwatch.metric.warning.default.ApproximateAgeOfOldestMessage true N/A If you have enabled the ApproximateAgeOfOldestMessage metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.cloudwatch.metric.critical.default.ApproximateAgeOfOldestMessage true N/A If you have enabled the ApproximateAgeOfOldestMessage metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.cloudwatch.metric.warning.default.NumberOfMessagesReceived true N/A If you have enabled the NumberOfMessagesReceived metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.cloudwatch.metric.critical.default.NumberOfMessagesReceived true N/A If you have enabled the NumberOfMessagesReceived metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.cloudwatch.metric.warning.default.NumberOfMessagesSent true N/A If you have enabled the NumberOfMessagesSent metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.cloudwatch.metric.critical.default.NumberOfMessagesSent true N/A If you have enabled the NumberOfMessagesSent metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.cloudwatch.metric.warning.queuename.ApproximateAgeOfOldestMessage false aws.sqs.cloudwatch.metric.warning.default.ApproximateAgeOfOldestMessage If you have enabled the ApproximateAgeOfOldestMessage metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.cloudwatch.metric.critical.queuename.ApproximateAgeOfOldestMessage false aws.sqs.cloudwatch.metric.critical.default.ApproximateAgeOfOldestMessage If you have enabled the ApproximateAgeOfOldestMessage metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.cloudwatch.metric.warning.queuename.NumberOfMessagesReceived false aws.sqs.cloudwatch.metric.warning.default.NumberOfMessagesReceived If you have enabled the NumberOfMessagesReceived metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.cloudwatch.metric.critical.queuename.NumberOfMessagesReceived false aws.sqs.cloudwatch.metric.critical.default.NumberOfMessagesReceived If you have enabled the NumberOfMessagesReceived metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.cloudwatch.metric.warning.queuename.NumberOfMessagesSent false aws.sqs.cloudwatch.metric.warning.default.NumberOfMessagesSent If you have enabled the NumberOfMessagesSent metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.sqs.cloudwatch.metric.critical.queuename.NumberOfMessagesSent false aws.sqs.cloudwatch.metric.critical.default.NumberOfMessagesSent If you have enabled the NumberOfMessagesSent metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified SQS queuename. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.sqs.service.notification.period.queuename false nagios.contact.default.service.notification.period Allows you to override the notification period for a particular SQS queue.
aws.rds.enabled false false This flag tells Candlestack whether or not it should monitor your RDS infrastructure.
aws.rds.dbinstance.prefix false "" Currently Candlestack identifies the RDS database instances to monitor via the database instance's name. If this property is provided an RDS database instance name must start with the provided prefix string to be monitored.
aws.rds.dbinstance.regex false "" Currently Candlestack identifies the RDS database instances to monitor via the database instance's name. If this property is provided an RDS database instance name must match the provided regex string to be monitored.
aws.rds.metrics.fetcher.sleep.min false 5 The amount of time Candlestack should wait between fetching new metric data for the RDS database instances being monitored. The default value should only be lowered if you have detailed monitoring enabled otherwise you will incur unnecessary CloudWatch API requests.
aws.rds.cloudwatch.metrics.fetch false "" Enables certain CloudWatch metrics to be fetched by Candlestack but not necessarily monitored by Nagios. Possible values are CPUUtilization, DatabaseConnections, FreeStorageSpace, VolumeBytesUsed, AuroraReplicaLag, and ActiveTransactions; to enable more than one, provide a comma-separated list of these values.
aws.rds.cloudwatch.metrics.monitor false "" Enables certain CloudWatch metrics to be monitored by Nagios. It is important that any metrics enabled for monitoring are also enabled for fetching otherwise Nagios alerts will fail due to no data being available for the checks. Possible values are CPUUtilization, DatabaseConnections, FreeStorageSpace, VolumeBytesUsed, AuroraReplicaLag, and ActiveTransactions; to enable more than one, provide a comma-separated list of these values.
aws.rds.cloudwatch.metric.warning.default.CPUUtilization true N/A If you have enabled the CPUUtilization metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.default.CPUUtilization true N/A If you have enabled the CPUUtilization metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.default.DatabaseConnections true N/A If you have enabled the DatabaseConnections metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.default.DatabaseConnections true N/A If you have enabled the DatabaseConnections metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.default.FreeStorageSpace true N/A If you have enabled the FreeStorageSpace metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.default.FreeStorageSpace true N/A If you have enabled the FreeStorageSpace metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.default.VolumeBytesUsed true N/A If you have enabled the VolumeBytesUsed metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.default.VolumeBytesUsed true N/A If you have enabled the VolumeBytesUsed metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.default.AuroraReplicaLag true N/A If you have enabled the AuroraReplicaLag metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.default.AuroraReplicaLag true N/A If you have enabled the AuroraReplicaLag metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.default.ActiveTransactions true N/A If you have enabled the ActiveTransactions metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.default.ActiveTransactions true N/A If you have enabled the ActiveTransactions metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.dbinstance.CPUUtilization false aws.rds.cloudwatch.metric.warning.default.CPUUtilization If you have enabled the CPUUtilization metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.dbinstance.CPUUtilization false aws.rds.cloudwatch.metric.critical.default.CPUUtilization If you have enabled the CPUUtilization metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.dbinstance.DatabaseConnections false aws.rds.cloudwatch.metric.warning.default.DatabaseConnections If you have enabled the DatabaseConnections metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.dbinstance.DatabaseConnections false aws.rds.cloudwatch.metric.critical.default.DatabaseConnections If you have enabled the DatabaseConnections metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.dbinstance.FreeStorageSpace false aws.rds.cloudwatch.metric.warning.default.FreeStorageSpace If you have enabled the FreeStorageSpace metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.dbinstance.FreeStorageSpace false aws.rds.cloudwatch.metric.critical.default.FreeStorageSpace If you have enabled the FreeStorageSpace metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.dbinstance.VolumeBytesUsed false aws.rds.cloudwatch.metric.warning.default.VolumeBytesUsed If you have enabled the VolumeBytesUsed metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.dbinstance.VolumeBytesUsed false aws.rds.cloudwatch.metric.critical.default.VolumeBytesUsed If you have enabled the VolumeBytesUsed metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.dbinstance.AuroraReplicaLag false aws.rds.cloudwatch.metric.warning.default.AuroraReplicaLag If you have enabled the AuroraReplicaLag metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.dbinstance.AuroraReplicaLag false aws.rds.cloudwatch.metric.critical.default.AuroraReplicaLag If you have enabled the AuroraReplicaLag metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.cloudwatch.metric.warning.dbinstance.ActiveTransactions false aws.rds.cloudwatch.metric.warning.default.ActiveTransactions If you have enabled the ActiveTransactions metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.rds.cloudwatch.metric.critical.dbinstance.ActiveTransactions false aws.rds.cloudwatch.metric.critical.default.ActiveTransactions If you have enabled the ActiveTransactions metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified RDS dbinstance. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.rds.service.notification.period.dbinstance false nagios.contact.default.service.notification.period Allows you to override the notification period for a particular RDS database.
aws.s3.enabled false false This flag tells Candlestack whether or not it should monitor files residing on S3.
aws.s3.metrics.fetcher.sleep.min false 5 The amount of time Candlestack should wait between fetching new metric data for the S3 files being monitored.
aws.s3.locations false "" This specifies the S3 file locations that should be monitored by Candlestack. The value for this property is a JSON string using the following format: [{"id":"","name":"","bucket":"","key":""},...]
aws.s3.metadata.metrics.fetch false "" Enables certain S3 metadata metrics to be fetched by Candlestack but not necessarily monitored by Nagios. The only possible value is LastModified at this time.
aws.s3.metadata.metrics.monitor false "" Enables certain S3 metadata metrics to be monitored by Nagios. It is important that any metrics enabled for monitoring are also enabled for fetching otherwise Nagios alerts will fail due to no data being available for the checks. The only possible value is LastModified at this time.
aws.s3.metadata.metric.warning.default.LastModified true N/A If you have enabled the LastModified metric you must provide a default warning alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.s3.metadata.metric.critical.default.LastModified true N/A If you have enabled the LastModified metric you must provide a default critical alert value to be used by the corresponding Nagios check script. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.s3.metadata.metric.warning.locationid.LastModified false aws.s3.metadata.metric.warning.default.LastModified If you have enabled the LastModified metric you can override the default warning alert value to be used by the corresponding Nagios check script for the specified S3 locationid. This value will be passed to the Nagios check script via the warning property (see below for check script properties).
aws.s3.metadata.metric.critical.locationid.LastModified false aws.s3.metadata.metric.critical.default.LastModified If you have enabled the LastModified metric you can override the default critical alert value to be used by the corresponding Nagios check script for the specified S3 locationid. This value will be passed to the Nagios check script via the critical property (see below for check script properties).
aws.s3.service.notification.period.locationid false nagios.contact.default.service.notification.period Allows you to override the notification period for a particular S3 location.
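
The threshold properties above share one pattern: a key that contains a resource identifier takes precedence, and the matching `default` key applies otherwise. The Python sketch below illustrates that lookup, the rule that monitored metrics must also be fetched, and the JSON format expected by `aws.s3.locations`. It is only an illustration of the documented key layout, not Candlestack's own code; the function names, the `orders-db` instance, and the sample values are hypothetical.

```python
import json


def resolve_threshold(props, prefix, level, resource_id, metric):
    """Return the per-resource threshold if one is set, else the default.

    prefix is e.g. "aws.rds.cloudwatch.metric"; level is "warning" or "critical".
    """
    override_key = f"{prefix}.{level}.{resource_id}.{metric}"
    default_key = f"{prefix}.{level}.default.{metric}"
    if override_key in props:
        return props[override_key]
    # The default is required once the metric is enabled, so a missing
    # key surfaces as a KeyError here.
    return props[default_key]


def unfetched_monitored(props, fetch_key, monitor_key):
    """List metrics enabled for monitoring but not enabled for fetching."""
    def parse(value):
        return {item.strip() for item in value.split(",") if item.strip()}

    fetched = parse(props.get(fetch_key, ""))
    monitored = parse(props.get(monitor_key, ""))
    return sorted(monitored - fetched)


def s3_locations_value(locations):
    """Serialize locations into the JSON string format of aws.s3.locations."""
    fields = ("id", "name", "bucket", "key")
    entries = [{field: loc[field] for field in fields} for loc in locations]
    return json.dumps(entries, separators=(",", ":"))


# Hypothetical sample configuration (values are kept as strings, as in a
# properties file).
props = {
    "aws.rds.cloudwatch.metrics.fetch": "CPUUtilization",
    "aws.rds.cloudwatch.metrics.monitor": "CPUUtilization,FreeStorageSpace",
    "aws.rds.cloudwatch.metric.warning.default.CPUUtilization": "80",
    "aws.rds.cloudwatch.metric.warning.orders-db.CPUUtilization": "60",
}
```

With this sample, `unfetched_monitored` reports `FreeStorageSpace`, which is the misconfiguration the table warns about: the metric is monitored but no data would be fetched for its check.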

Nagios Check Scripts

Depending on the metrics you have enabled for monitoring via the Candlestack Java application configuration, you will need to provide a corresponding Nagios check script for each one. Example check scripts can be found here and in most cases can be used by your application with little to no modification. Below you will find a table that outlines the parameters a check script will always receive, and another table that maps each monitored metric to its script file name.

Check Script Properties

| Order Number | Property Name | Description |
| --- | --- | --- |
| 1 | host | The Elasticsearch host containing the metric data to be checked |
| 2 | authtoken | The authtoken to use when accessing Elasticsearch |
| 3 | instanceid | The "id" that identifies the specific resource instance being checked (for example, for an EC2 instance check this is the EC2 instance id) |
| 4 | warning | The value that has been provided as the warning level for metric checks |
| 5 | critical | The value that has been provided as the critical level for metric checks |
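
To make the argument order concrete, here is a minimal Python sketch of a check that consumes the five positional properties and returns a standard Nagios plugin exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The example scripts referenced above are shell scripts; this sketch only illustrates the contract. `fetch_latest_value` is a hypothetical stand-in for the Elasticsearch query, and treating the thresholds as numbers where a higher value is worse is an assumption that will not hold for every metric (free memory, for example).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_args(argv):
    """Map the five positional arguments to their documented names."""
    if len(argv) != 5:
        raise ValueError("expected: host authtoken instanceid warning critical")
    host, authtoken, instanceid, warning, critical = argv
    return {
        "host": host,
        "authtoken": authtoken,
        "instanceid": instanceid,
        # Assumption: thresholds are numeric.
        "warning": float(warning),
        "critical": float(critical),
    }


def evaluate(value, warning, critical, higher_is_worse=True):
    """Turn a metric value into a Nagios status code."""
    if value is None:
        return UNKNOWN  # no data available for the check
    if not higher_is_worse:
        # Flip signs so the same comparisons work for "lower is worse".
        value, warning, critical = -value, -warning, -critical
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def main(argv, fetch_latest_value):
    """Run one check; fetch_latest_value(host, authtoken, instanceid) is hypothetical."""
    try:
        args = parse_args(argv)
    except ValueError as err:
        print(f"UNKNOWN - {err}")
        return UNKNOWN
    value = fetch_latest_value(args["host"], args["authtoken"], args["instanceid"])
    status = evaluate(value, args["warning"], args["critical"])
    print(f"{LABELS[status]} - {args['instanceid']} value={value}")
    return status
```

A real script would end with something like `sys.exit(main(sys.argv[1:], query_elasticsearch))`, where `query_elasticsearch` is whatever function performs the lookup against the `host` using the `authtoken`.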

Monitor Metric to Check Script Mapping

| Monitor Metric | Check Script File Name |
| --- | --- |
| Elastic Beanstalk Environment Health | check-aws-eb-environment-health-via-es.sh |
| EC2 CPU Utilization | check-aws-ec2-cpu-via-es-cw.sh and/or check-aws-ec2-cpu-via-es-mb.sh, depending on whether you use CloudWatch and/or Metricbeat |
| EC2 Disk Utilization | check-aws-ec2-disk-utilization-via-es-mb.sh |
| EC2 Free Memory | check-aws-ec2-free-memory-via-es-mb.sh |
| EC2 Network In | check-aws-ec2-network-in-via-es-cw.sh and/or check-aws-ec2-network-in-via-es-mb.sh, depending on whether you use CloudWatch and/or Metricbeat |
| EC2 Network Out | check-aws-ec2-network-out-via-es-cw.sh and/or check-aws-ec2-network-out-via-es-mb.sh, depending on whether you use CloudWatch and/or Metricbeat |
| RDS Active Transactions | check-aws-rds-active-transactions-via-es.sh |
| RDS CPU Utilization | check-aws-rds-cpu-via-es.sh |
| RDS Database Connections | check-aws-rds-db-connections-via-es.sh |
| RDS Free Storage | check-aws-rds-free-storage-via-es.sh |
| RDS Aurora Replica Lag | check-aws-rds-replica-lag-via-es.sh |
| RDS Storage Used | check-aws-rds-storage-used-via-es.sh |
| S3 Last Modified | check-aws-s3-last-modified-via-es.sh |
| SQS Last Modified | check-aws-sqs-queue-last-modified-via-es.sh |
| SQS Message Age | check-aws-sqs-queue-message-age-via-es.sh |
| SQS Messages Received | check-aws-sqs-queue-messages-received-via-es.sh |
| SQS Messages Sent | check-aws-sqs-queue-messages-sent-via-es.sh |
| SQS Queue Size | check-aws-sqs-queue-size-via-es.sh |
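
Since every monitored metric needs its script in place, a quick pre-flight check can help. The Python sketch below is a hypothetical helper, not part of Candlestack: it takes a few rows of the mapping above (extend the dictionary with the rest as needed) and reports which monitored metrics have no script file in a given directory. The directory path is whatever location your Nagios setup uses, which this document does not specify.

```python
from pathlib import Path

# A subset of the mapping table. Rows that allow a CloudWatch (-cw) or
# Metricbeat (-mb) variant list both; either file satisfies the check.
CHECK_SCRIPTS = {
    "Elastic Beanstalk Environment Health": ["check-aws-eb-environment-health-via-es.sh"],
    "EC2 CPU Utilization": [
        "check-aws-ec2-cpu-via-es-cw.sh",
        "check-aws-ec2-cpu-via-es-mb.sh",
    ],
    "RDS CPU Utilization": ["check-aws-rds-cpu-via-es.sh"],
    "S3 Last Modified": ["check-aws-s3-last-modified-via-es.sh"],
    "SQS Queue Size": ["check-aws-sqs-queue-size-via-es.sh"],
}


def missing_scripts(monitored, scripts_dir):
    """Return the monitored metrics that have no check script in scripts_dir."""
    directory = Path(scripts_dir)
    missing = []
    for label in monitored:
        candidates = CHECK_SCRIPTS[label]
        if not any((directory / name).is_file() for name in candidates):
            missing.append(label)
    return missing
```

Calling `missing_scripts(["EC2 CPU Utilization", "RDS CPU Utilization"], scripts_dir)` returns the labels whose scripts still need to be provided, in the order they were passed in.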
