Ambari uses Nagios to manage alerts in the cluster. When a node goes down or a service changes state, Nagios handles the event, and the resulting alerts are shown in ambari-web. This document describes how to forward those alerts to a remote SNMP management station by sending SNMP traps. With SNMP traps enabled, Ambari and Hadoop cluster alerts can be monitored from a remote management station (such as OpenNMS or HP OpenView).
This works with Ambari Server 1.7.0 and below. In Ambari 2.0.0, this feature is replaced by the alert framework.
- The Nagios server should be running on one of the Hadoop cluster nodes. (It need not be the same node as the Ambari server.)
- SNMP should be installed on the node where the Nagios server is running. Run the following command to install net-snmp, net-snmp-utils and net-snmp-devel.
yum install net-snmp net-snmp-utils net-snmp-devel
- There should be network connectivity between the Hadoop node running the Nagios server and the management station. The snmptrap command sends traps to the management station on port 162/udp.
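Once net-snmp-utils is installed, the connectivity prerequisite can be checked by sending a test trap by hand. The sketch below is only an illustration: the function name `send_test_trap` is hypothetical, the community string `public` is an assumption that must match whatever the management station accepts, and the trap sent is the standard coldStart notification (OID 1.3.6.1.6.3.1.1.5.1) rather than one of the Nagios traps.

```shell
# Sketch: send a generic SNMPv2c test trap to the management station.
# Assumptions: community "public"; the station listens on the default 162/udp.
send_test_trap() {
    station=$1                  # IP or host name of the management station
    community=${2:-public}      # assumed default, change as needed
    # '' asks snmptrap to fill in the current uptime;
    # 1.3.6.1.6.3.1.1.5.1 is coldStart from SNMPv2-MIB.
    snmptrap -v 2c -c "$community" "$station" '' 1.3.6.1.6.3.1.1.5.1
}

# Usage (run on the Nagios node):
#   send_test_trap <MGMT_STATION_IP>
```

If the trap does not show up on the management station, a firewall blocking 162/udp between the two hosts is a likely cause to check first.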
-
Copy the file src/nagios/objects/snmp-commands.cfg to {nagios_home_dir}/objects/snmp-commands.cfg on the node where Nagios is running. This file defines the commands that send traps for service and host failures.
The default home directory (nagios_home_dir) for Nagios is /etc/nagios.
-
Copy the file src/nagios/objects/snmp-contacts.cfg to {nagios_home_dir}/objects/snmp-contacts.cfg on the node where Nagios is running. This file defines the snmp-management-station contact.
-
On the node where ambari-server is running, edit the file /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/NAGIOS/package/templates/nagios.cfg.j2 and add the lines below just before {{nagios_host_cfg}}
#Definitions for SNMP traps
cfg_file=/etc/nagios/objects/snmp-commands.cfg
cfg_file=/etc/nagios/objects/snmp-contacts.cfg
Note: If the Nagios home directory is different from /etc/nagios, use that directory in the paths above. The updated configuration is pushed automatically to the Nagios server when ambari-server is restarted.
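The template edit described above can also be scripted. The following is a minimal sketch, not part of this project: the function name `add_snmp_cfg_lines` is hypothetical, and it assumes that {{nagios_host_cfg}} appears on a line of its own in nagios.cfg.j2, so that inserting before that line is equivalent to the manual edit. It prints the modified template to standard output and leaves the original file untouched.

```shell
# Sketch: print a template with the SNMP cfg_file lines inserted before the
# first line that mentions {{nagios_host_cfg}}.
add_snmp_cfg_lines() {
    template=$1                              # path to nagios.cfg.j2
    objects_dir=${2:-/etc/nagios/objects}    # adjust if the Nagios home differs
    awk -v dir="$objects_dir" '
        !inserted && index($0, "{{nagios_host_cfg}}") {
            print "#Definitions for SNMP traps"
            print "cfg_file=" dir "/snmp-commands.cfg"
            print "cfg_file=" dir "/snmp-contacts.cfg"
            inserted = 1
        }
        { print }
    ' "$template"
}

# Usage: write to a new file, review it, then replace the original.
#   add_snmp_cfg_lines nagios.cfg.j2 > nagios.cfg.j2.new
```

Writing to a separate file first makes it easy to diff the result against the original before ambari-server is restarted.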
-
To enable SNMP traps, edit the file /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/NAGIOS/package/templates/contacts.cfg.j2 on the ambari-server node and add snmp-management-station to the contact group admins
members {{nagios_web_login}},sys_logger,snmp-management-station
-
Copy the file src/scripts/send-service-trap to /usr/local/bin/send-service-trap on the node where Nagios is running. Then run the following commands
chmod +x /usr/local/bin/send-service-trap
chown nagios:nagios /usr/local/bin/send-service-trap
-
Copy the file src/scripts/send-host-trap to /usr/local/bin/send-host-trap on the node where Nagios is running. Then run the following commands
chmod +x /usr/local/bin/send-host-trap
chown nagios:nagios /usr/local/bin/send-host-trap
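Because both scripts have to be installed and made executable, a quick check after copying them can catch a missed step. This is a hedged sketch: the function name `check_trap_scripts` is hypothetical, and it only checks that the files exist and are executable, not their ownership.

```shell
# Sketch: report whether each trap script is present and executable.
check_trap_scripts() {
    dir=${1:-/usr/local/bin}
    rc=0
    for name in send-service-trap send-host-trap; do
        if [ -x "$dir/$name" ]; then
            echo "ok: $dir/$name"
        else
            echo "missing or not executable: $dir/$name"
            rc=1
        fi
    done
    return $rc
}

# Usage (run on the Nagios node):
#   check_trap_scripts
```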
-
Download the Nagios MIBs from http://ftp.cc.uoc.gr/mirrors/monitoring-plugins/mib/nagiosmib-1.0.0.tar.gz and extract the files to the /usr/share/snmp/mibs/ directory.
-
Restart ambari-server
ambari-server restart
-
Launch ambari-web (the Ambari GUI) in the browser and log in. Select the Nagios server and restart the service.
-
Configure the management station address by editing the file /etc/hosts on the node where the Nagios server is running and adding the line below
<MGMT_STATION_IP> snmp-manager
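The /etc/hosts change can be made repeatable so that running it twice does not add a duplicate line. The sketch below is an illustration only: the function name `add_hosts_entry` is hypothetical, and it treats any existing non-comment mapping for the name as "already configured" without checking that the IP address matches.

```shell
# Sketch: append "<ip> <name>" to a hosts file unless the name is already mapped.
add_hosts_entry() {
    ip=$1
    name=$2
    hosts_file=${3:-/etc/hosts}
    if awk -v n="$name" '
        { sub(/#.*/, "") }    # drop comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file"; then
        echo "$name already present in $hosts_file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Usage (as root on the Nagios node):
#   add_hosts_entry <MGMT_STATION_IP> snmp-manager
```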
-
To integrate with an existing management station (or NMS):
- Download the Nagios MIBs from http://ftp.cc.uoc.gr/mirrors/monitoring-plugins/mib/nagiosmib-1.0.0.tar.gz
- Extract the archive and copy the files under its MIB directory to the MIB directory of the management station (or NMS).
- Import the MIBs if required.
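The MIB download and extraction, needed both on the Nagios node and on the management station, can be sketched as a small helper. The function name `install_nagios_mibs` is hypothetical, and the sketch assumes the archive contains a directory named MIB that holds the MIB files; the target directory on a management station depends on the NMS in use.

```shell
# Sketch: unpack a Nagios MIB archive and copy the files from its MIB
# directory into a target directory. The archive layout is an assumption.
install_nagios_mibs() {
    archive=$1    # path to the downloaded nagiosmib-1.0.0.tar.gz
    dest=$2       # e.g. /usr/share/snmp/mibs on the Nagios node
    workdir=$(mktemp -d) || return 1
    if ! tar -xzf "$archive" -C "$workdir"; then
        rm -rf "$workdir"
        return 1
    fi
    # Locate the MIB directory inside the unpacked tree.
    mibdir=$(find "$workdir" -type d -name MIB | head -n 1)
    if [ -z "$mibdir" ]; then
        echo "no MIB directory found in $archive" >&2
        rm -rf "$workdir"
        return 1
    fi
    mkdir -p "$dest" && cp "$mibdir"/* "$dest"/
    status=$?
    rm -rf "$workdir"
    return $status
}

# Usage:
#   curl -O http://ftp.cc.uoc.gr/mirrors/monitoring-plugins/mib/nagiosmib-1.0.0.tar.gz
#   install_nagios_mibs nagiosmib-1.0.0.tar.gz /usr/share/snmp/mibs
```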
To test whether SNMP traps are triggered, use the following procedure.
- Load the MIBs in the SNMP management system.
- Make sure the IP address (or FQDN) of the SNMP management system is configured in the /etc/hosts file on the node where the Nagios server is running.
- Open ambari-web in the browser and log in. Stop a few services from ambari-web and check that the SNMP traps are received by the SNMP management station.