HDFS NameNode Integration

Overview

Monitor your primary and standby HDFS NameNodes to know when your cluster falls into a precarious state: when only one NameNode remains, or when it's time to add more capacity to the cluster. This Agent check collects metrics for remaining capacity, corrupt/missing blocks, dead DataNodes, filesystem load, under-replicated blocks, total volume failures (across all DataNodes), and many more.

Use this check (hdfs_namenode) together with its counterpart check (hdfs_datanode) instead of the older two-in-one check (hdfs), which is deprecated.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

The HDFS NameNode check is included in the Datadog Agent package, so you don't need to install anything else on your NameNodes.

Configuration

Prepare the NameNode

The Agent collects metrics from the NameNode's JMX remote interface. The interface is disabled by default, so enable it by setting the following options in hadoop-env.sh (usually found in $HADOOP_HOME/conf):

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.port=50070 $HADOOP_NAMENODE_OPTS"

Restart the NameNode process to enable the JMX interface.

Connect the Agent

Edit the hdfs_namenode.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory. See the sample hdfs_namenode.d/conf.yaml for all available configuration options:

init_config:

instances:
  - hdfs_namenode_jmx_uri: http://localhost:50070

Restart the Agent to begin sending NameNode metrics to Datadog.
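
Before relying on the check, it can help to query the configured URI by hand and confirm that it returns data. The sketch below assumes the NameNode serves Hadoop's JSON JMX servlet at the /jmx path under hdfs_namenode_jmx_uri and that the FSNamesystemState bean is exposed there. The helper names (build_jmx_url, first_bean, fetch_bean) are hypothetical, and this is an illustration rather than the check's actual implementation.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Assumption: this bean is used for illustration; the check may query others.
FS_STATE_BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"


def build_jmx_url(base_uri, bean=FS_STATE_BEAN):
    """Return the /jmx query URL for one bean under the configured URI."""
    base = base_uri.rstrip("/") + "/"
    return urljoin(base, "jmx") + "?" + urlencode({"qry": bean})


def first_bean(payload):
    """Parse a JMX servlet response body and return its first bean as a dict."""
    beans = json.loads(payload).get("beans", [])
    if not beans:
        raise ValueError("no beans in JMX response")
    return beans[0]


def fetch_bean(base_uri, bean=FS_STATE_BEAN, timeout=5):
    """Fetch one bean over HTTP; this needs a reachable NameNode."""
    with urlopen(build_jmx_url(base_uri, bean), timeout=timeout) as resp:
        return first_bean(resp.read().decode("utf-8"))


# Example (run on a host that can reach the NameNode):
#   state = fetch_bean("http://localhost:50070")
#   print(state.get("CapacityRemaining"), state.get("NumDeadDataNodes"))
```

If fetch_bean raises a connection error or a ValueError here, the Agent is likely to hit the same problem with this URI, so it is worth fixing before restarting the Agent.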

Validation

Run the Agent's status subcommand and look for hdfs_namenode under the Checks section.

Data Collected

Metrics

See metadata.csv for a list of metrics provided by this integration.

Events

The HDFS NameNode check does not include any events.

Service Checks

hdfs.namenode.jmx.can_connect:

Returns Critical if the Agent cannot connect to the NameNode's JMX interface for any reason (e.g. wrong port provided, timeout, unparseable JSON response).
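
The failure cases listed above can be pictured as a single try/except around one request. This is a minimal sketch under stated assumptions, not the check's source: the function name jmx_can_connect_status and the fetch callable are hypothetical, and the status strings stand in for the Agent's real status constants.

```python
import json

# Assumption: plain strings stand in for the Agent's status constants.
OK, CRITICAL = "OK", "CRITICAL"


def jmx_can_connect_status(fetch):
    """Map the outcome of one JMX request to a service check status.

    `fetch` is any zero-argument callable that returns the raw response body.
    """
    try:
        json.loads(fetch())
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers json.JSONDecodeError for unparseable bodies.
        return CRITICAL
    return OK
```

Passing the request in as a callable keeps the mapping easy to exercise without a live NameNode, for example with a function that raises ConnectionRefusedError to simulate a wrong port.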

Troubleshooting

Need help? Contact Datadog support.

Further Reading