@@ -1725,6 +1725,100 @@ hbase.regionserver.authenticationFailures::
hbase.regionserver.mutationsWithoutWALCount ::
Count of writes submitted with a flag indicating they should bypass the write ahead log

+ [[rs_meta_metrics]]
+ === Meta Table Load Metrics
+
+ The HBase meta table metrics collection feature is available in HBase 1.4+, but it is disabled by default because it can
+ affect the performance of the cluster. When enabled, it helps to monitor client access patterns by collecting
+ the following statistics:
+
+ * number of get, put and delete operations on the `hbase:meta` table
+ * number of get, put and delete operations made by the top-N clients
+ * number of operations related to each table
+ * number of operations related to the top-N regions
+
+
+ When to use the feature::
+ This feature can help to identify hot spots in the meta table by showing the regions or tables where the meta information is
+ modified (e.g. by creating, dropping, splitting or moving tables) or retrieved most frequently. It can also help to find misbehaving
+ client applications by showing which clients use the meta table most heavily, which can suggest, for example, a
+ lack of meta table buffering or a failure to reuse open client connections in the client application.
+
+ .Possible side-effects of enabling this feature
+ [WARNING]
+ ====
+ Having a large number of clients and regions in the cluster can cause a large number of metrics to be registered and tracked,
+ which can increase the memory and CPU footprint of the HBase region server handling the `hbase:meta` table.
+ It can also significantly increase the JMX dump size, which can affect the monitoring or log aggregation
+ system you use alongside HBase. It is recommended to turn on this feature only during debugging.
+ ====
+
+ Where to find the metrics in JMX::
+ Each metric attribute name starts with the `MetaTable_` prefix. For each metric you will see five different
+ JMX attributes: count, mean rate, 1 minute rate, 5 minute rate and 15 minute rate. You will find these metrics in JMX
+ under the following MBean:
+ `Hadoop -> HBase -> RegionServer -> Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics`.
+
+ .Examples: some Meta Table metrics you can see in your JMX dump
+ [source,json]
+ ----
+ {
+   "MetaTable_get_request_count": 77309,
+   "MetaTable_put_request_mean_rate": 0.06339092997186495,
+   "MetaTable_table_MyTestTable_request_15min_rate": 1.1020599841623246,
+   "MetaTable_client_/172.30.65.42_lossy_request_count": 1786,
+   "MetaTable_client_/172.30.65.45_put_request_5min_rate": 0.6189810954855728,
+   "MetaTable_region_1561131112259.c66e4308d492936179352c80432ccfe0._lossy_request_count": 38342,
+   "MetaTable_region_1561131043640.5bdffe4b9e7e334172065c853cf0caa6._lossy_request_1min_rate": 0.04925099917433935
+ }
+ ----
+
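The attribute names in the example above follow a visible pattern: the `MetaTable_` prefix, a metric name, and one of five suffixes. As a rough illustration that is not part of HBase, the following Python sketch groups a JMX attribute dictionary by metric name. The function name `group_meta_metrics` is hypothetical, and the suffix spellings are assumptions inferred from the sample dump.

```python
from collections import defaultdict

PREFIX = "MetaTable_"
# Assumed spellings of the five JMX attributes, taken from the sample dump.
SUFFIXES = ("count", "mean_rate", "1min_rate", "5min_rate", "15min_rate")


def group_meta_metrics(attributes):
    """Group MetaTable_ attributes by metric name, e.g. 'get_request'."""
    grouped = defaultdict(dict)
    for name, value in attributes.items():
        if not name.startswith(PREFIX):
            continue  # not a meta table metric
        body = name[len(PREFIX):]
        for suffix in SUFFIXES:
            # The leading underscore keeps '5min_rate' from matching '15min_rate'.
            if body.endswith("_" + suffix):
                metric = body[: -len(suffix) - 1]
                grouped[metric][suffix] = value
                break
    return dict(grouped)
```

For the sample dump, `MetaTable_get_request_count` ends up under the key `get_request` with the entry `count`, while attributes without the prefix are ignored.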
+ Configuration::
+ To turn on this feature, enable a custom coprocessor by adding the following section to `hbase-site.xml`.
+ This coprocessor will run on all the HBase RegionServers, but will be active (i.e. consume memory / CPU) only on
+ the server where the `hbase:meta` table is located. It will produce JMX metrics which can be downloaded from the
+ web UI of the given RegionServer or by a simple REST call. These metrics will not be present in the JMX dump of the
+ other RegionServers.
+
+ .Enabling the Meta Table Metrics feature
+ [source,xml]
+ ----
+ <property>
+   <name>hbase.coprocessor.region.classes</name>
+   <value>org.apache.hadoop.hbase.coprocessor.MetaTableMetrics</value>
+ </property>
+ ----
+
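With the coprocessor enabled, the REST call mentioned above can be sketched in Python. This is a minimal sketch under assumptions that are not stated in this section: that the RegionServer web UI listens on port 16030, that it serves a `/jmx` endpoint accepting a `qry` parameter, and that the MBean object name is the one assembled in `META_BEAN` from the JMX path shown earlier. The function names and the host name in the usage comment are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed MBean object name, derived from the JMX path given in this section.
META_BEAN = (
    "Hadoop:service=HBase,name=RegionServer,"
    "sub=Coprocessor.Region.CP_org.apache.hadoop.hbase.coprocessor.MetaTableMetrics"
)


def jmx_url(host, port=16030):
    """Build the query URL; 16030 is assumed to be the RegionServer info port."""
    return f"http://{host}:{port}/jmx?" + urlencode({"qry": META_BEAN})


def fetch_meta_metrics(host, port=16030, timeout=10):
    """Return the MetaTable_ attributes, or {} if none are reported by this server."""
    with urlopen(jmx_url(host, port), timeout=timeout) as response:
        beans = json.load(response).get("beans", [])
    return {
        name: value
        for bean in beans
        for name, value in bean.items()
        if name.startswith("MetaTable_")
    }


# Usage (hypothetical host; requires a reachable RegionServer):
# metrics = fetch_meta_metrics("regionserver-1.example.com")
```

An empty result is expected on every RegionServer that does not host `hbase:meta`, since the metrics are only produced there.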
+ .How are the top-N metrics calculated?
+ [NOTE]
+ ====
+ The 'top-N' type of metrics are counted using the Lossy Counting Algorithm (as defined in
+ link:http://www.vldb.org/conf/2002/S10P03.pdf[Manku, G.S.; Motwani, R. (2002). "Approximate frequency counts over data streams"]),
+ which is designed to identify elements in a data stream whose frequency count exceeds a user-given threshold.
+ The frequency computed by this algorithm is not always accurate, but it has an error threshold that can be specified by the
+ user as a configuration parameter. The run-time space required by the algorithm is inversely proportional to the
+ specified error threshold; hence the larger the error parameter, the smaller the footprint and the less accurate
+ the metrics.
+
+ You can specify the error rate of the algorithm as a floating-point value between 0 and 1 (exclusive); its default
+ value is 0.02. With the error rate set to `E` and `N` being the total number of meta table operations,
+ (assuming a uniform distribution of the activity of low-frequency elements) at most `7 / E` meters will be kept and
+ each kept element will have a frequency higher than `E * N`.
+
+ An example: Let’s assume we are interested in the HBase clients that are most active in accessing the meta table.
+ If there have been 1,000,000 operations on the meta table so far and the error rate parameter is set to 0.02, then we can
+ assume that at most 350 client IP address related counters will be present in JMX and each of these clients
+ accessed the meta table at least 20,000 times.
+
+ [source,xml]
+ ----
+ <property>
+   <name>hbase.util.default.lossycounting.errorrate</name>
+   <value>0.02</value>
+ </property>
+ ----
+ ====
+
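To make the note above more concrete, here is a minimal Python sketch of lossy counting as described in the cited paper. It is an illustration only and is not claimed to match the HBase implementation; the class, method, and function names are hypothetical. `documented_bounds` simply reproduces the `7 / E` and `E * N` arithmetic from the note.

```python
import math


class LossyCounter:
    """Textbook lossy counting sketch; names here are illustrative only."""

    def __init__(self, error_rate=0.02):
        if not 0.0 < error_rate < 1.0:
            raise ValueError("error_rate must be between 0 and 1 (exclusive)")
        self.error_rate = error_rate
        self.bucket_width = math.ceil(1 / error_rate)
        self.total = 0
        self.entries = {}  # key -> [observed count, maximum possible undercount]

    def add(self, key):
        self.total += 1
        bucket = math.ceil(self.total / self.bucket_width)
        if key in self.entries:
            self.entries[key][0] += 1
        else:
            # A new key may have been seen and pruned in earlier buckets.
            self.entries[key] = [1, bucket - 1]
        if self.total % self.bucket_width == 0:
            # At each bucket boundary, drop keys that are too rare to matter.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }


def documented_bounds(error_rate, total_ops):
    """Return (max meters kept, min frequency) using the formulas in the note."""
    return 7 / error_rate, error_rate * total_ops


counter = LossyCounter(error_rate=0.25)  # bucket width of 4
for client in ["a", "b", "a", "c", "a", "d", "a", "e"]:
    counter.add(client)
```

With an error rate of 0.25 the bucket width is 4, so the rare keys are pruned at every fourth operation while the frequent key `a` survives. A smaller error rate widens the buckets and keeps more keys, which is the memory trade-off the note describes.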
[[ops.monitoring]]
== HBase Monitoring