RegionsMerger is a utility tool for manually merging multiple regions of a given table. It is mainly useful when an HBase cluster has too many regions per RegionServer and many of these regions are small enough to be merged together, reducing the total number of regions in the cluster and freeing memory on the RegionServers.
This may happen because of mistaken pre-splits, or after a purge of table data, since regions are not automatically merged afterwards.
Make sure the HBase tools jar is added to the HBase classpath:
export HBASE_CLASSPATH=$HBASE_CLASSPATH:./hbase-tools-1.1.0-SNAPSHOT.jar
RegionsMerger requires two arguments: 1) the name of the table whose regions are to be merged; 2) the desired total number of regions for that table. For example, to merge the regions of table my-table until it reaches a total of 5 regions, assuming the setup step above has been performed:
$ hbase org.apache.hbase.RegionsMerger my-table 5
RegionsMerger uses the client API org.apache.hadoop.hbase.client.Admin.getRegions to fetch the list of regions for the specified table, then iterates through the resulting list, identifying pairs of adjacent regions. For each pair found, it submits a merge request using the org.apache.hadoop.hbase.client.Admin.mergeRegionsAsync client API method. This means multiple merge requests will have been sent by the time the whole list has been iterated.
Assuming that all merges issued by RegionsMerger are successful, the resulting number of regions will be about half the original number, since each merge replaces two regions with one. This resulting total might still not equal the target value passed as parameter, in which case RegionsMerger performs another round of merge requests, this time over the currently existing regions (it fetches another list of regions from org.apache.hadoop.hbase.client.Admin.getRegions).
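The round-by-round halving described above can be illustrated with a small standalone sketch. It makes no HBase calls: regions are plain strings, a "merge" is just string concatenation, and the class and method names (MergeRoundSketch, mergeRound) are hypothetical. It also assumes every merge succeeds and does not stop in the middle of a round, which may differ from what RegionsMerger actually does.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of merge rounds; no HBase calls are made. */
final class MergeRoundSketch {

    /**
     * Pairs up adjacent regions in list order and returns the regions that
     * would exist if every merge request succeeded.
     */
    static List<String> mergeRound(List<String> regions) {
        List<String> result = new ArrayList<>();
        String previous = null;
        for (String current : regions) {
            if (previous == null) {
                previous = current;                   // first half of a candidate pair
            } else {
                result.add(previous + "+" + current); // stands in for a merge request
                previous = null;
            }
        }
        if (previous != null) {
            result.add(previous);                     // odd region out stays unmerged
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> regions = List.of("r1", "r2", "r3", "r4", "r5", "r6", "r7");
        int target = 2;
        int round = 0;
        while (regions.size() > target) {
            regions = mergeRound(regions);
            round++;
            System.out.println("round " + round + ": " + regions.size() + " regions");
        }
        // prints "round 1: 4 regions" then "round 2: 2 regions"
    }
}
```

With seven regions, the first round leaves four (three merged pairs plus one leftover), which is why a single round does not always reach exactly half.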
Merge requests are processed asynchronously. HBase may take some time to complete merge requests, so RegionsMerger sleeps between rounds of region iteration before sending further requests. The sleep duration is configured by the hbase.tools.merge.sleep property, in milliseconds, and defaults to 2000 (2 seconds).
While iterating through the list of regions, once a pair of adjacent regions is detected, RegionsMerger checks the current file system size of each region (excluding MOB data) before deciding whether to submit the merge request for that pair. If the combined size of the two regions exceeds a threshold, the merge is not attempted.
This threshold is a configurable fraction of the hbase.hregion.max.filesize value, and is applied to prevent merged regions from being split again immediately after the merge completes, which would happen automatically if the resulting region size reached hbase.hregion.max.filesize. The fraction of hbase.hregion.max.filesize is a double value configurable via the hbase.tools.merge.upper.mark property, and it defaults to 0.9.
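As a worked example of the threshold arithmetic, the sketch below computes the limit as hbase.hregion.max.filesize multiplied by hbase.tools.merge.upper.mark and compares it with the combined size of two regions. The class and method names are hypothetical, and the 10 GiB max file size is an assumed value chosen for illustration; this is not the RegionsMerger source.

```java
/** Illustrative size check; names and values are assumptions for this example. */
final class MergeThresholdSketch {

    /** Returns true when the combined size does not exceed upperMark * maxFileSize. */
    static boolean withinThreshold(long sizeA, long sizeB, long maxFileSize, double upperMark) {
        long threshold = (long) (maxFileSize * upperMark);
        return sizeA + sizeB <= threshold;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long maxFileSize = 10 * gib; // assumed hbase.hregion.max.filesize
        double upperMark = 0.9;      // default of hbase.tools.merge.upper.mark

        // 4 GiB + 4 GiB = 8 GiB, below the roughly 9 GiB threshold
        System.out.println(withinThreshold(4 * gib, 4 * gib, maxFileSize, upperMark)); // prints "true"
        // 5 GiB + 5 GiB = 10 GiB, above the threshold, so no merge is attempted
        System.out.println(withinThreshold(5 * gib, 5 * gib, maxFileSize, upperMark)); // prints "false"
    }
}
```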
Given this hbase.hregion.max.filesize restriction on merge results, it may be impossible to achieve the desired total number of regions.
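To see how the size restriction can make a target unreachable, the following standalone sketch repeats merge rounds over a list of region sizes and stops once a round merges nothing. It is a simplified model under stated assumptions: sizes are in arbitrary units, every permitted merge succeeds, and the names (UnreachableTargetSketch, mergeRound, finalRegionCount) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model: merge adjacent regions only while they fit under a threshold. */
final class UnreachableTargetSketch {

    /** One round: merge adjacent regions whose combined size fits under the threshold. */
    static List<Long> mergeRound(List<Long> sizes, long threshold) {
        List<Long> result = new ArrayList<>();
        Long previous = null;
        for (Long current : sizes) {
            if (previous != null && previous + current <= threshold) {
                result.add(previous + current); // merged region
                previous = null;
            } else {
                if (previous != null) {
                    result.add(previous);       // previous could not be paired
                }
                previous = current;
            }
        }
        if (previous != null) {
            result.add(previous);
        }
        return result;
    }

    /** Repeats rounds until the target is met or a round merges nothing. */
    static int finalRegionCount(List<Long> sizes, long threshold, int target) {
        List<Long> current = sizes;
        while (current.size() > target) {
            List<Long> next = mergeRound(current, threshold);
            if (next.size() == current.size()) {
                break; // no adjacent pair fits under the threshold any more
            }
            current = next;
        }
        return current.size();
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(2L, 2L, 2L, 2L, 6L, 6L);
        // Threshold 9, target 2: the two size-6 regions can never be merged.
        System.out.println(finalRegionCount(sizes, 9L, 2)); // prints "3"
    }
}
```

In this example the four small regions collapse into one region of size 8, but the two regions of size 6 cannot be merged with each other or with it, so the count stalls at 3 instead of the requested 2.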
RegionsMerger tracks the progress of region merges on each round. If no progress is observed after a configurable number of rounds, RegionsMerger aborts automatically. The limit of rounds without progress is an integer value configured via the hbase.tools.max.iterations.blocked property.
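The abort rule can be sketched as a counter of consecutive rounds without progress. The example below assumes that "progress" means the region count dropped since the previous round; the actual criterion used by RegionsMerger is not stated here, and the class and method names are hypothetical.

```java
import java.util.List;

/** Illustrative no-progress counter; the progress criterion is an assumption. */
final class ProgressTrackingSketch {

    /**
     * Walks the region counts observed after each round and returns the 1-based
     * round at which the loop would abort, or -1 if it never aborts.
     */
    static int abortRound(List<Integer> regionCounts, int maxRoundsBlocked) {
        int roundsNoProgress = 0;
        int last = Integer.MAX_VALUE;
        for (int i = 0; i < regionCounts.size(); i++) {
            int count = regionCounts.get(i);
            if (count < last) {
                roundsNoProgress = 0; // region count dropped: progress was made
            } else {
                roundsNoProgress++;   // same or higher count: a blocked round
            }
            last = count;
            if (roundsNoProgress >= maxRoundsBlocked) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The count stalls at 8 from round 3 onwards; with a limit of 3 blocked
        // rounds the loop gives up at round 5.
        System.out.println(abortRound(List.of(10, 8, 8, 8, 8), 3)); // prints "5"
    }
}
```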