
Commit c9c164e

HBASE-24106 Update getting started documentation after HBASE-24086
1 parent b2c9a06 commit c9c164e

1 file changed

+51
-71
lines changed

src/main/asciidoc/_chapters/getting_started.adoc

Lines changed: 51 additions & 71 deletions
@@ -55,35 +55,43 @@ See <<java,Java>> for information about supported JDK versions.
5555
. Choose a download site from this list of link:https://www.apache.org/dyn/closer.lua/hbase/[Apache Download Mirrors].
5656
Click on the suggested top link.
5757
This will take you to a mirror of _HBase Releases_.
58-
Click on the folder named _stable_ and then download the binary file that ends in _.tar.gz_ to your local filesystem.
59-
Do not download the file ending in _src.tar.gz_ for now.
58+
Click on the folder named _stable_ and then download the binary file that looks like
59+
_hbase-{version}-bin.tar.gz_ to your local filesystem.
6060
6161
. Extract the downloaded file, and change to the newly-created directory.
6262
+
6363
[source,subs="attributes"]
6464
----
6565
66-
$ tar xzvf hbase-{Version}-bin.tar.gz
67-
$ cd hbase-{Version}/
66+
$ tar xzvf hbase-{version}-bin.tar.gz
67+
$ cd hbase-{version}/
6868
----
6969
70-
. You must set the `JAVA_HOME` environment variable before starting HBase.
71-
To make this easier, HBase lets you set it within the _conf/hbase-env.sh_ file. You must locate where Java is
72-
installed on your machine, and one way to find this is by using the _whereis java_ command. Once you have the location,
73-
edit the _conf/hbase-env.sh_ file and uncomment the line starting with _#export JAVA_HOME=_, and then set it to your Java installation path.
70+
. Set the `JAVA_HOME` environment variable in _conf/hbase-env.sh_.
71+
First, locate the installation of `java` on your machine. On Unix systems, you can use the
72+
_whereis java_ command. Once you have the location, edit the _conf/hbase-env.sh_ file, found inside
73+
the extracted _hbase-{version}_ directory, uncomment the line starting with `#export JAVA_HOME=`,
74+
and then set it to your Java installation path.
7475
+
75-
.Example extract from _hbase-env.sh_ where _JAVA_HOME_ is set
76+
.Example extract from _conf/hbase-env.sh_ where `JAVA_HOME` is set
7677
# Set environment variables here.
7778
# The java implementation to use.
7879
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
7980
+
8081
81-
. Edit _conf/hbase-site.xml_, which is the main HBase configuration file.
82-
At this time, you need to specify the directory on the local filesystem where HBase and ZooKeeper write data and acknowledge some risks.
83-
By default, a new directory is created under /tmp.
84-
Many servers are configured to delete the contents of _/tmp_ upon reboot, so you should store the data elsewhere.
85-
The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called `testuser`.
86-
Paste the `<property>` tags beneath the `<configuration>` tags, which should be empty in a new HBase install.
82+
. Optionally set the <<hbase.tmp.dir,`hbase.tmp.dir`>> property in _conf/hbase-site.xml_.
83+
At this time, you may consider changing the location on the local filesystem where HBase writes
84+
its application data and the data written by its embedded ZooKeeper instance. By default, HBase
85+
uses paths under <<hbase.tmp.dir,`hbase.tmp.dir`>> for these directories.
86+
+
87+
NOTE: On most systems, this is a path created under _/tmp_. Many systems periodically delete the
88+
contents of _/tmp_. If you start working with HBase in this way, and then return after the
89+
cleanup operation takes place, you're likely to find strange errors. Thus, the following
90+
configuration will place HBase's runtime data in a _tmp_ directory found inside the extracted
91+
_hbase-{version}_ directory.
92+
+
93+
Open _conf/hbase-site.xml_ and paste the `<property>` tags between the empty `<configuration>`
94+
tags.
8795
+
8896
.Example _hbase-site.xml_ for Standalone HBase
8997
====
@@ -92,48 +100,25 @@ $ cd hbase-{Version}/
92100
93101
<configuration>
94102
<property>
95-
<name>hbase.rootdir</name>
96-
<value>file:///home/testuser/hbase</value>
97-
</property>
98-
<property>
99-
<name>hbase.zookeeper.property.dataDir</name>
100-
<value>/home/testuser/zookeeper</value>
101-
</property>
102-
<property>
103-
<name>hbase.unsafe.stream.capability.enforce</name>
104-
<value>false</value>
105-
<description>
106-
Controls whether HBase will check for stream capabilities (hflush/hsync).
107-
108-
Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
109-
with the 'file://' scheme, but be mindful of the NOTE below.
110-
111-
WARNING: Setting this to false blinds you to potential data loss and
112-
inconsistent system state in the event of process and/or node failures. If
113-
HBase is complaining of an inability to use hsync or hflush it's most
114-
likely not a false positive.
115-
</description>
103+
<name>hbase.tmp.dir</name>
104+
<value>tmp</value>
116105
</property>
117106
</configuration>
118107
----
119108
====
120109
+
121-
You do not need to create the HBase data directory.
122-
HBase will do this for you. If you create the directory,
123-
HBase will attempt to do a migration, which is not what you want.
110+
You do not need to create the HBase _tmp_ directory; HBase will do this for you.
124111
+
125-
NOTE: The _hbase.rootdir_ in the above example points to a directory
126-
in the _local filesystem_. The 'file://' prefix is how we denote local
127-
filesystem. You should take the WARNING present in the configuration example
128-
to heart. In standalone mode HBase makes use of the local filesystem abstraction
129-
from the Apache Hadoop project. That abstraction doesn't provide the durability
130-
promises that HBase needs to operate safely. This is fine for local development
131-
and testing use cases where the cost of cluster failure is well contained. It is
132-
not appropriate for production deployments; eventually you will lose data.
133-
134-
To home HBase on an existing instance of HDFS, set the _hbase.rootdir_ to point at a
135-
directory up on your instance: e.g. _hdfs://namenode.example.org:8020/hbase_.
136-
For more on this variant, see the section below on Standalone HBase over HDFS.
112+
NOTE: When unconfigured, HBase uses <<hbase.tmp.dir,`hbase.tmp.dir`>> as a starting point for many
113+
important configurations. Notable among them is <<hbase.rootdir,`hbase.rootdir`>>, the path under
114+
which HBase stores its data. You can specify values for this configuration directly, as you'll see
115+
in the subsequent sections.
116+
+
117+
NOTE: In this example, HBase is running on Hadoop's `LocalFileSystem`. That abstraction doesn't
118+
provide the durability promises that HBase needs to operate safely. This is most likely acceptable
119+
for local development and testing use cases. It is not appropriate for production deployments;
120+
eventually you will lose data. Instead, ensure your production deployment sets
121+
<<hbase.rootdir,`hbase.rootdir`>> to a durable `FileSystem` implementation.
137122
138123
. The _bin/start-hbase.sh_ script is provided as a convenient way to start HBase.
139124
Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully.
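
The two configuration steps in the hunks above (locating Java for `JAVA_HOME` and setting `hbase.tmp.dir` in _conf/hbase-site.xml_) could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper names `guess_java_home` and `write_site_xml` are hypothetical, `readlink -f` is assumed to be available (GNU coreutils), and `write_site_xml` overwrites its target file, so back up any existing _hbase-site.xml_ first.

```shell
#!/bin/sh
# Sketch only: guess_java_home and write_site_xml are hypothetical helpers,
# not something shipped with HBase.

# Print a candidate JAVA_HOME by resolving the `java` binary found on PATH.
# Assumes `readlink -f` exists; returns 1 when java cannot be found or resolved.
guess_java_home() {
  java_bin=$(command -v java) || return 1
  real_bin=$(readlink -f "$java_bin") || return 1
  # Strip the trailing /bin/java to get the installation directory.
  dirname "$(dirname "$real_bin")"
}

# Write a minimal hbase-site.xml that sets only hbase.tmp.dir.
# $1 = target file (overwritten), $2 = value for hbase.tmp.dir (default: tmp).
write_site_xml() {
  target=$1
  tmp_dir=${2:-tmp}
  cat > "$target" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.tmp.dir</name>
    <value>${tmp_dir}</value>
  </property>
</configuration>
EOF
}
```

From inside the extracted _hbase-{version}_ directory, one might run `guess_java_home` to get a path to put on the `export JAVA_HOME=` line, and `write_site_xml conf/hbase-site.xml` to produce the example configuration. Check the printed path before using it; on some systems it may point at a JRE subdirectory rather than the JDK root.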
@@ -308,26 +293,21 @@ In the next sections we give a quick overview of other modes of hbase deploy.
308293
[[quickstart_pseudo]]
309294
=== Pseudo-Distributed Local Install
310295
311-
After working your way through <<quickstart,quickstart>> standalone mode,
312-
you can re-configure HBase to run in pseudo-distributed mode.
313-
Pseudo-distributed mode means that HBase still runs completely on a single host,
314-
but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process:
315-
in standalone mode all daemons ran in one jvm process/instance.
316-
By default, unless you configure the `hbase.rootdir` property as described in
317-
<<quickstart,quickstart>>, your data is still stored in _/tmp/_.
318-
In this walk-through, we store your data in HDFS instead, assuming you have HDFS available.
319-
You can skip the HDFS configuration to continue storing your data in the local filesystem.
296+
After working your way through the <<quickstart,quickstart>> using standalone mode, you can
297+
re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase
298+
still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and
299+
ZooKeeper) runs as a separate process. Previously in <<quickstart,standalone mode>>, all these
300+
daemons ran in a single JVM process, and your data was stored under
301+
<<hbase.tmp.dir,`hbase.tmp.dir`>>. In this walk-through, your data will be stored in HDFS
302+
instead, assuming you have HDFS available. This is optional; you can skip the HDFS configuration
303+
to continue storing your data in the local filesystem.
320304
321305
.Hadoop Configuration
322-
[NOTE]
323-
====
324-
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote
325-
system, and that they are running and available. It also assumes you are using Hadoop 2.
306+
NOTE: This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a
307+
remote system, and that they are running and available. It also assumes you are using Hadoop 2.
326308
The guide on
327309
link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html[Setting up a Single Node Cluster]
328310
in the Hadoop documentation is a good starting point.
329-
====
330-
331311
332312
. Stop HBase if it is running.
333313
+
@@ -348,8 +328,8 @@ First, add the following property which directs HBase to run in distributed mode
348328
</property>
349329
----
350330
+
351-
Next, change the `hbase.rootdir` from the local filesystem to the address of your HDFS instance, using the `hdfs:////` URI syntax.
352-
In this example, HDFS is running on the localhost at port 8020. Be sure to either remove the entry for `hbase.unsafe.stream.capability.enforce` or set it to true.
331+
Next, add a configuration for `hbase.rootdir` so that it points to the address of your HDFS instance, using the `hdfs://` URI syntax.
332+
In this example, HDFS is running on the localhost at port 8020.
353333
+
354334
[source,xml]
355335
----
@@ -360,10 +340,10 @@ In this example, HDFS is running on the localhost at port 8020. Be sure to eithe
360340
</property>
361341
----
362342
+
363-
You do not need to create the directory in HDFS.
364-
HBase will do this for you.
343+
You do not need to create the directory in HDFS; HBase will do this for you.
365344
If you create the directory, HBase will attempt to do a migration, which is not what you want.
366-
345+
+
346+
Finally, remove the configuration for `hbase.tmp.dir`.
367347
. Start HBase.
368348
+
369349
Use the _bin/start-hbase.sh_ command to start HBase.
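
Before starting HBase in pseudo-distributed mode, it may help to confirm that _conf/hbase-site.xml_ matches the edits described above: `hbase.rootdir` points at an `hdfs://` address and `hbase.tmp.dir` has been removed. The following is a rough sketch under stated assumptions: `check_distributed_site` is a hypothetical helper, and it does plain text matching with `grep` rather than real XML parsing, so it expects each `<name>` and `<value>` element to be written without extra whitespace inside the tags.

```shell
#!/bin/sh
# Sketch only: check_distributed_site is a hypothetical helper, not an HBase tool.
# It succeeds when the file sets hbase.rootdir, contains an hdfs:// value,
# and no longer sets hbase.tmp.dir. Matching is textual, not XML-aware.
check_distributed_site() {
  site=$1
  if grep -q '<name>hbase.tmp.dir</name>' "$site"; then
    echo "hbase.tmp.dir is still set in $site" >&2
    return 1
  fi
  if ! grep -q '<name>hbase.rootdir</name>' "$site"; then
    echo "hbase.rootdir is not set in $site" >&2
    return 1
  fi
  if ! grep -q '<value>hdfs://' "$site"; then
    echo "no hdfs:// value found in $site" >&2
    return 1
  fi
}
```

A possible use is `check_distributed_site conf/hbase-site.xml && bin/start-hbase.sh`, which starts HBase only when the check passes.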
