[Merged by Bors] - docs: split up usage page and improve landing page #344

4 changes: 4 additions & 0 deletions docs/modules/hbase/images/hbase_overview.drawio.svg
2 changes: 1 addition & 1 deletion docs/modules/hbase/pages/getting_started/first_steps.adoc
@@ -200,4 +200,4 @@ This is because Phoenix requires these `SYSTEM.` tables for its own internal map

== What's next

Look at the xref:usage.adoc[Usage page] to find out more about configuring your HBase cluster.
Look at the xref:usage-guide/index.adoc[] to find out more about configuring your HBase cluster.
43 changes: 32 additions & 11 deletions docs/modules/hbase/pages/index.adoc
@@ -1,20 +1,41 @@
= Stackable Operator for Apache HBase
:description: The Stackable Operator for Apache HBase is a Kubernetes operator that can manage Apache HBase clusters. Learn about its features, resources, dependencies, and demos, and see the list of supported HBase versions.
:keywords: Stackable Operator, Apache HBase, Kubernetes, operator, engineer, CRD, StatefulSet, ConfigMap, Service, ZooKeeper, HDFS

This is an operator for Kubernetes that can manage https://hbase.apache.org/[Apache HBase]
clusters.
This is an Operator for Kubernetes that manages https://hbase.apache.org/[Apache HBase] clusters.
Apache HBase is an open-source, distributed, non-relational database that runs on top of the Hadoop Distributed File System (HDFS).

WARNING: This operator is part of the Stackable Data Platform and only works with images from the
https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhbase[Stackable] repository.
== Getting started

Follow the xref:getting_started/index.adoc[] guide to learn how to xref:getting_started/installation.adoc[install] the Stackable Operator for Apache HBase as well as its dependencies. The guide also shows you how to xref:getting_started/first_steps.adoc[interact] with HBase running on Kubernetes by creating tables and inserting data using the REST API or Apache Phoenix.

The xref:usage-guide/index.adoc[] contains more information on xref:usage-guide/phoenix.adoc[] as well as other topics such as xref:usage-guide/resource-requests.adoc[CPU and memory configuration], xref:usage-guide/monitoring.adoc[] and xref:usage-guide/logging.adoc[].

== Operator model

The Operator manages the _HbaseCluster_ custom resource. You configure your HBase instance using this resource, and the Operator creates Kubernetes resources such as StatefulSets, ConfigMaps and Services accordingly.

HBase uses three xref:concepts:roles-and-role-groups.adoc[roles]: `masters`, `regionServers` and `restServers`.

image::hbase_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]

For every RoleGroup a **StatefulSet** is created. Each StatefulSet can contain multiple replicas (Pods).
For every RoleGroup a **Service** is created, as well as one for the whole cluster that references the `regionServers`.
For every Role and RoleGroup the Operator creates a **Service**.

A **ConfigMap** is created for each RoleGroup containing three files: `hbase-env.sh` and `hbase-site.xml`, generated from the HbaseCluster configuration (see xref:usage-guide/index.adoc[] for more information), plus a `log4j.properties` file used for xref:usage-guide/logging.adoc[].
The Operator also creates a **xref:usage-guide/discovery.adoc[discovery ConfigMap]** for the whole HbaseCluster, which contains information on how to connect to the HBase cluster.
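
Putting these pieces together, a trimmed HbaseCluster sketch could look like the following. This is illustrative only: the names are placeholders, required fields such as the product version are omitted, and the exact schema should be checked against the CRD reference:

[source,yaml]
----
apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase  # also the name of the discovery ConfigMap
spec:
  hdfsConfigMapName: simple-hdfs        # where to find the HDFS cluster
  zookeeperConfigMapName: simple-znode  # where to find the ZooKeeper znode
  masters:
    roleGroups:
      default:
        replicas: 2
  regionServers:
    roleGroups:
      default:
        replicas: 2
  restServers:
    roleGroups:
      default:
        replicas: 1
----

For this resource the Operator would create one StatefulSet per role group, Services for each role and role group, and a discovery ConfigMap named `simple-hbase`.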

== Dependencies

A distributed Apache HBase installation depends on a running Apache ZooKeeper and HDFS cluster. See the documentation for the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS] to learn how to set up these clusters.

== Demo

The xref:stackablectl::demos/hbase-hdfs-load-cycling-data.adoc[] demo shows how you can use HBase together with HDFS.

== Supported versions

The Stackable Operator for Apache HBase currently supports the following versions of Apache HBase:

include::partial$supported-versions.adoc[]

== Getting the Docker image
[source]
----
docker pull docker.stackable.tech/stackable/hbase:<version>
----
docs/modules/hbase/pages/usage-guide/cluster-operations.adoc
@@ -1,4 +1,4 @@
= Cluster operation
:page-aliases: cluster_operations.adoc

= Cluster Operation

HBase installations can be configured with different cluster operations like pausing reconciliation or stopping the cluster. See xref:concepts:cluster_operations.adoc[cluster operations] for more details.
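
For example, assuming the shared Stackable `clusterOperation` fields described on that page, pausing reconciliation or stopping all Pods is a matter of two flags in the HbaseCluster spec (a sketch; verify the field names for your operator version):

[source,yaml]
----
spec:
  clusterOperation:
    reconciliationPaused: true  # the operator stops applying spec changes
    stopped: false              # true scales all roles down to zero Pods
----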
docs/modules/hbase/pages/usage-guide/discovery.adoc
@@ -2,10 +2,11 @@
:namespace: \{namespace\}
:hdfs-cluster-name: \{hdfs-cluster-name\}
:zookeeper-znode-name: \{zookeeper-znode-name\}
:page-aliases: discovery.adoc

= Discovery

The Stackable Operator for Apache HBase publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[`ConfigMap`], which exposes a client configuration bundle that allows access to the Apache HBase cluster.
The Stackable Operator for Apache HBase publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[ConfigMap], which exposes a client configuration bundle that allows access to the Apache HBase cluster.

== Example

@@ -23,16 +24,16 @@ spec:
  hdfsConfigMapName: {hdfs-cluster-name} #<3>
  zookeeperConfigMapName: {zookeeper-znode-name} #<4>
----
<1> The name of the HBase cluster, which is also the name of the created discovery `ConfigMap`.
<2> The namespace of the discovery `ConfigMap`.
<3> The `ConfigMap` name to discover the HDFS cluster.
<4> The `ConfigMap` name to discover the ZooKeeper cluster.
<1> The name of the HBase cluster, which is also the name of the created discovery ConfigMap.
<2> The namespace of the discovery ConfigMap.
<3> The ConfigMap name to discover the HDFS cluster.
<4> The ConfigMap name to discover the ZooKeeper cluster.

The resulting discovery `ConfigMap` is located at `{namespace}/{cluster-name}`.
The resulting discovery ConfigMap is located at `{namespace}/{cluster-name}`.

== Contents

The `ConfigMap` data values are formatted as Hadoop XML files which allows simple mounting of that `ConfigMap` into pods that require access to HBase.
The ConfigMap data values are formatted as Hadoop XML files which allows simple mounting of that ConfigMap into pods that require access to HBase.

`hbase-site.xml`::
Contains the `hbase.zookeeper.quorum` property.
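
For client access from another Pod, the discovery ConfigMap can simply be mounted and `HBASE_CONF_DIR` pointed at the mount. The following is a sketch assuming an HbaseCluster named `simple-hbase`; the image and mount path are arbitrary choices:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: hbase-client
spec:
  containers:
    - name: client
      image: docker.stackable.tech/stackable/hbase:<version>  # any image with HBase client tools
      command: ["sleep", "infinity"]
      env:
        - name: HBASE_CONF_DIR  # client tools read hbase-site.xml from this directory
          value: /stackable/conf/hbase
      volumeMounts:
        - name: hbase-config
          mountPath: /stackable/conf/hbase
  volumes:
    - name: hbase-config
      configMap:
        name: simple-hbase  # the discovery ConfigMap, named after the HbaseCluster
----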
9 changes: 9 additions & 0 deletions docs/modules/hbase/pages/usage-guide/index.adoc
@@ -0,0 +1,9 @@
= Usage guide

Learn about xref:usage-guide/cluster-operations.adoc[starting, stopping and pausing] your cluster.

Learn about xref:usage-guide/pod-placement.adoc[configuring where Pods are scheduled] and xref:usage-guide/resource-requests.adoc[how many CPU and memory resources] your Pods consume.

You can observe what's happening with your HBase cluster using xref:usage-guide/logging.adoc[logging] and xref:usage-guide/monitoring.adoc[monitoring].

Connect to HBase using xref:usage-guide/phoenix.adoc[Apache Phoenix] or use the xref:usage-guide/discovery.adoc[discovery ConfigMap] to connect other products.
26 changes: 26 additions & 0 deletions docs/modules/hbase/pages/usage-guide/logging.adoc
@@ -0,0 +1,26 @@
= Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery
  masters:
    config:
      logging:
        enableVectorAgent: true
  regionServers:
    config:
      logging:
        enableVectorAgent: true
  restServers:
    config:
      logging:
        enableVectorAgent: true
----

Further information on how to configure logging can be found in xref:home:concepts:logging.adoc[].
4 changes: 4 additions & 0 deletions docs/modules/hbase/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,4 @@
= Monitoring

The managed HBase instances are automatically configured to export Prometheus metrics. See
xref:home:operators:monitoring.adoc[] for more details.
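
If you scrape with the Prometheus Operator, a ServiceMonitor along the following lines can pick the metrics up. This is a sketch based on common Stackable conventions; verify the scrape label and the metrics port name against the Services generated in your cluster:

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: scrape-hbase
spec:
  endpoints:
    - port: metrics  # assumed name of the metrics port on the generated Services
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      prometheus.io/scrape: "true"  # assumed label on the Stackable-managed Services
----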
56 changes: 56 additions & 0 deletions docs/modules/hbase/pages/usage-guide/overrides.adoc
@@ -0,0 +1,56 @@

= Configuration overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

IMPORTANT: Overriding certain properties that are set by the operator can interfere with its operation and lead to problems.

== Configuration properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the following files:

- `hbase-site.xml`
- `hbase-env.sh`

NOTE: `hdfs-site.xml` is not listed here because the file is always taken from the referenced HDFS cluster. If you want to modify it, have a look at xref:hdfs:usage-guide/configuration-environment-overrides.adoc[HDFS configuration overrides].

For example, if you want to set `hbase.rest.threads.min` to 4 and `HBASE_HEAPSIZE` to 2 GB, adapt the `restServers` section of the cluster resource like so:

[source,yaml]
----
restServers:
  roleGroups:
    default:
      config: {}
      configOverrides:
        hbase-site.xml:
          hbase.rest.threads.min: "4"
        hbase-env.sh:
          HBASE_HEAPSIZE: "2G"
      replicas: 1
----

Just as for the `config`, it is possible to specify this at role level as well:

[source,yaml]
----
restServers:
  configOverrides:
    hbase-site.xml:
      hbase.rest.threads.min: "4"
    hbase-env.sh:
      HBASE_HEAPSIZE: "2G"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

All override property values must be strings. They are formatted and escaped correctly for the XML file, or inserted as-is into the `env.sh` file.
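
For illustration, the `hbase.rest.threads.min` override from the example above would end up in the rendered `hbase-site.xml` in the standard Hadoop property format, roughly like this:

[source,xml]
----
<property>
  <name>hbase.rest.threads.min</name>
  <value>4</value>
</property>
----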

For a full list of configuration options, refer to the HBase https://hbase.apache.org/book.html#config.files[configuration documentation].

// Environment configuration is not implemented. The environment is managed
// with the hbase-env.sh configuration file

// CLI overrides are also not implemented
33 changes: 33 additions & 0 deletions docs/modules/hbase/pages/usage-guide/phoenix.adoc
@@ -0,0 +1,33 @@
= Using Apache Phoenix

The Apache Phoenix project provides the ability to interact with HBase over JDBC using familiar SQL syntax. The Phoenix dependencies are bundled with the Stackable HBase image and do not need to be installed separately (clients need to make sure that they have the correct client-side libraries available). Information about client-side installation can be found https://phoenix.apache.org/installation.html[here].

Phoenix comes bundled with a few simple scripts to verify a correct server-side installation. For example, assuming that the Phoenix dependencies have been installed to their default location of `/stackable/phoenix/bin`, we can issue the following using the supplied `psql.py` script:

[source,shell script]
----
/stackable/phoenix/bin/psql.py \
    /stackable/phoenix/examples/WEB_STAT.sql \
    /stackable/phoenix/examples/WEB_STAT.csv \
    /stackable/phoenix/examples/WEB_STAT_QUERIES.sql
----

This script generates a Java command that creates, populates and queries a Phoenix table called `WEB_STAT`. Alternatively, one can use the `sqlline.py` script (which wraps the https://github.com/julianhyde/sqlline[sqlline] utility):

[source,shell script]
----
/stackable/phoenix/bin/sqlline.py [zookeeper] [sql file]
----

The script opens an SQL prompt from where one can list, query, create and generally interact with Phoenix tables. So, to query the table that was created in the previous step, start the script and enter some SQL at the prompt:

image::phoenix_sqlline.png[Phoenix Sqlline]

The Phoenix table `WEB_STAT` is created as an HBase table, and can be viewed normally from within the HBase UI:

image::phoenix_tables.png[Phoenix Tables]

The `SYSTEM.` tables are those required by Phoenix and are created the first time Phoenix is invoked.

NOTE: Both `psql.py` and `sqlline.py` generate a Java command that calls classes from the Phoenix client library JAR. The ZooKeeper quorum does not need to be supplied as part of the JDBC connection string, as long as the environment variable `HBASE_CONF_DIR` is set and added to the `-cp` classpath search path: the cluster information is then extracted from `$HBASE_CONF_DIR/hbase-site.xml`.

docs/modules/hbase/pages/usage-guide/pod-placement.adoc
@@ -1,4 +1,5 @@
= Pod Placement
= Pod placement
:page-aliases: pod_placement.adoc

You can configure Pod placement for HBase nodes as described in xref:concepts:pod_placement.adoc[].

@@ -92,5 +93,5 @@ affinity:

In the examples above `cluster-name` is the name of the HBase custom resource that owns this Pod. The `hdfs-cluster-name` is the name of the HDFS cluster that was configured in the `hdfsConfigMapName` property.

NOTE: It is important that the `hdfsConfigMapName` property contains the name the HDFS cluster. You could instead configure `ConfigMap`s of specific name or data roles, but for the purpose of pod placement, this will lead to faulty behavior.
NOTE: It is important that the `hdfsConfigMapName` property contains the name of the HDFS cluster. You could instead reference the ConfigMap of a specific nameNode or dataNode role, but for the purpose of pod placement this will lead to faulty behavior.

23 changes: 23 additions & 0 deletions docs/modules/hbase/pages/usage-guide/resource-requests.adoc
@@ -0,0 +1,23 @@
= Resource requests

include::home:concepts:stackable_resource_requests.adoc[]

If no resources are configured explicitly, the HBase operator uses the following defaults:

[source,yaml]
----
regionServers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: "200m"
            max: "4"
          memory:
            limit: "2Gi"
----

WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production. Please adapt according to your requirements.

For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods].
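
To override the defaults, set your own values using the same structure, for example to give the region servers more headroom (the numbers here are arbitrary):

[source,yaml]
----
regionServers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: "1"
            max: "4"
          memory:
            limit: "8Gi"
----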