[Merged by Bors] - docs: split up usage page and improve landing page #344

4 changes: 4 additions & 0 deletions docs/modules/hbase/images/hbase_overview.drawio.svg
2 changes: 1 addition & 1 deletion docs/modules/hbase/pages/getting_started/first_steps.adoc
@@ -200,4 +200,4 @@ This is because Phoenix requires these `SYSTEM.` tables for its own internal map

== What's next

Look at the xref:usage.adoc[Usage page] to find out more about configuring your HBase cluster.
Look at the xref:usage-guide/index.adoc[] to find out more about configuring your HBase cluster.
43 changes: 32 additions & 11 deletions docs/modules/hbase/pages/index.adoc
@@ -1,20 +1,41 @@
= Stackable Operator for Apache HBase
:description: The Stackable Operator for Apache HBase is a Kubernetes operator that can manage Apache HBase clusters. Learn about its features, resources, dependencies, and demos, and see the list of supported HBase versions.
:keywords: Stackable Operator, Apache HBase, Kubernetes, operator, engineer, CRD, StatefulSet, ConfigMap, Service, ZooKeeper, HDFS

This is an operator for Kubernetes that can manage https://hbase.apache.org/[Apache HBase]
clusters.
This is an Operator for Kubernetes that manages https://hbase.apache.org/[Apache HBase] clusters.
Apache HBase is an open-source, distributed, non-relational database that runs on top of the Hadoop Distributed File System (HDFS).

WARNING: This operator is part of the Stackable Data Platform and only works with images from the
https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhbase[Stackable] repository.
== Getting started

Follow the xref:getting_started/index.adoc[] guide to learn how to xref:getting_started/installation.adoc[install] the Stackable Operator for Apache HBase as well as its dependencies. The guide also shows you how to xref:getting_started/first_steps.adoc[interact] with HBase running on Kubernetes by creating tables and inserting data using the REST API or Apache Phoenix.

The xref:usage-guide/index.adoc[] contains more information on xref:usage-guide/phoenix.adoc[] as well as other topics such as xref:usage-guide/resource-requests.adoc[CPU and memory configuration], xref:usage-guide/monitoring.adoc[] and xref:usage-guide/logging.adoc[].

== Operator model

The Operator manages the _HbaseCluster_ custom resource. You configure your HBase instance using this resource, and the Operator creates Kubernetes resources such as StatefulSets, ConfigMaps and Services accordingly.

HBase uses three xref:concepts:roles-and-role-groups.adoc[roles]: `masters`, `regionServers` and `restServers`.

image::hbase_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]

For every RoleGroup a **StatefulSet** is created. Each StatefulSet can contain multiple replicas (Pods).
For every RoleGroup a **Service** is created, as well as one for the whole cluster that references the `regionServers`.
For every Role and RoleGroup the Operator creates a **Service**.

A **ConfigMap** is created for each RoleGroup containing three files: `hbase-env.sh` and `hbase-site.xml`, generated from the HbaseCluster configuration (see xref:usage-guide/index.adoc[] for more information), plus a `log4j.properties` file used for xref:usage-guide/logging.adoc[].
The Operator also creates a **xref:usage-guide/discovery.adoc[discovery ConfigMap]** for the whole HbaseCluster, which contains information on how to connect to the HBase cluster.
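
Putting these pieces together, a trimmed HbaseCluster sketch could look like the following. This is illustrative only: the names are placeholders, required fields such as the product version are omitted, and the exact schema should be checked against the CRD reference:

[source,yaml]
----
apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase  # also the name of the discovery ConfigMap
spec:
  hdfsConfigMapName: simple-hdfs        # where to find the HDFS cluster
  zookeeperConfigMapName: simple-znode  # where to find the ZooKeeper znode
  masters:
    roleGroups:
      default:
        replicas: 2
  regionServers:
    roleGroups:
      default:
        replicas: 2
  restServers:
    roleGroups:
      default:
        replicas: 1
----

For this resource the Operator would create one StatefulSet per role group, Services for each role and role group, and a discovery ConfigMap named `simple-hbase`.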

== Dependencies

A distributed Apache HBase installation depends on a running Apache ZooKeeper and HDFS cluster. See the documentation for the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS] to learn how to set up these clusters.

== Demo

The xref:stackablectl::demos/hbase-hdfs-load-cycling-data.adoc[] demo shows how you can use HBase together with HDFS.

== Supported versions

The Stackable Operator for Apache HBase currently supports the following versions of Apache HBase:

include::partial$supported-versions.adoc[]

== Getting the Docker image
[source]
----
docker pull docker.stackable.tech/stackable/hbase:<version>
----
docs/modules/hbase/pages/usage-guide/cluster-operations.adoc
@@ -1,4 +1,4 @@
= Cluster operation
:page-aliases: cluster_operations.adoc

= Cluster Operation

HBase installations can be configured with different cluster operations like pausing reconciliation or stopping the cluster. See xref:concepts:cluster_operations.adoc[cluster operations] for more details.
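
For example, assuming the shared Stackable `clusterOperation` fields described on that page, pausing reconciliation or stopping all Pods is a matter of two flags in the HbaseCluster spec (a sketch; verify the field names for your operator version):

[source,yaml]
----
spec:
  clusterOperation:
    reconciliationPaused: true  # the operator stops applying spec changes
    stopped: false              # true scales all roles down to zero Pods
----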
docs/modules/hbase/pages/usage-guide/discovery.adoc
@@ -2,10 +2,11 @@
:namespace: \{namespace\}
:hdfs-cluster-name: \{hdfs-cluster-name\}
:zookeeper-znode-name: \{zookeeper-znode-name\}
:page-aliases: discovery.adoc

= Discovery

The Stackable Operator for Apache HBase publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[`ConfigMap`], which exposes a client configuration bundle that allows access to the Apache HBase cluster.
The Stackable Operator for Apache HBase publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[ConfigMap], which exposes a client configuration bundle that allows access to the Apache HBase cluster.

== Example

@@ -23,16 +24,16 @@ spec:
  hdfsConfigMapName: {hdfs-cluster-name} #<3>
  zookeeperConfigMapName: {zookeeper-znode-name} #<4>
----
<1> The name of the HBase cluster, which is also the name of the created discovery `ConfigMap`.
<2> The namespace of the discovery `ConfigMap`.
<3> The `ConfigMap` name to discover the HDFS cluster.
<4> The `ConfigMap` name to discover the ZooKeeper cluster.
<1> The name of the HBase cluster, which is also the name of the created discovery ConfigMap.
<2> The namespace of the discovery ConfigMap.
<3> The ConfigMap name to discover the HDFS cluster.
<4> The ConfigMap name to discover the ZooKeeper cluster.

The resulting discovery `ConfigMap` is located at `{namespace}/{cluster-name}`.
The resulting discovery ConfigMap is located at `{namespace}/{cluster-name}`.

== Contents

The `ConfigMap` data values are formatted as Hadoop XML files which allows simple mounting of that `ConfigMap` into pods that require access to HBase.
The ConfigMap data values are formatted as Hadoop XML files which allows simple mounting of that ConfigMap into pods that require access to HBase.

`hbase-site.xml`::
Contains the `hbase.zookeeper.quorum` property.
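
For client access from another Pod, the discovery ConfigMap can simply be mounted and `HBASE_CONF_DIR` pointed at the mount. The following is a sketch assuming an HbaseCluster named `simple-hbase`; the image and mount path are arbitrary choices:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: hbase-client
spec:
  containers:
    - name: client
      image: docker.stackable.tech/stackable/hbase:<version>  # any image with HBase client tools
      command: ["sleep", "infinity"]
      env:
        - name: HBASE_CONF_DIR  # client tools read hbase-site.xml from this directory
          value: /stackable/conf/hbase
      volumeMounts:
        - name: hbase-config
          mountPath: /stackable/conf/hbase
  volumes:
    - name: hbase-config
      configMap:
        name: simple-hbase  # the discovery ConfigMap, named after the HbaseCluster
----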
9 changes: 9 additions & 0 deletions docs/modules/hbase/pages/usage-guide/index.adoc
@@ -0,0 +1,9 @@
= Usage guide

Learn about xref:usage-guide/cluster-operations.adoc[starting, stopping and pausing] your cluster.

Learn about xref:usage-guide/pod-placement.adoc[configuring where Pods are scheduled] and xref:usage-guide/resource-requests.adoc[how many CPU and memory resources] your Pods consume.

You can observe what's happening with your HBase cluster using xref:usage-guide/logging.adoc[logging] and xref:usage-guide/monitoring.adoc[monitoring].

Connect to HBase using xref:usage-guide/phoenix.adoc[Apache Phoenix] or use the xref:usage-guide/discovery.adoc[discovery ConfigMap] to connect other products.
26 changes: 26 additions & 0 deletions docs/modules/hbase/pages/usage-guide/logging.adoc
@@ -0,0 +1,26 @@
= Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery
  masters:
    config:
      logging:
        enableVectorAgent: true
  regionServers:
    config:
      logging:
        enableVectorAgent: true
  restServers:
    config:
      logging:
        enableVectorAgent: true
----

Further information on how to configure logging can be found in xref:home:concepts:logging.adoc[].
4 changes: 4 additions & 0 deletions docs/modules/hbase/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,4 @@
= Monitoring

The managed HBase instances are automatically configured to export Prometheus metrics. See
xref:home:operators:monitoring.adoc[] for more details.
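
If you scrape with the Prometheus Operator, a ServiceMonitor along the following lines can pick the metrics up. This is a sketch based on common Stackable conventions; verify the scrape label and the metrics port name against the Services generated in your cluster:

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: scrape-hbase
spec:
  endpoints:
    - port: metrics  # assumed name of the metrics port on the generated Services
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      prometheus.io/scrape: "true"  # assumed label on the Stackable-managed Services
----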
56 changes: 56 additions & 0 deletions docs/modules/hbase/pages/usage-guide/overrides.adoc
@@ -0,0 +1,56 @@

= Configuration overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

IMPORTANT: Overriding certain properties that are set by the operator can interfere with its operation and lead to problems.

== Configuration properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the following files:

- `hbase-site.xml`
- `hbase-env.sh`

NOTE: `hdfs-site.xml` is not listed here because the file is always taken from the referenced HDFS cluster. If you want to modify it, have a look at xref:hdfs:usage-guide/configuration-environment-overrides.adoc[HDFS configuration overrides].

For example, if you want to set `hbase.rest.threads.min` to 4 and `HBASE_HEAPSIZE` to 2 GB, adapt the `restServers` section of the cluster resource like so:

[source,yaml]
----
restServers:
  roleGroups:
    default:
      config: {}
      configOverrides:
        hbase-site.xml:
          hbase.rest.threads.min: "4"
        hbase-env.sh:
          HBASE_HEAPSIZE: "2G"
      replicas: 1
----

Just as for the `config`, it is possible to specify this at role level as well:

[source,yaml]
----
restServers:
  configOverrides:
    hbase-site.xml:
      hbase.rest.threads.min: "4"
    hbase-env.sh:
      HBASE_HEAPSIZE: "2G"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

All override property values must be strings. They are formatted and escaped correctly for the XML file, or inserted as-is into the `env.sh` file.
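
For illustration, the `hbase.rest.threads.min` override from the example above would end up in the rendered `hbase-site.xml` in the standard Hadoop property format, roughly like this:

[source,xml]
----
<property>
  <name>hbase.rest.threads.min</name>
  <value>4</value>
</property>
----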

For a full list of configuration options, refer to the HBase https://hbase.apache.org/book.html#config.files[configuration documentation].

// Environment configuration is not implemented. The environment is managed
// with the hbase-env.sh configuration file

// CLI overrides are also not implemented
33 changes: 33 additions & 0 deletions docs/modules/hbase/pages/usage-guide/phoenix.adoc
@@ -0,0 +1,33 @@
= Using Apache Phoenix

The Apache Phoenix project provides the ability to interact with HBase over JDBC using familiar SQL syntax. The Phoenix dependencies are bundled with the Stackable HBase image and do not need to be installed separately (clients need to make sure that they have the correct client-side libraries available). Information about client-side installation can be found https://phoenix.apache.org/installation.html[here].

Phoenix comes bundled with a few simple scripts to verify a correct server-side installation. For example, assuming that the Phoenix dependencies have been installed to their default location of `/stackable/phoenix/bin`, we can issue the following using the supplied `psql.py` script:

[source,shell script]
----
/stackable/phoenix/bin/psql.py \
    /stackable/phoenix/examples/WEB_STAT.sql \
    /stackable/phoenix/examples/WEB_STAT.csv \
    /stackable/phoenix/examples/WEB_STAT_QUERIES.sql
----

This script generates a Java command that creates, populates and queries a Phoenix table called `WEB_STAT`. Alternatively, one can use the `sqlline.py` script (which wraps the https://github.com/julianhyde/sqlline[sqlline] utility):

[source,shell script]
----
/stackable/phoenix/bin/sqlline.py [zookeeper] [sql file]
----

The script opens an SQL prompt from where one can list, query, create and generally interact with Phoenix tables. So, to query the table that was created in the previous step, start the script and enter some SQL at the prompt:

image::phoenix_sqlline.png[Phoenix Sqlline]

The Phoenix table `WEB_STAT` is created as an HBase table, and can be viewed normally from within the HBase UI:

image::phoenix_tables.png[Phoenix Tables]

The `SYSTEM.` tables are those required by Phoenix and are created the first time Phoenix is invoked.

NOTE: Both `psql.py` and `sqlline.py` generate a Java command that calls classes from the Phoenix client library JAR. The ZooKeeper quorum does not need to be supplied as part of the JDBC connection string, as long as the environment variable `HBASE_CONF_DIR` is set and added to the `-cp` classpath search path: the cluster information is then extracted from `$HBASE_CONF_DIR/hbase-site.xml`.

docs/modules/hbase/pages/usage-guide/pod-placement.adoc
@@ -1,4 +1,5 @@
= Pod Placement
= Pod placement
:page-aliases: pod_placement.adoc

You can configure Pod placement for HBase nodes as described in xref:concepts:pod_placement.adoc[].

@@ -92,5 +93,5 @@ affinity:

In the examples above `cluster-name` is the name of the HBase custom resource that owns this Pod. The `hdfs-cluster-name` is the name of the HDFS cluster that was configured in the `hdfsConfigMapName` property.

NOTE: It is important that the `hdfsConfigMapName` property contains the name the HDFS cluster. You could instead configure `ConfigMap`s of specific name or data roles, but for the purpose of pod placement, this will lead to faulty behavior.
NOTE: It is important that the `hdfsConfigMapName` property contains the name of the HDFS cluster. You could instead reference the ConfigMap of a specific nameNode or dataNode role, but for the purpose of pod placement this will lead to faulty behavior.

23 changes: 23 additions & 0 deletions docs/modules/hbase/pages/usage-guide/resource-requests.adoc
@@ -0,0 +1,23 @@
= Resource requests

include::home:concepts:stackable_resource_requests.adoc[]

If no resources are configured explicitly, the HBase operator uses the following defaults:

[source,yaml]
----
regionServers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: "200m"
            max: "4"
          memory:
            limit: "2Gi"
----

WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production. Please adapt according to your requirements.

For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods].
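
To override the defaults, set your own values using the same structure, for example to give the region servers more headroom (the numbers here are arbitrary):

[source,yaml]
----
regionServers:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: "1"
            max: "4"
          memory:
            limit: "8Gi"
----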