
[Merged by Bors] - New Druid landing page #360


Closed
wants to merge 38 commits into from

38 commits:
- 3dc8a3c: Added the diagram to the overview (Dec 14, 2022)
- 9ed1469: Merge branch 'main' into idea/overview-diagram (Mar 21, 2023)
- 790d0bf: more content (Mar 21, 2023)
- 8718072: more text (Mar 21, 2023)
- 1ecaeab: WIP: usage guide split (Mar 21, 2023)
- da72b9c: WIP: more changes (Mar 21, 2023)
- ca36808: WIP: more changes (Mar 21, 2023)
- fdefd4a: more stuff (Mar 22, 2023)
- 34fe35c: text changes (Mar 22, 2023)
- 25e95c1: Alternative anchor (Mar 22, 2023)
- 5ed45aa: fixed some refs (Mar 22, 2023)
- 0fb14a9: fixed some refs (Mar 22, 2023)
- 94ee5a1: incorporated some feedback (Mar 23, 2023)
- e1c11b4: better text (Mar 23, 2023)
- cf52f61: fixed a todo (Mar 23, 2023)
- 1e6f10e: Update docs/modules/druid/pages/getting_started/first_steps.adoc (fhennig, Mar 27, 2023)
- 8e9cce3: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- bae9424: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- 0fdba9c: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- e864834: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- 995f06f: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- ec3bc3a: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- a9766af: minor fixes (Mar 27, 2023)
- f8c1cd8: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- 219cec1: Update docs/modules/druid/pages/usage-guide/configuration-and-environ… (fhennig, Mar 27, 2023)
- 7ae6623: Update docs/modules/druid/pages/usage-guide/index.adoc (fhennig, Mar 27, 2023)
- 9e1a6a3: Update docs/modules/druid/pages/usage-guide/ingestion.adoc (fhennig, Mar 27, 2023)
- 19cf59b: minor fixes (Mar 27, 2023)
- c263fd9: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- aefe7f5: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- 5053178: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- 81f2e56: typo fix (Mar 27, 2023)
- 93e34d6: Update docs/modules/druid/pages/usage-guide/ingestion.adoc (fhennig, Mar 27, 2023)
- 5a9616c: Update docs/modules/druid/pages/usage-guide/resources-and-storage.adoc (fhennig, Mar 27, 2023)
- 710f10f: Update docs/modules/druid/pages/usage-guide/configuration-and-environ… (fhennig, Mar 27, 2023)
- 5ff4c08: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- 8390578: Update docs/modules/druid/pages/usage-guide/resources-and-storage.adoc (fhennig, Mar 27, 2023)
- 33d8557: minor formatting changes (Mar 27, 2023)

4 changes: 4 additions & 0 deletions docs/modules/druid/images/druid_overview.drawio.svg
2 changes: 1 addition & 1 deletion docs/modules/druid/pages/getting_started/first_steps.adoc
@@ -164,4 +164,4 @@ Great! You've set up your first Druid cluster, ingested some data and queried it

== What's next

-Have a look at the xref:usage.adoc[] page to find out more about the features of the Operator, such as S3 backed deep storage or OPA based authorization.
+Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the Operator, such as S3-backed deep storage or OPA-based authorization.
45 changes: 44 additions & 1 deletion docs/modules/druid/pages/index.adoc
@@ -1,6 +1,49 @@
= Stackable Operator for Apache Druid
:description: The Stackable Operator for Apache Druid is a Kubernetes operator that can manage Apache Druid clusters. Learn about its features, resources, dependencies, and demos, and see the list of supported Druid versions.
:keywords: Stackable Operator, Apache Druid, Kubernetes, operator, DevOps, engineer, CRD, StatefulSet, ConfigMap, Service, ZooKeeper, HDFS, S3, Kafka, Trino, OPA, demo, version

The Stackable Operator for Apache Druid deploys and manages https://druid.apache.org/[Apache Druid] clusters on Kubernetes. It provides several resources and features to manage Druid clusters efficiently.

== Getting Started

To get started with the Stackable Operator for Apache Druid, follow the xref:druid:getting_started/index.adoc[Getting Started guide]. The Operator is installed along with the _DruidCluster_ CustomResourceDefinition, which supports five xref:home:concepts:roles-and-role-groups.adoc[roles]: **Router**, **Coordinator**, **Broker**, **MiddleManager** and **Historical**. These roles correspond to https://druid.apache.org/docs/latest/design/processes.html[Druid processes].
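
For orientation, below is a minimal sketch of a DruidCluster showing all five roles. The product version, the referenced ConfigMaps and the metadata database settings are placeholders, assuming HDFS deep storage and a PostgreSQL metadata store; see the xref:druid:getting_started/first_steps.adoc[first steps] for a complete, tested example.

[source,yaml]
----
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: simple-druid
spec:
  image:
    productVersion: 24.0.0 # placeholder version
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs # placeholder HDFS discovery ConfigMap
        directory: /druid
    metadataStorageDatabase: # placeholder PostgreSQL connection details
      dbType: postgresql
      connString: jdbc:postgresql://postgresql-druid/druid
      host: postgresql-druid
      port: 5432
      user: druid
      password: druid
    zookeeperConfigMapName: simple-druid-znode # placeholder ZNode discovery ConfigMap
  brokers:
    roleGroups:
      default:
        replicas: 1
  coordinators:
    roleGroups:
      default:
        replicas: 1
  historicals:
    roleGroups:
      default:
        replicas: 1
  middleManagers:
    roleGroups:
      default:
        replicas: 1
  routers:
    roleGroups:
      default:
        replicas: 1
----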

== Resources

The Operator watches DruidCluster objects and creates multiple Kubernetes resources for each DruidCluster based on its configuration.

image::druid_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]

For every RoleGroup a **StatefulSet** is created. Each StatefulSet can contain multiple replicas (Pods). Each Pod has at least two containers: the main Druid container and a preparation container, which runs once at startup. If xref:usage-guide/logging.adoc[] is enabled, a logging sidecar container is added as well.
For every Role and RoleGroup the Operator creates a **Service**.

A **ConfigMap** is created for each RoleGroup containing three files: the `jvm.config` and `runtime.properties` files generated from the DruidCluster configuration (see xref:usage-guide/index.adoc[] for more information), plus a `log4j2.properties` file used for xref:usage-guide/logging.adoc[].
For the whole DruidCluster a **xref:discovery.adoc[discovery ConfigMap]** is created which contains information on how to connect to the Druid cluster.

== Dependencies and other Operators to connect to

The Druid Operator has the following dependencies:

* A xref:usage-guide/deep-storage.adoc[deep storage] backend is required to persist data. Use either xref:usage-guide/deep-storage.adoc#hdfs[HDFS] with the xref:hdfs:index.adoc[] or xref:usage-guide/deep-storage.adoc#s3[S3].
* An SQL database to store metadata.
* Apache ZooKeeper via the xref:zookeeper:index.adoc[]. ZooKeeper is used by Druid for internal communication between processes.
* The xref:commons-operator:index.adoc[] provides common CRDs, such as the xref:concepts:s3.adoc[] resources.
* The xref:secret-operator:index.adoc[] is required for features such as S3 access credentials or LDAP integration.

Have a look at the xref:getting_started/index.adoc[getting started guide] for an example of a minimal working setup. Druid works well with other Stackable-supported products, such as xref:kafka:index.adoc[Apache Kafka] for data ingestion, xref:trino:index.adoc[Trino] for data processing or xref:superset:index.adoc[Superset] for data visualization. xref:opa:index.adoc[OPA] can be connected to create authorization policies. See the xref:usage-guide/index.adoc[] for more configuration options, and the <<demos, demos>> for complete data pipelines that you can install with a single command.

== [[demos]]Demos

xref:stackablectl::index.adoc[] supports installing xref:stackablectl::demos/index.adoc[] with a single command. The demos are complete data pipelines which showcase multiple components of the Stackable platform working together and which you can try out interactively. Both demos below include Druid as part of the data pipeline:

=== Water Level Demo

The xref:stackablectl::demos/nifi-kafka-druid-water-level-data.adoc[] demo uses data from https://www.pegelonline.wsv.de/webservice/ueberblick[PEGELONLINE] to visualize water levels in rivers and coastal regions of Germany, from both historical and real-time data.

=== Earthquake Demo

The xref:stackablectl::demos/nifi-kafka-druid-earthquake-data.adoc[] demo ingests https://earthquake.usgs.gov/[earthquake data] into a pipeline similar to the one used in the water level demo.


== Supported Versions

@@ -0,0 +1,69 @@
= Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

IMPORTANT: Overriding certain properties which are set by the operator (such as the HTTP port) can interfere with the operator and can lead to problems.

== Configuration Properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the `runtime.properties`. For example, if you want to set `druid.server.http.numThreads` to 100 for the router, adapt the `routers` section of the cluster resource like so:

[source,yaml]
----
routers:
  roleGroups:
    default:
      config: {}
      configOverrides:
        runtime.properties:
          druid.server.http.numThreads: "100"
      replicas: 1
----

Just as for the `config`, it is possible to specify this at role level as well:

[source,yaml]
----
routers:
  configOverrides:
    runtime.properties:
      druid.server.http.numThreads: "100"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

All override property values must be strings.
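
To illustrate the precedence rule, the following sketch (with hypothetical values) combines both levels; for Pods of the `default` role group, the role group override wins:

[source,yaml]
----
routers:
  configOverrides:
    runtime.properties:
      druid.server.http.numThreads: "50" # role level: applies to all role groups ...
  roleGroups:
    default:
      config: {}
      configOverrides:
        runtime.properties:
          druid.server.http.numThreads: "100" # ... except where a role group overrides it
      replicas: 1
----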

For a full list of configuration options, please refer to the Druid https://druid.apache.org/docs/latest/configuration/index.html[Configuration Reference].

== Environment Variables

In a similar fashion, environment variables can be (over)written. For example, per role group:

[source,yaml]
----
routers:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1
----

or per role:

[source,yaml]
----
routers:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

// cliOverrides don't make sense for this operator, so the feature is omitted for now
83 changes: 83 additions & 0 deletions docs/modules/druid/pages/usage-guide/deep-storage.adoc
@@ -0,0 +1,83 @@
= Deep storage configuration

== [[hdfs]]HDFS

Druid can use HDFS as a backend for deep storage:

[source,yaml]
----
spec:
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs # <1>
        directory: /druid # <2>
    ...
----
<1> Name of the HDFS cluster discovery ConfigMap. It can be supplied manually for a cluster not provided by Stackable, and needs to contain the `core-site.xml` and `hdfs-site.xml` files.
<2> The HDFS directory in which to store the Druid data.

== [[s3]]S3

Druid can use S3 as a backend for deep storage:

[source,yaml]
----
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          inline:
            bucketName: my-bucket # <1>
            connection:
              inline:
                host: test-minio # <2>
                port: 9000 # <3>
                credentials: # <4>
                  ...
----
<1> Bucket name.
<2> Bucket host.
<3> Optional bucket port.
<4> Credentials to use, explained <<S3 Credentials, below>>.

It is also possible to configure the bucket connection details as a separate Kubernetes resource and only refer to that object from the DruidCluster like this:

[source,yaml]
----
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          reference: my-bucket-resource # <1>
----
<1> Name of the bucket resource with connection details.

The resource named `my-bucket-resource` is then defined as shown below:

[source,yaml]
----
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-bucket-name
  connection:
    inline:
      host: test-minio
      port: 9000
      credentials:
        ... (explained below)
----

This has the advantage that the bucket configuration can be shared across DruidClusters (and other Stackable CRDs), and reduces the effort of updating these details.

include::partial$s3-note.adoc[]

=== S3 Credentials

include::partial$s3-credentials.adoc[]
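
As a rough sketch of the mechanism the included section above describes, assuming the SecretClass machinery of the xref:secret-operator:index.adoc[], inline credentials typically point to a SecretClass which is backed by a Secret carrying the access and secret keys (all names below are placeholders):

[source,yaml]
----
credentials:
  secretClass: druid-s3-credentials # placeholder SecretClass name
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: druid-s3-credentials
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: druid-s3-credentials
  labels:
    secrets.stackable.tech/class: druid-s3-credentials # ties the Secret to the SecretClass
stringData:
  accessKey: YOUR_ACCESS_KEY # placeholder
  secretKey: YOUR_SECRET_KEY # placeholder
----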
@@ -3,8 +3,9 @@
:routerPort: 8888

= Discovery
:page-aliases: discovery.adoc

-The Stackable Operator for Apache Druid publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[`ConfigMap`], which exposes a client configuration bundle that allows access to the Apache Druid cluster.
+The Stackable Operator for Apache Druid publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[ConfigMap], which exposes a client configuration bundle that allows access to the Apache Druid cluster.

The bundle includes several connection strings to Druid services such as the router console and SQL endpoints. The services may be used by other operators or tools to configure their products with access to Druid. This is limited to internal cluster access.

@@ -22,14 +23,14 @@ metadata:
spec:
[...]
----
-<1> The name of the Druid cluster, which is also the name of the created discovery `ConfigMap`.
-<2> The namespace of the discovery `ConfigMap`.
+<1> The name of the Druid cluster, which is also the name of the created discovery ConfigMap.
+<2> The namespace of the discovery ConfigMap.

-The resulting discovery `ConfigMap` is `{namespace}/{clusterName}`.
+The resulting discovery ConfigMap is `{namespace}/{clusterName}`.
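
As a sketch of how a client workload might consume the discovery ConfigMap (the Pod name and image are hypothetical), its entries can be injected as environment variables:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: druid-client # hypothetical client Pod
spec:
  containers:
    - name: client
      image: my-client:latest # hypothetical image
      envFrom:
        - configMapRef:
            name: simple-druid # the discovery ConfigMap, named after the DruidCluster
----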

== Contents

-The `{namespace}/{clusterName}` discovery `ConfigMap` contains the following fields where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:
+The `{namespace}/{clusterName}` discovery ConfigMap contains the following fields, where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:

`DRUID_AVATICA_JDBC`::
====
10 changes: 10 additions & 0 deletions docs/modules/druid/pages/usage-guide/index.adoc
@@ -0,0 +1,10 @@
= Usage guide
:page-aliases: usage.adoc

The usage guide covers various aspects of configuring Druid and connecting it to other tools.

xref:usage-guide/deep-storage.adoc[], xref:usage-guide/ingestion.adoc[] and xref:usage-guide/resources-and-storage.adoc[] are the relevant pages for configuring how your data is stored and ingested.

The xref:usage-guide/security.adoc[] page explains how to configure TLS, authentication with LDAP and authorization using xref:opa:index.adoc[OPA].

Look into xref:usage-guide/logging.adoc[] and xref:usage-guide/monitoring.adoc[] to learn how to observe the state of your Druid cluster.
47 changes: 47 additions & 0 deletions docs/modules/druid/pages/usage-guide/ingestion.adoc
@@ -0,0 +1,47 @@
= Ingestion

== [[s3]]From S3

To ingest data from S3, you need to specify a host to connect to; other settings are optional:

[source,yaml]
----
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com # <1>
        port: 80 # optional <2>
        credentials: # optional <3>
          ...
----

<1> The S3 host. Required.
<2> The port. Optional, defaults to 80.
<3> Credentials to use. Optional; since these might be bucket-dependent, they can instead be given in the ingestion job. Specifying the credentials here is explained <<S3 credentials, below>>.

include::partial$s3-note.adoc[]

=== S3 credentials

include::partial$s3-credentials.adoc[]

== Adding external files, e.g. for ingestion

Since Druid actively runs ingestion tasks, there may be a need to make extra files available to the processes.

These could, for example, be client certificates used to connect to a Kafka cluster, or a keytab to obtain a Kerberos ticket.

In order to make these files available, the operator allows specifying extra volumes that are added to all Pods deployed for this cluster.

[source,yaml]
----
spec:
  clusterConfig:
    extraVolumes:
      - name: google-service-account
        secret:
          secretName: google-service-account
----

All `Volumes` specified in this section will be made available under `/stackable/userdata/\{volumename\}`.
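
As an illustration (the volume and resource names are hypothetical), mounting two different volume types results in the following paths inside all Druid containers:

[source,yaml]
----
spec:
  clusterConfig:
    extraVolumes:
      - name: kafka-client-tls # hypothetical Secret holding client certificates
        secret:
          secretName: kafka-client-tls
      - name: ingestion-lookups # hypothetical ConfigMap holding lookup files
        configMap:
          name: ingestion-lookups
# the files then become readable by all Druid processes at:
#   /stackable/userdata/kafka-client-tls/...
#   /stackable/userdata/ingestion-lookups/...
----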
32 changes: 32 additions & 0 deletions docs/modules/druid/pages/usage-guide/logging.adoc
@@ -0,0 +1,32 @@
= Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery
  brokers:
    config:
      logging:
        enableVectorAgent: true
  coordinators:
    config:
      logging:
        enableVectorAgent: true
  historicals:
    config:
      logging:
        enableVectorAgent: true
  middleManagers:
    config:
      logging:
        enableVectorAgent: true
  routers:
    config:
      logging:
        enableVectorAgent: true
----

Further information on how to configure logging can be found in xref:home:concepts:logging.adoc[].
4 changes: 4 additions & 0 deletions docs/modules/druid/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,4 @@
= Monitoring

The managed Druid instances are automatically configured to export Prometheus metrics. See
xref:operators:monitoring.adoc[] for more details.
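
If the Prometheus Operator is used for scraping, a ServiceMonitor along the following lines can pick up the exporters. The `prometheus.io/scrape` label and the `metrics` port name are assumptions about how the operator labels its generated Services, so verify them against the linked page:

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: scrape-druid # hypothetical name
spec:
  endpoints:
    - port: metrics # assumed name of the metrics port on the generated Services
  selector:
    matchLabels:
      prometheus.io/scrape: "true" # assumed label set by the operator
----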
27 changes: 27 additions & 0 deletions docs/modules/druid/pages/usage-guide/pod-placement.adoc
@@ -0,0 +1,27 @@
= Pod placement

You can configure the Pod placement of the Druid pods as described in xref:concepts:pod_placement.adoc[].

The default affinities created by the operator are:

1. Distribute all Pods within the same role (brokers, coordinators, historicals, middle-managers, routers) (weight 70)

Some of the Druid roles communicate frequently with each other.
To address this, affinities are created to attract these roles to each other:

*For brokers:*

1. Co-locate with historicals (weight 60)
2. Co-locate with middle-managers (weight 40)

*For routers:*

1. Co-locate with brokers (weight 40)

*For historicals and middle-managers:*

1. Co-locate the middle-managers and historicals with the HDFS datanodes, if HDFS is used as deep storage (weight 50)

*For coordinators:*

- No affinities
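
These defaults can be overridden per role or role group via the mechanism described in xref:concepts:pod_placement.adoc[]. As a sketch, assuming a hypothetical node label `node-type`, historicals could be pinned to dedicated nodes like this:

[source,yaml]
----
spec:
  historicals:
    config:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type # hypothetical node label
                    operator: In
                    values:
                      - druid-historical
----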