
[Merged by Bors] - New Druid landing page #360


Closed
wants to merge 38 commits into from

38 commits:
- 3dc8a3c: Added the diagram to the overview (Dec 14, 2022)
- 9ed1469: Merge branch 'main' into idea/overview-diagram (Mar 21, 2023)
- 790d0bf: more content (Mar 21, 2023)
- 8718072: more text (Mar 21, 2023)
- 1ecaeab: WIP: usage guide split (Mar 21, 2023)
- da72b9c: WIP: more changes (Mar 21, 2023)
- ca36808: WIP: more changes (Mar 21, 2023)
- fdefd4a: more stuff (Mar 22, 2023)
- 34fe35c: text changes (Mar 22, 2023)
- 25e95c1: Alternative anchor (Mar 22, 2023)
- 5ed45aa: fixed some refs (Mar 22, 2023)
- 0fb14a9: fixed some refs (Mar 22, 2023)
- 94ee5a1: incorporated some feedback (Mar 23, 2023)
- e1c11b4: better text (Mar 23, 2023)
- cf52f61: fixed a todo (Mar 23, 2023)
- 1e6f10e: Update docs/modules/druid/pages/getting_started/first_steps.adoc (fhennig, Mar 27, 2023)
- 8e9cce3: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- bae9424: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- 0fdba9c: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- e864834: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- 995f06f: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- ec3bc3a: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- a9766af: minor fixes (Mar 27, 2023)
- f8c1cd8: Update docs/modules/druid/pages/index.adoc (fhennig, Mar 27, 2023)
- 219cec1: Update docs/modules/druid/pages/usage-guide/configuration-and-environ… (fhennig, Mar 27, 2023)
- 7ae6623: Update docs/modules/druid/pages/usage-guide/index.adoc (fhennig, Mar 27, 2023)
- 9e1a6a3: Update docs/modules/druid/pages/usage-guide/ingestion.adoc (fhennig, Mar 27, 2023)
- 19cf59b: minor fixes (Mar 27, 2023)
- c263fd9: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- aefe7f5: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- 5053178: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- 81f2e56: typo fix (Mar 27, 2023)
- 93e34d6: Update docs/modules/druid/pages/usage-guide/ingestion.adoc (fhennig, Mar 27, 2023)
- 5a9616c: Update docs/modules/druid/pages/usage-guide/resources-and-storage.adoc (fhennig, Mar 27, 2023)
- 710f10f: Update docs/modules/druid/pages/usage-guide/configuration-and-environ… (fhennig, Mar 27, 2023)
- 5ff4c08: Update docs/modules/druid/pages/usage-guide/security.adoc (fhennig, Mar 27, 2023)
- 8390578: Update docs/modules/druid/pages/usage-guide/resources-and-storage.adoc (fhennig, Mar 27, 2023)
- 33d8557: minor formatting changes (Mar 27, 2023)

4 changes: 4 additions & 0 deletions docs/modules/druid/images/druid_overview.drawio.svg
2 changes: 1 addition & 1 deletion docs/modules/druid/pages/getting_started/first_steps.adoc
@@ -164,4 +164,4 @@ Great! You've set up your first Druid cluster, ingested some data and queried it

== What's next

-Have a look at the xref:usage.adoc[] page to find out more about the features of the Operator, such as S3 backed deep storage or OPA based authorization.
+Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the Operator, such as S3-backed deep storage or OPA-based authorization.
45 changes: 44 additions & 1 deletion docs/modules/druid/pages/index.adoc
@@ -1,6 +1,49 @@
= Stackable Operator for Apache Druid
:description: The Stackable Operator for Apache Druid is a Kubernetes operator that can manage Apache Druid clusters. Learn about its features, resources, dependencies, and demos, and see the list of supported Druid versions.
:keywords: Stackable Operator, Apache Druid, Kubernetes, operator, DevOps, engineer, CRD, StatefulSet, ConfigMap, Service, ZooKeeper, HDFS, S3, Kafka, Trino, OPA, demo, version

The Stackable Operator for Apache Druid deploys and manages https://druid.apache.org/[Apache Druid] clusters on Kubernetes. It provides several resources and features to manage Druid clusters efficiently.

== Getting Started

To get started with the Stackable Operator for Apache Druid, follow the xref:druid:getting_started/index.adoc[Getting Started guide]. The Operator is installed along with the _DruidCluster_ CustomResourceDefinition, which supports five xref:home:concepts:roles-and-role-groups.adoc[roles]: **Router**, **Coordinator**, **Broker**, **MiddleManager** and **Historical**. These roles correspond to https://druid.apache.org/docs/latest/design/processes.html[Druid processes].
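
For orientation, below is a minimal sketch of a DruidCluster showing all five roles. The product version, the referenced ConfigMaps and the metadata database settings are placeholders, assuming HDFS deep storage and a PostgreSQL metadata store; see the xref:druid:getting_started/first_steps.adoc[first steps] for a complete, tested example.

[source,yaml]
----
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: simple-druid
spec:
  image:
    productVersion: 24.0.0 # placeholder version
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs # placeholder HDFS discovery ConfigMap
        directory: /druid
    metadataStorageDatabase: # placeholder PostgreSQL connection details
      dbType: postgresql
      connString: jdbc:postgresql://postgresql-druid/druid
      host: postgresql-druid
      port: 5432
      user: druid
      password: druid
    zookeeperConfigMapName: simple-druid-znode # placeholder ZNode discovery ConfigMap
  brokers:
    roleGroups:
      default:
        replicas: 1
  coordinators:
    roleGroups:
      default:
        replicas: 1
  historicals:
    roleGroups:
      default:
        replicas: 1
  middleManagers:
    roleGroups:
      default:
        replicas: 1
  routers:
    roleGroups:
      default:
        replicas: 1
----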

== Resources

The Operator watches DruidCluster objects and creates multiple Kubernetes resources for each DruidCluster based on its configuration.

image::druid_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the operator]

For every RoleGroup a **StatefulSet** is created. Each StatefulSet can contain multiple replicas (Pods). Each Pod has at least two containers: the main Druid container and a preparation container, which runs once at startup. If xref:usage-guide/logging.adoc[] is enabled, a logging sidecar container is added as well.
For every Role and RoleGroup the Operator creates a **Service**.

A **ConfigMap** is created for each RoleGroup containing three files: the `jvm.config` and `runtime.properties` files generated from the DruidCluster configuration (see xref:usage-guide/index.adoc[] for more information), plus a `log4j2.properties` file used for xref:usage-guide/logging.adoc[].
For the whole DruidCluster a **xref:discovery.adoc[discovery ConfigMap]** is created which contains information on how to connect to the Druid cluster.

== Dependencies and other Operators to connect to

The Druid Operator has the following dependencies:

* A xref:usage-guide/deep-storage.adoc[deep storage] backend is required to persist data. Use either xref:usage-guide/deep-storage.adoc#hdfs[HDFS] with the xref:hdfs:index.adoc[] or xref:usage-guide/deep-storage.adoc#s3[S3].
* An SQL database to store metadata.
* Apache ZooKeeper via the xref:zookeeper:index.adoc[]. ZooKeeper is used by Druid for internal communication between processes.
* The xref:commons-operator:index.adoc[] provides common CRDs, such as the xref:concepts:s3.adoc[] resources.
* The xref:secret-operator:index.adoc[] is required for features such as S3 access credentials or LDAP integration.

Have a look at the xref:getting_started/index.adoc[getting started guide] for an example of a minimal working setup. Druid works well with other Stackable-supported products, such as xref:kafka:index.adoc[Apache Kafka] for data ingestion, xref:trino:index.adoc[Trino] for data processing or xref:superset:index.adoc[Superset] for data visualization. xref:opa:index.adoc[OPA] can be connected to create authorization policies. See the xref:usage-guide/index.adoc[] for more configuration options, and the <<demos, demos>> for complete data pipelines that you can install with a single command.

== [[demos]]Demos

xref:stackablectl::index.adoc[] supports installing xref:stackablectl::demos/index.adoc[] with a single command. The demos are complete data pipelines which showcase multiple components of the Stackable platform working together and which you can try out interactively. Both demos below include Druid as part of the data pipeline:

=== Water Level Demo

The xref:stackablectl::demos/nifi-kafka-druid-water-level-data.adoc[] demo uses data from https://www.pegelonline.wsv.de/webservice/ueberblick[PEGELONLINE] to visualize water levels in rivers and coastal regions of Germany, from both historical and real-time data.

=== Earthquake Demo

The xref:stackablectl::demos/nifi-kafka-druid-earthquake-data.adoc[] demo ingests https://earthquake.usgs.gov/[earthquake data] into a pipeline similar to the one used in the water level demo.


== Supported Versions

@@ -0,0 +1,69 @@
= Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

IMPORTANT: Overriding certain properties which are set by the operator (such as the HTTP port) can interfere with the operator and can lead to problems.

== Configuration Properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the `runtime.properties`. For example, if you want to set `druid.server.http.numThreads` to 100 for the router, adapt the `routers` section of the cluster resource like so:

[source,yaml]
----
routers:
  roleGroups:
    default:
      config: {}
      configOverrides:
        runtime.properties:
          druid.server.http.numThreads: "100"
      replicas: 1
----

Just as for the `config`, it is possible to specify this at role level as well:

[source,yaml]
----
routers:
  configOverrides:
    runtime.properties:
      druid.server.http.numThreads: "100"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

All override property values must be strings.
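
To illustrate the precedence rule, the following sketch (with hypothetical values) combines both levels; for Pods of the `default` role group, the role group override wins:

[source,yaml]
----
routers:
  configOverrides:
    runtime.properties:
      druid.server.http.numThreads: "50" # role level: applies to all role groups ...
  roleGroups:
    default:
      config: {}
      configOverrides:
        runtime.properties:
          druid.server.http.numThreads: "100" # ... except where a role group overrides it
      replicas: 1
----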

For a full list of configuration options, please refer to the Druid https://druid.apache.org/docs/latest/configuration/index.html[Configuration Reference].

== Environment Variables

In a similar fashion, environment variables can be (over)written. For example, per role group:

[source,yaml]
----
routers:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1
----

or per role:

[source,yaml]
----
routers:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

// cliOverrides don't make sense for this operator, so the feature is omitted for now
83 changes: 83 additions & 0 deletions docs/modules/druid/pages/usage-guide/deep-storage.adoc
@@ -0,0 +1,83 @@
= Deep storage configuration

== [[hdfs]]HDFS

Druid can use HDFS as a backend for deep storage:

[source,yaml]
----
spec:
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs # <1>
        directory: /druid # <2>
    ...
----
<1> Name of the HDFS cluster discovery ConfigMap. It can be supplied manually for a cluster not provided by Stackable, and needs to contain the `core-site.xml` and `hdfs-site.xml` files.
<2> The HDFS directory in which to store the Druid data.

== [[s3]]S3

Druid can use S3 as a backend for deep storage:

[source,yaml]
----
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          inline:
            bucketName: my-bucket # <1>
            connection:
              inline:
                host: test-minio # <2>
                port: 9000 # <3>
                credentials: # <4>
                  ...
----
<1> Bucket name.
<2> Bucket host.
<3> Optional bucket port.
<4> Credentials to use, explained <<S3 Credentials, below>>.

It is also possible to configure the bucket connection details as a separate Kubernetes resource and only refer to that object from the DruidCluster like this:

[source,yaml]
----
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          reference: my-bucket-resource # <1>
----
<1> Name of the bucket resource with connection details.

The resource named `my-bucket-resource` is then defined as shown below:

[source,yaml]
----
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-bucket-name
  connection:
    inline:
      host: test-minio
      port: 9000
      credentials:
        ... (explained below)
----

This has the advantage that the bucket configuration can be shared across DruidClusters (and other Stackable CRDs), and reduces the effort of updating these details.

include::partial$s3-note.adoc[]

=== S3 Credentials

include::partial$s3-credentials.adoc[]
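
As a rough sketch of the mechanism the included section above describes, assuming the SecretClass machinery of the xref:secret-operator:index.adoc[], inline credentials typically point to a SecretClass which is backed by a Secret carrying the access and secret keys (all names below are placeholders):

[source,yaml]
----
credentials:
  secretClass: druid-s3-credentials # placeholder SecretClass name
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: druid-s3-credentials
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: druid-s3-credentials
  labels:
    secrets.stackable.tech/class: druid-s3-credentials # ties the Secret to the SecretClass
stringData:
  accessKey: YOUR_ACCESS_KEY # placeholder
  secretKey: YOUR_SECRET_KEY # placeholder
----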
@@ -3,8 +3,9 @@
:routerPort: 8888

= Discovery
:page-aliases: discovery.adoc

-The Stackable Operator for Apache Druid publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[`ConfigMap`], which exposes a client configuration bundle that allows access to the Apache Druid cluster.
+The Stackable Operator for Apache Druid publishes a discovery https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#configmap-v1-core[ConfigMap], which exposes a client configuration bundle that allows access to the Apache Druid cluster.

The bundle includes several connection strings to Druid services such as the router console and SQL endpoints. The services may be used by other operators or tools to configure their products with access to Druid. This is limited to internal cluster access.

@@ -22,14 +23,14 @@ metadata:
spec:
[...]
----
-<1> The name of the Druid cluster, which is also the name of the created discovery `ConfigMap`.
-<2> The namespace of the discovery `ConfigMap`.
+<1> The name of the Druid cluster, which is also the name of the created discovery ConfigMap.
+<2> The namespace of the discovery ConfigMap.

-The resulting discovery `ConfigMap` is `{namespace}/{clusterName}`.
+The resulting discovery ConfigMap is `{namespace}/{clusterName}`.
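
As a sketch of how a client workload might consume the discovery ConfigMap (the Pod name and image are hypothetical), its entries can be injected as environment variables:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: druid-client # hypothetical client Pod
spec:
  containers:
    - name: client
      image: my-client:latest # hypothetical image
      envFrom:
        - configMapRef:
            name: simple-druid # the discovery ConfigMap, named after the DruidCluster
----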

== Contents

-The `{namespace}/{clusterName}` discovery `ConfigMap` contains the following fields where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:
+The `{namespace}/{clusterName}` discovery ConfigMap contains the following fields, where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:

`DRUID_AVATICA_JDBC`::
====
10 changes: 10 additions & 0 deletions docs/modules/druid/pages/usage-guide/index.adoc
@@ -0,0 +1,10 @@
= Usage guide
:page-aliases: usage.adoc

The usage guide covers various aspects of configuring Druid and connecting it to other tools.

xref:usage-guide/deep-storage.adoc[], xref:usage-guide/ingestion.adoc[] and xref:usage-guide/resources-and-storage.adoc[] are the relevant pages for configuring how your data is stored and ingested.

The xref:usage-guide/security.adoc[] page explains how to configure TLS, authentication with LDAP and authorization using xref:opa:index.adoc[OPA].

Look into xref:usage-guide/logging.adoc[] and xref:usage-guide/monitoring.adoc[] to learn how to observe the state of your Druid cluster.
47 changes: 47 additions & 0 deletions docs/modules/druid/pages/usage-guide/ingestion.adoc
@@ -0,0 +1,47 @@
= Ingestion

== [[s3]]From S3

To ingest data from S3, you need to specify a host to connect to; other settings are optional:

[source,yaml]
----
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com # <1>
        port: 80 # optional <2>
        credentials: # optional <3>
          ...
----

<1> The S3 host. Required.
<2> The port. Optional, defaults to 80.
<3> Credentials to use. Optional; since these might be bucket-dependent, they can instead be given in the ingestion job. Specifying the credentials here is explained <<S3 credentials, below>>.

include::partial$s3-note.adoc[]

=== S3 credentials

include::partial$s3-credentials.adoc[]

== Adding external files, e.g. for ingestion

Since Druid actively runs ingestion tasks, there may be a need to make extra files available to the processes.

These could, for example, be client certificates used to connect to a Kafka cluster, or a keytab to obtain a Kerberos ticket.

In order to make these files available, the operator allows specifying extra volumes that are added to all Pods deployed for this cluster.

[source,yaml]
----
spec:
  clusterConfig:
    extraVolumes:
      - name: google-service-account
        secret:
          secretName: google-service-account
----

All `Volumes` specified in this section will be made available under `/stackable/userdata/\{volumename\}`.
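
As an illustration (the volume and resource names are hypothetical), mounting two different volume types results in the following paths inside all Druid containers:

[source,yaml]
----
spec:
  clusterConfig:
    extraVolumes:
      - name: kafka-client-tls # hypothetical Secret holding client certificates
        secret:
          secretName: kafka-client-tls
      - name: ingestion-lookups # hypothetical ConfigMap holding lookup files
        configMap:
          name: ingestion-lookups
# the files then become readable by all Druid processes at:
#   /stackable/userdata/kafka-client-tls/...
#   /stackable/userdata/ingestion-lookups/...
----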
32 changes: 32 additions & 0 deletions docs/modules/druid/pages/usage-guide/logging.adoc
@@ -0,0 +1,32 @@
= Log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery
  brokers:
    config:
      logging:
        enableVectorAgent: true
  coordinators:
    config:
      logging:
        enableVectorAgent: true
  historicals:
    config:
      logging:
        enableVectorAgent: true
  middleManagers:
    config:
      logging:
        enableVectorAgent: true
  routers:
    config:
      logging:
        enableVectorAgent: true
----

Further information on how to configure logging can be found in xref:home:concepts:logging.adoc[].
4 changes: 4 additions & 0 deletions docs/modules/druid/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,4 @@
= Monitoring

The managed Druid instances are automatically configured to export Prometheus metrics. See
xref:operators:monitoring.adoc[] for more details.
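
If the Prometheus Operator is used for scraping, a ServiceMonitor along the following lines can pick up the exporters. The `prometheus.io/scrape` label and the `metrics` port name are assumptions about how the operator labels its generated Services, so verify them against the linked page:

[source,yaml]
----
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: scrape-druid # hypothetical name
spec:
  endpoints:
    - port: metrics # assumed name of the metrics port on the generated Services
  selector:
    matchLabels:
      prometheus.io/scrape: "true" # assumed label set by the operator
----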
27 changes: 27 additions & 0 deletions docs/modules/druid/pages/usage-guide/pod-placement.adoc
@@ -0,0 +1,27 @@
= Pod placement

You can configure the Pod placement of the Druid pods as described in xref:concepts:pod_placement.adoc[].

The default affinities created by the operator are:

1. Distribute all Pods within the same role (brokers, coordinators, historicals, middle-managers, routers) (weight 70)

Some of the Druid roles communicate frequently with each other.
To address this, affinities are created to attract these roles to each other:

*For brokers:*

1. Co-locate with historicals (weight 60)
2. Co-locate with middle-managers (weight 40)

*For routers:*

1. Co-locate with brokers (weight 40)

*For historicals and middle-managers:*

1. Co-locate the middle-managers and historicals with the HDFS datanodes, if HDFS is used as deep storage (weight 50)

*For coordinators:*

- No affinities
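
These defaults can be overridden per role or role group via the mechanism described in xref:concepts:pod_placement.adoc[]. As a sketch, assuming a hypothetical node label `node-type`, historicals could be pinned to dedicated nodes like this:

[source,yaml]
----
spec:
  historicals:
    config:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type # hypothetical node label
                    operator: In
                    values:
                      - druid-historical
----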