Strimzi provides a way to run an Apache Kafka cluster on Kubernetes or OpenShift in various deployment configurations.
You will need a Kubernetes or OpenShift cluster to deploy Strimzi.
In order to interact with a Kubernetes cluster, be sure to have the kubectl tool installed. If you don't have a Kubernetes cluster up and running, the easiest way to deploy one for development purposes is to use the Minikube project, which provides a single-node cluster in a VM. Just follow the installation guide, which describes all the prerequisites and how to get the Minikube binaries. Finally, the cluster can be deployed by running
minikube start
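Once Minikube reports that the cluster is up, you can check that kubectl is talking to it, for example:

kubectl cluster-info
kubectl get nodes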
In order to interact with an OpenShift cluster, be sure to have the OpenShift client tools installed. If you don't have an OpenShift cluster up and running, the easiest way to deploy one for development purposes is to use the Minishift project, which provides a single-node cluster in a VM. Just follow the installation guide, which describes all the prerequisites and how to get the Minishift binaries. Finally, the cluster can be deployed by running
minishift start
Another way is to use the OpenShift client tools directly to spin up a single-node cluster. This runs OpenShift as a Docker container on the local machine.
oc cluster up
More information about this approach can be found here.
This deployment uses the StatefulSets (previously known as "PetSets") feature of Kubernetes/OpenShift. With StatefulSets, the pods receive a unique name and network identity, which makes it easier to identify the individual Kafka broker pods and set their identity (broker ID). Each Kafka broker pod uses its own PersistentVolume. The PersistentVolume is acquired using a PersistentVolumeClaim, which makes the deployment independent of the actual type of the PersistentVolume. For example, it can use HostPath volumes on Minikube or Amazon EBS volumes in Amazon AWS deployments without any changes in the YAML files.
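To illustrate this mechanism, a PersistentVolumeClaim roughly like the sketch below is bound for each broker pod; the actual claim names and sizes are defined by the deployment resources, so the values here are only assumptions for illustration:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # the real claims are created per broker pod by the StatefulSet
  name: kafka-storage-kafka-0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi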
Both regular and headless services are used in this deployment (see the example below):
- regular services can be used as bootstrap servers for Kafka clients;
- headless services are needed to have DNS resolve the pods' IP addresses directly.
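For example, a client running inside the cluster can point its bootstrap configuration at the regular service, while the headless service gives each broker pod a stable DNS name. The exact names depend on your namespace and resources, so treat the following as an assumed illustration:

# bootstrap through the regular service (Kafka client configuration)
bootstrap.servers=kafka:9092

# per-pod DNS names resolved via the headless service
# <pod-name>.<headless-service>.<namespace>.svc.cluster.local
kafka-0.kafka-headless.default.svc.cluster.local
kafka-1.kafka-headless.default.svc.cluster.local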
This deployment is available under the kafka-statefulsets folder and provides the following artifacts:
- Dockerfile : Docker file for building an image with Kafka and Zookeeper already installed
- config : configuration file templates for running Zookeeper
- scripts : scripts for starting up Kafka and Zookeeper servers
- resources : provides all YAML configuration files for setting up volumes, services and deployments
- Create the provided "strimzi" template by running
oc create -f kafka-statefulsets/resources/openshift-template.yaml
in your terminal. This template provides the "zookeeper" StatefulSet with 3 replicas, the "kafka" StatefulSet with 3 replicas, and the "zookeeper", "zookeeper-headless", "kafka" and "kafka-headless" Services.
- Create a new app using the "strimzi" template:
oc new-app strimzi
- If your cluster doesn't have any default storage class, create the persistent volumes manually (an example volume is sketched after these steps):
kubectl apply -f kafka-statefulsets/resources/cluster-volumes.yaml
- Create the services by running:
kubectl apply -f kafka-statefulsets/resources/kubernetes.yaml
- You can then verify that the services started using
kubectl describe all
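If your cluster has no default storage class and you had to create the volumes manually, the definitions to use are the ones in kafka-statefulsets/resources/cluster-volumes.yaml. Purely as an illustration, a manually created hostPath volume suitable for Minikube might look like the sketch below; the name, size and path are assumptions:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-0001
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  # hostPath volumes are only suitable for single-node development clusters
  hostPath:
    path: /tmp/pv-0001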
The Kafka in-memory deployment is just for development and testing purposes, not for production. It is designed the same way as the Kafka StatefulSets deployment. The only difference is that for storing broker information (Zookeeper side) and topics/partitions (Kafka side), an emptyDir is used instead of Persistent Volume Claims. This means that its content is strictly tied to the pod life cycle (it is deleted when the pod goes down). This makes the in-memory deployment well-suited to development and testing because you don't have to provide persistent volumes.
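For illustration, the storage part of such a deployment looks roughly like the sketch below; the volume name and mount path are assumptions, and the actual definitions live under kafka-inmemory/resources/:

spec:
  containers:
    - name: kafka
      volumeMounts:
        # the mount path depends on the image configuration; data written here
        # disappears when the pod is deleted
        - mountPath: /var/lib/kafka
          name: kafka-storage
  volumes:
    - name: kafka-storage
      emptyDir: {}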
This deployment is available under the kafka-inmemory folder and provides the following artifacts:
- resources : provides all YAML configuration files for setting up services and deployments
- Create a pod using the provided template by running
oc create -f kafka-inmemory/resources/openshift-template.yaml
in your terminal. This template provides the "zookeeper" and the "kafka" deployments and the "zookeeper-service" and "kafka-service" services.
- Create a new app:
oc new-app strimzi
- Create the deployments and services by running:
kubectl apply -f kafka-inmemory/resources/kubernetes.yaml
- You can then verify that the services started using
kubectl describe all
This deployment adds a Kafka Connect cluster which can be used with either of the Kafka deployments described above.
It is implemented as a deployment with a configurable number of workers.
The default image currently contains only the connectors distributed with Apache Kafka Connect - FileStreamSinkConnector and FileStreamSourceConnector.
The REST interface for managing the Kafka Connect cluster is exposed internally within the Kubernetes/OpenShift cluster as the service kafka-connect on port 8083.
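From inside the cluster you can exercise this REST interface with plain HTTP calls, for example (the service name and port come from the deployment above, the paths are the standard Kafka Connect REST API):

# list the connector plugins available in the image
curl http://kafka-connect:8083/connector-plugins

# list the currently deployed connectors
curl http://kafka-connect:8083/connectors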
- Deploy a Kafka broker to your OpenShift cluster using either the in-memory or the statefulsets deployment described above.
- Create a pod using the provided template by running
oc create -f kafka-connect/resources/openshift-template.yaml
in your terminal.
- Create a new app:
oc new-app strimzi-connect
- Deploy a Kafka broker to your Kubernetes cluster using either the in-memory or the statefulsets deployment described above.
- Start the deployment by running
kubectl apply -f kafka-connect/resources/kubernetes.yaml
in your terminal.
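Once the cluster is running, connectors are created through the same REST interface. For example, the following call creates a FileStreamSourceConnector that reads a file into a topic; the connector name, topic name and file path are only placeholders for illustration:

curl -X POST -H "Content-Type: application/json" \
  http://kafka-connect:8083/connectors \
  -d '{
    "name": "file-source",
    "config": {
      "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
      "tasks.max": "1",
      "file": "/tmp/test.txt",
      "topic": "my-topic"
    }
  }'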
Our Kafka Connect Docker images contain by default only the FileStreamSinkConnector and FileStreamSourceConnector connectors, which are part of the Apache Kafka project. Kafka Connect will automatically load all plugins/connectors which are present in the /opt/kafka/plugins directory during startup. There are several different ways to add plugins to this directory:
- Mount a volume containing the plugins to the path /opt/kafka/plugins/
- Use the strimzi/kafka-connect image as a Docker base image, add your connectors to the /opt/kafka/plugins/ directory and use this new image instead of strimzi/kafka-connect
- Use the OpenShift build system and our S2I image
You can distribute your plugins to all your cluster nodes into the same path, make them world-readable (chmod -R a+r /path/to/your/directory) and use a hostPath volume to mount them into your Kafka Connect deployment. To use the volume, you have to edit the deployment YAML files:
- Open the OpenShift template or Kubernetes deployment
- Add the volumeMounts and volumes sections in the same way as in the example below
- Redeploy Kafka Connect for the changes to take effect (if you have Kafka Connect already deployed, you have to apply the changes to the deployment and afterwards make sure all pods are restarted; if you haven't yet deployed Kafka Connect, just follow the guide above and use your modified YAML files)
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kafka-connect
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: kafka-connect
    spec:
      containers:
        - name: kafka-connect
          image: strimzi/kafka-connect:latest
          ports:
            - name: rest-api
              containerPort: 8083
              protocol: TCP
          env:
            - name: KAFKA_CONNECT_BOOTSTRAP_SERVERS
              value: "kafka:9092"
          livenessProbe:
            httpGet:
              path: /
              port: rest-api
            initialDelaySeconds: 60
          volumeMounts:
            - mountPath: /opt/kafka/plugins
              name: pluginsvol
      volumes:
        - name: pluginsvol
          hostPath:
            path: /path/to/my/plugins
            type: Directory
Alternatively, you can create a Kubernetes/OpenShift persistent volume which contains additional plugins and modify the Kafka Connect deployment to use this volume. Since a distributed Kafka Connect cluster can run on multiple nodes, you need to make sure that the volume can be mounted read-only into multiple pods at the same time. The volume types which can be mounted read-only by several pods are listed in the Kubernetes documentation. Once you have such a volume, you can edit the deployment YAML file as described above and just use your persistent volume instead of the hostPath volume. For example, for GlusterFS you can use:
volumes:
  - name: pluginsvol
    glusterfs:
      endpoints: glusterfs-cluster
      path: kube_vol
      readOnly: true
- Create a new Dockerfile which uses strimzi/kafka-connect
FROM strimzi/kafka-connect:latest
USER root:root
COPY ./my-plugin/ /opt/kafka/plugins/
USER kafka:kafka
- Build the Docker image and upload it to your Docker repository
- Use your new Docker image in your Kafka Connect deployment
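To build and push such an image you could run something like the commands below; the registry and image name are placeholders you would replace with your own:

# build the image from the Dockerfile above
docker build -t my-registry.example.com/my-org/kafka-connect-with-plugins:latest .

# push it to your Docker repository
docker push my-registry.example.com/my-org/kafka-connect-with-plugins:latest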
OpenShift supports Builds, which can be used together with the Source-to-Image (S2I) framework to create new Docker images. An OpenShift Build takes a builder image with S2I support together with source code and/or binaries provided by the user and uses them to build a new Docker image. The newly created Docker image will be stored in OpenShift's local Docker repository and can be used in deployments.
The Strimzi project provides a Kafka Connect S2I builder image strimzi/kafka-connect-s2i
which takes user-provided binaries (with plugins and connectors) and creates a new Kafka Connect image.
This enhanced Kafka Connect image can be used with our Kafka Connect deployment.
To configure the OpenShift Build and create a new Kafka Connect image, follow these steps:
- Create OpenShift build configuration and Kafka Connect deployment using our OpenShift template
oc apply -f kafka-connect/s2i/resources/openshift-template.yaml
oc new-app strimzi-connect-s2i
- Prepare a directory with Kafka Connect plugins which you want to use. For example:
$ tree ./my-plugins/
./my-plugins/
└── kafka-connect-jdbc
├── kafka-connect-jdbc-3.3.0.jar
├── postgresql-9.4-1206-jdbc41.jar
└── sqlite-jdbc-3.8.11.2.jar
- Start a new image build using the prepared directory
oc start-build kafka-connect --from-dir ./my-plugins/
- Once the build is finished, the new image will be automatically used with your Kafka Connect deployment.
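If you want to follow the build while it runs, the usual OpenShift commands apply, for example (the build config name kafka-connect comes from the start-build command above):

# list builds and their status
oc get builds

# stream the log of the latest build of the kafka-connect build config
oc logs -f bc/kafka-connect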
Each Kafka broker and Zookeeper server pod exposes metrics by means of a Prometheus endpoint. A JMX exporter, running as a Java agent, is in charge of getting metrics from the pod (both JVM metrics and metrics strictly related to the broker) and exposing them on such an endpoint.
The Metrics page details all the information for deploying a Prometheus server in the cluster in order to scrape the pods and obtain metrics. The same page describes how to set up a Grafana instance to have a dashboard showing the main configured metrics.
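As a rough idea of what that involves, a Prometheus scrape configuration discovering the pods could resemble the sketch below; the pod label and its value are assumptions here, and the authoritative configuration, including the exporter port, is the one described on the Metrics page:

scrape_configs:
  - job_name: kafka
    kubernetes_sd_configs:
      # discover all pods in the cluster
      - role: pod
    relabel_configs:
      # keep only pods labelled name=kafka (label and value are assumptions;
      # they depend on how the Strimzi resources label the broker pods)
      - source_labels: [__meta_kubernetes_pod_label_name]
        action: keep
        regex: kafka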