This guide helps you set up a complete data pipeline on Kubernetes, with monitoring, logging, and data processing components.
- Data Ingestion
  - Source: Raw text files are stored in a MinIO (S3-compatible) bucket.
  - Ingestion Tool: The Lenses S3 Source Connector for Kafka Connect reads the files and streams them into a Kafka topic.
- Stream Processing
  - Tool: A custom consumer/producer microservice.
  - Function: Consumes the raw text data, simulates text processing (e.g., NLP, sentiment analysis, data enrichment), and produces the refined results to a new Kafka topic.
- Data Sinking & Indexing
  - Destination: Processed data is persisted in Elasticsearch for powerful search and analytics.
  - Sink Tool: The Confluent Elasticsearch Sink Connector for Kafka Connect streams the data from Kafka into Elasticsearch indices.
- Observability & Monitoring
  - Metrics: System, JVM, and custom application metrics are collected and visualized using VictoriaMetrics.
  - Logging: All application and infrastructure logs are centralized, stored, and analyzed using VictoriaLogs.
Pipeline flow: MinIO S3 → Kafka Connect S3 Source → Kafka → Microservice → Kafka Connect Elasticsearch Sink → Elasticsearch/Kibana, observed end to end by VictoriaMetrics (monitoring), VictoriaLogs (logging), and Grafana (visualization).
Required tools:

- docker
- kubectl
- kind CLI
- helm
Run the following command to create a Kubernetes cluster with 1 control-plane node and 3 worker nodes (an illustrative kind config is sketched below the component list):

./cluster-setup.sh

Components deployed:

- Ingress-nginx
- MetalLB
- 4 proxy image repositories in Docker containers
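For reference, the cluster topology the script requests can be described with a kind config like the sketch below; the actual file used by cluster-setup.sh may differ.

```yaml
# Illustrative kind config: 1 control-plane node and 3 workers.
# This is a sketch; the authoritative topology lives in cluster-setup.sh.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
```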
Run ./setup-vm.sh to deploy the VictoriaMetrics stack (including Grafana).
Grafana Access:

- Login: admin
- Get Password:

kubectl get secret --namespace victoria-metrics vm-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
Run ./setup-nginx.sh
Recommended Dashboard:
Run ./setup-vl.sh to deploy VictoriaLogs.
Run ./setup-minio.sh to deploy MinIO.
MinIO Console Access:

- Login: minio
- Password: minio123
Recommended Dashboards:
Run ./setup-elastic.sh to deploy Elasticsearch and Kibana.
Kibana Access:

- Login: elastic
- Get Password:

kubectl get secret elasticsearch-es-elastic-user -n elastic -o go-template='{{.data.elastic | base64decode }}'
Recommended Dashboards:
Deploy 3 Kafka brokers, 3 KRaft controller nodes, and Schema Registry (a node-pool sketch follows the component list):
Run ./setup-strimzi.sh
Components Deployed:

- Kafka Cluster in the `kafka` namespace
- Schema Registry (accessible at https://schema.kind.cluster)
- KafkaUser: `confluent-schema-registry`
- KafkaTopic: `registry-schemas`
- Warning: Schema Registry uses HTTP/2
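For orientation, the broker/controller split can be expressed with Strimzi KafkaNodePool resources roughly as below; the cluster name `my-cluster` and the storage settings are assumptions, not values taken from setup-strimzi.sh.

```yaml
# Sketch of KRaft node pools: 3 controllers and 3 brokers.
# Cluster name and storage size are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles: [controller]
  storage:
    type: persistent-claim
    size: 10Gi
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles: [broker]
  storage:
    type: persistent-claim
    size: 10Gi
```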
- Build the Docker image:
  docker build -t <your-repo>/<your-image-name>:<your-tag> -f kafka-s3-source-connector/Dockerfile .
  For example: docker build -t ttl.sh/randsw-strimzi-connect-s3-4.1.0:24h -f kafka-s3-source-connector/Dockerfile .
- Push the image to your Docker registry.
- Set your connector image in the KafkaConnect spec (see the sketch below).
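A minimal Strimzi KafkaConnect spec carrying a custom image might look like the sketch below; the image tag reuses the example above, while the resource name, bootstrap address, and internal topic names are assumptions.

```yaml
# Sketch of a KafkaConnect cluster using the custom connector image.
# Bootstrap address assumes a Strimzi cluster named "my-cluster" in the kafka namespace.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: s3-source-connect
  namespace: kafka
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  image: ttl.sh/randsw-strimzi-connect-s3-4.1.0:24h
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  config:
    group.id: s3-source-connect
    offset.storage.topic: s3-source-connect-offsets
    config.storage.topic: s3-source-connect-configs
    status.storage.topic: s3-source-connect-status
```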
./setup-kafka-connect-s3.sh

For the Elasticsearch sink connector:

- Build the Docker image:
  docker build -t <your-repo>/<your-image-name>:<your-tag> -f kafka-elasticsearch-sink-connector/Dockerfile .
  For example: docker build -t ttl.sh/randsw-strimzi-connect-elastic-4.1.0:24h -f kafka-elasticsearch-sink-connector/Dockerfile .
- Push the image to your Docker registry.
- Set your connector image in the KafkaConnect spec (see the sketch below).
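The sink itself can then be declared as a Strimzi KafkaConnector, presumably applied by the setup script below; in this sketch the topic name, Connect cluster label, and Elasticsearch URL (following typical ECK service naming) are assumptions, not values from the repo.

```yaml
# Sketch of the Elasticsearch sink connector definition.
# Topic name, Connect cluster label, and Elasticsearch URL are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: elasticsearch-sink
  namespace: kafka
  labels:
    strimzi.io/cluster: elastic-sink-connect
spec:
  class: io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
  tasksMax: 1
  config:
    topics: processed-data
    connection.url: https://elasticsearch-es-http.elastic.svc:9200
    key.ignore: "true"
    schema.ignore: "true"
```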
./setup-kafka-connect-elastic.sh

Deploy a real-time data processing microservice (sketched below) that:
- Consumes raw data from Kafka topics
- Validates, enriches, filters, and transforms data
- Produces processed data to downstream topics
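For orientation, consumer-producer/manifests/deployment.yaml presumably resembles the sketch below; the image, topic names, and bootstrap address are assumptions.

```yaml
# Sketch of the microservice Deployment; all names are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consumer-producer
  namespace: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: consumer-producer
  template:
    metadata:
      labels:
        app: consumer-producer
    spec:
      containers:
        - name: consumer-producer
          image: <your-repo>/consumer-producer:<your-tag>
          env:
            - name: KAFKA_BOOTSTRAP_SERVERS
              value: my-cluster-kafka-bootstrap:9092
            - name: INPUT_TOPIC
              value: raw-data
            - name: OUTPUT_TOPIC
              value: processed-data
```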
# Create Kafka user for microservice (manifest sketched below)
kubectl apply -f consumer-producer/manifests/kafka-user.yaml

# Deploy microservice
kubectl apply -f consumer-producer/manifests/deployment.yaml
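The kafka-user.yaml applied above presumably defines a Strimzi KafkaUser with ACLs scoped to the topics the service touches; a sketch with assumed topic, group, and cluster names:

```yaml
# Sketch of the microservice's KafkaUser; topic, group, and cluster names are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: consumer-producer
  namespace: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource: { type: topic, name: raw-data }
        operations: [Read, Describe]
      - resource: { type: topic, name: processed-data }
        operations: [Write, Describe]
      - resource: { type: group, name: consumer-producer }
        operations: [Read]
```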
Generate test data to validate the entire pipeline:

cd minio-json-generator && go run main.go

Verify that the data appears in Elasticsearch indices.

| Component | Purpose | Access URL |
|---|---|---|
| Grafana | Monitoring & Visualization | http://grafana.kind.cluster |
| Kibana | Elasticsearch UI & Analytics | http://kibana.kind.cluster |
| MinIO Console | S3 Storage Management | http://minio-console.kind.cluster |
| Schema Registry | Kafka Schema Management | https://schema.kind.cluster |
Security notes:

- All passwords are retrieved from Kubernetes secrets
- Schema Registry uses SSL/TLS with truststore authentication
- Kafka users are managed via the Strimzi KafkaUser CRD
- Ingress controllers provide external access with proper routing
Troubleshooting checklist:

- Verify all pods are running: `kubectl get pods -A`
- Check ingress routes: `kubectl get ingress -A`
- View logs for specific components using `kubectl logs`
- Validate Kafka connectivity and topic creation
- Confirm MinIO bucket creation and file uploads
This setup creates a scalable and resilient data processing system with comprehensive monitoring and logging capabilities.
