Skip to content

sakkiii/strimzi-kafka-helm

Strimzi Kafka Helm Chart

πŸ” Security Analysis (CodeQL) πŸš€ Publish Strimzi Kafka Helm Chart πŸ”„ Version Bump

Helm Chart Version Kafka Version Strimzi Version Kubernetes

License Security Rating Maintainability

GitHub stars GitHub forks GitHub issues GitHub pull requests

GitHub release (latest by date) GitHub Release Date GitHub last commit

A comprehensive, production-ready Helm chart for deploying Apache Kafka clusters using the Strimzi Kafka Operator on Kubernetes. This chart provides extensive parameterization and supports multiple environments with flexible configuration options.

πŸš€ Features

Core Capabilities

  • Multi-Environment Support: Dedicated configurations for nonprod, staging, and prod environments
  • Flexible Kafka Versions: Support for Kafka 3.8.0, 3.9.0, and newer versions
  • KRaft Mode Only: Modern ZooKeeper-less Kafka deployment (ZooKeeper is deprecated)
  • Flexible Node Roles: Support for Controller, Broker, or Dual-role nodes per Strimzi KRaft docs
  • Dynamic Scaling: Optional Horizontal Pod Autoscaler (HPA) support
  • Flexible Listener Configuration: List-based approach supporting unlimited listeners with different types, ports, and authentication

Advanced Features

  • Storage Flexibility: Configurable persistent storage with optional storage classes
  • Node Scheduling: Comprehensive affinity and toleration support for EKS nodegroups
  • Security: TLS encryption, SCRAM-SHA-512 authentication, and ACL-based authorization
  • Monitoring: Built-in JMX Prometheus Exporter metrics for Kafka, Cruise Control, and Connect
  • Large Message Support: Pre-configured for handling messages up to 10MB
  • Auto-Rebalancing: Cruise Control integration with customizable rebalancing goals

Operational Excellence

  • External DNS Integration: Automatic DNS record management for external listeners
  • Rack Awareness: Multi-AZ deployment support with automatic replica selector configuration
  • Resource Management: Configurable CPU and memory limits/requests
  • Pod Disruption Budgets: Built-in availability protection
  • Comprehensive Testing: Helm tests for connectivity validation
  • CI/CD Integration: Automated security scanning, template validation, and publishing

πŸ”’ Security Features

This Helm chart implements enterprise-grade security with comprehensive hardening:

Authentication & Authorization

  • Multiple Authentication Methods: TLS certificates and SCRAM-SHA-512 support
  • Fine-grained ACLs: Least-privilege access control with deny-by-default
  • Superuser Management: Explicit superuser configuration for admin access
  • User Management: Comprehensive KafkaUser resources with role-based permissions

Transport Security

  • TLS Everywhere: Mandatory TLS encryption on all listeners
  • Strong Cipher Suites: TLS 1.3 preferred with secure protocol configuration
  • Certificate Management: Support for Strimzi CA and external cert-manager integration
  • Client Authentication: Required client certificate or password authentication

Production Hardening

  • Secure Defaults: Auto-topic creation disabled, unclean leader election prevented
  • High Availability: 5-replica configuration with 3 min-sync replicas
  • Audit Logging: Security event logging and monitoring integration
  • Network Isolation: Listener-based network segmentation

Security Documentation

πŸ“– Complete Security Guide - Comprehensive security configuration, best practices, and troubleshooting

πŸ“‹ Prerequisites

  • Kubernetes 1.21+
  • Helm 3.8+
  • Strimzi Kafka Operator installed in your cluster
  • For external listeners: Ingress controller (e.g., NGINX) and ExternalDNS (optional)
  • For EKS: Appropriate nodegroups with taints/labels configured

πŸ› οΈ Installation

Quick Start

  1. Add the Helm repository (if using a repository):

    helm repo add strimzi-kafka /path/to/chart
    helm repo update
  2. Install with default values:

    helm install my-kafka strimzi-kafka/strimzi-kafka -n kafka-system --create-namespace
  3. Install for specific environment:

    # Non-production (release name: kafka-nonprod, namespace: om-kafka)
    helm install kafka-nonprod . -f values-nonprod.yaml -n om-kafka --create-namespace
    
    # Staging (release name: kafka-staging, namespace: om-kafka-staging)
    helm install kafka-staging . -f values-staging.yaml -n om-kafka-staging --create-namespace
    
    # Production (release name: kafka-prod, namespace: om-kafka-prod)
    helm install kafka-prod . -f values-prod.yaml -n om-kafka-prod --create-namespace

Using the Deployment Script

For easier multi-environment deployment, use the provided script:

# Make the script executable
chmod +x scripts/deploy.sh

# Deploy to different environments
./scripts/deploy.sh nonprod om-kafka kafka-nonprod
./scripts/deploy.sh staging om-kafka-staging kafka-staging
./scripts/deploy.sh prod om-kafka-prod kafka-prod

🎯 Flexible Naming Configuration

This Helm chart provides flexible naming with sensible defaults that follow Helm best practices:

🏷️ Default Behavior (Recommended)

  • Cluster Name: Defaults to {{ .Release.Name }} (your Helm release name)
  • Namespace: Defaults to {{ .Release.Namespace }} (your deployment namespace)
  • Node Pools: Prefixed with cluster name (e.g., my-kafka-dual-role)
  • Secrets: Prefixed with cluster name (e.g., my-kafka-kafka-tls-secret)

πŸ”§ Override Options

kafkaCluster:
  name: "my-custom-kafka"        # Override cluster name
  namespace: "custom-namespace"  # Override namespace
  nodePools:
    - name: "custom-dual-role"   # Will become: my-custom-kafka-custom-dual-role

βœ… Benefits

  • 🎯 Flexible: Override names when needed for specific requirements
  • πŸ“ Consistent: Follows Helm conventions by default
  • πŸ” Predictable: Resource names are clearly prefixed and organized
  • πŸš€ Simple: Works out-of-the-box with sensible defaults

Example:

# This creates a Kafka cluster named "my-kafka" in namespace "kafka-system"
helm install my-kafka . -n kafka-system --create-namespace

βš™οΈ Configuration

Core Parameters

Parameter Description Default Environment Specific
kafkaCluster.version Kafka version 3.9.0 βœ…
kafkaCluster.replicas Number of Kafka brokers 3 βœ…

Note:

  • The chart uses Helm built-in variables for naming: {{ .Release.Name }} for cluster name and {{ .Release.Namespace }} for namespace
  • Deploy to the desired namespace using helm install --namespace <namespace> <release-name>
  • The release name becomes your Kafka cluster name automatically

🐳 Docker Image Configuration

The chart provides comprehensive image configuration options for all Strimzi components, supporting both public and private registries with flexible override capabilities.

Global Image Defaults

Set default image settings that apply to all components unless overridden:

global:
  # Default Image Configuration
  defaultImageRegistry: "quay.io"              # Default: quay.io
  defaultImageRepository: "strimzi"            # Default: strimzi  
  defaultImageTag: "0.47.0-kafka-3.9.0"      # Strimzi version with Kafka version
  
  # Global Image Pull Configuration
  imagePullPolicy: "IfNotPresent"             # Always, Never, IfNotPresent
  imagePullSecrets:                           # Global pull secrets
    - name: "private-registry-secret"
    - name: "ecr-registry-secret"

Component-Specific Image Overrides

Override image settings for specific components:

# Strimzi Operator Image Configuration
strimzi:
  operator:
    image:
      registry: "my-registry.com"             # Override global registry
      repository: "custom-strimzi"           # Override global repository
      name: "operator"                       # Operator image name
      tag: "0.47.0-custom"                  # Override global tag
      pullPolicy: "Always"                   # Override global pull policy
      pullSecrets:                          # Additional component secrets
        - name: "operator-registry-secret"

# Kafka Cluster Image Configuration  
kafkaCluster:
  image:
    registry: ""                            # Empty = use global default
    repository: ""                          # Empty = use global default
    tag: ""                                # Empty = use global default
    pullPolicy: ""                         # Empty = use global default
    pullSecrets: []                        # Additional Kafka secrets

# Kafka Connect Image Configuration
kafkaConnects:
  - name: my-connect-cluster
    image:
      registry: "my-registry.com"
      repository: "custom-kafka"
      name: "kafka-connect"                 # Connect image name
      tag: "3.9.0-custom"
      pullPolicy: "Always"
      pullSecrets:
        - name: "connect-registry-secret"

Environment-Specific Examples

Non-Production (Public Registry)

global:
  defaultImageRegistry: "quay.io"
  defaultImageRepository: "strimzi"
  defaultImageTag: "0.47.0-kafka-3.9.0"
  imagePullPolicy: "IfNotPresent"
  imagePullSecrets: []  # No secrets for public registry

Production (Private ECR Registry)

global:
  defaultImageRegistry: "123456789012.dkr.ecr.us-east-1.amazonaws.com"
  defaultImageRepository: "strimzi"
  defaultImageTag: "0.47.0-kafka-3.9.0"
  imagePullPolicy: "Always"  # Always pull latest for production
  imagePullSecrets:
    - name: "ecr-registry-secret"

Image Pull Secrets Setup

For private registries, create image pull secrets:

# For ECR (AWS)
kubectl create secret docker-registry ecr-registry-secret \
  --docker-server=123456789012.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$(aws ecr get-login-password --region us-east-1) \
  --namespace=kafka-system

# For Docker Hub
kubectl create secret docker-registry dockerhub-secret \
  --docker-server=docker.io \
  --docker-username=myusername \
  --docker-password=mypassword \
  --namespace=kafka-system

# For Harbor/Custom Registry  
kubectl create secret docker-registry harbor-secret \
  --docker-server=harbor.company.com \
  --docker-username=myusername \
  --docker-password=mypassword \
  --namespace=kafka-system

πŸ”’ Security Configuration

This chart provides comprehensive security features with production-grade hardening. For complete security documentation, see SECURITY.md.

Quick Security Setup

1. Basic TLS Configuration

kafkaCluster:
  listeners:
    # Internal TLS listener
    - name: "tls"
      port: 9093
      type: internal
      tls: true
      authentication:
        type: tls
    
    # SCRAM-SHA-512 listener for applications
    - name: "scram"
      port: 9094
      type: internal
      tls: true
      authentication:
        type: scram-sha-512

2. Production Security Hardening

kafkaCluster:
  authorization:
    type: simple
    superUsers:
      - CN=kafka-admin-prod
      - kafka-superuser-prod
  
  config:
    # CRITICAL: Security hardening
    auto.create.topics.enable: false
    allow.everyone.if.no.acl.found: false
    ssl.client.auth: required
    unclean.leader.election.enable: false
    
    # High availability
    default.replication.factor: 5
    min.insync.replicas: 3

3. KafkaUser with ACLs

kafkaUsers:
  - name: app-service-user
    authentication:
      type: scram-sha-512
    authorization:
      type: simple
      acls:
        # Read access to specific topics
        - resource:
            type: topic
            name: "app-events"
            patternType: literal
          operations: [Describe, Read]
          host: "*"
        
        # Consumer group access
        - resource:
            type: group
            name: "app-service-group"
            patternType: literal
          operations: [Read]
          host: "*"

Security Environment Examples

Development (Relaxed Security)

# values-dev.yaml
kafkaCluster:
  config:
    auto.create.topics.enable: true
    allow.everyone.if.no.acl.found: true
  listeners:
    - name: "internal"
      port: 9092
      type: internal
      tls: false
      authentication: {}

Production (Full Security)

# values-prod.yaml
kafkaCluster:
  config:
    auto.create.topics.enable: false
    allow.everyone.if.no.acl.found: false
    ssl.client.auth: required
    ssl.enabled.protocols: TLSv1.3,TLSv1.2
    ssl.protocol: TLSv1.3
  
  listeners:
    - name: "tls"
      port: 9093
      type: internal
      tls: true
      authentication:
        type: tls
    
    - name: "external"
      port: 9095
      type: ingress
      tls: true
      authentication:
        type: scram-sha-512
      configuration:
        class: "nginx"
        annotations:
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          cert-manager.io/cluster-issuer: "letsencrypt-prod"

Security Features Summary

Feature Development Staging Production
TLS Encryption Optional Required Required
Client Authentication None SCRAM/TLS TLS + SCRAM
ACL Authorization Disabled Basic Comprehensive
Auto Topic Creation Enabled Disabled Disabled
Superuser Access Open Limited Restricted
Certificate Management Self-signed Strimzi CA cert-manager

πŸ“– Complete Security Guide - Authentication, authorization, TLS configuration, ACL patterns, troubleshooting, and security checklists.

Kubernetes-style Ingress Configuration

The chart supports modern Kubernetes-style ingress configuration with optional annotations and flexible TLS settings. This provides better control over ingress resources and follows Kubernetes best practices.

Ingress Configuration Options

kafkaCluster:
  listeners:
    external:
      ingress:
        # Ingress class name (required)
        className: "nginx"  # or "alb", "traefik", etc.
        
        # Ingress host (required)
        host: "kafka.example.com"
        
        # Optional annotations - if {} then disabled/not required
        annotations:
          external-dns.alpha.kubernetes.io/hostname: "kafka.example.com"
          external-dns.alpha.kubernetes.io/ttl: "60"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          cert-manager.io/cluster-issuer: "letsencrypt-prod"
        
        # TLS Configuration
        tls:
          enabled: true  # true or false
          secretName: "kafka-tls-secret"  # optional - auto-generated if not specified
      
      # Broker configuration - inherits className, annotations, and TLS from parent
      brokers:
        # Host pattern: broker-{broker}-{parent.host} (e.g., broker-0-kafka.example.com)
        # Brokers automatically inherit all parent ingress settings
        hostPattern: "broker-{broker}-kafka.example.com"  # Optional override

Environment-Specific Examples

Non-Production: Simplified setup with TLS disabled

kafkaCluster:
  listeners:
    external:
      ingress:
        className: "nginx"
        host: "kafka.dev.example.com"
        annotations:
          external-dns.alpha.kubernetes.io/hostname: "kafka.dev.example.com"
        tls:
          enabled: false  # Simplified for development
      brokers:
        # Brokers inherit all settings from parent
        # Host pattern: broker-0-kafka.dev.example.com, broker-1-kafka.dev.example.com, etc.

Production: Full security with TLS and cert-manager

kafkaCluster:
  listeners:
    external:
      ingress:
        className: "nginx"
        host: "kafka.prod.example.com"
        annotations:
          external-dns.alpha.kubernetes.io/hostname: "kafka.prod.example.com"
          cert-manager.io/cluster-issuer: "letsencrypt-prod"
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
          nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        tls:
          enabled: true
          secretName: "kafka-prod-tls"
      brokers:
        # Brokers inherit all settings from parent including TLS and cert-manager
        # Host pattern: broker-0-kafka.prod.example.com, broker-1-kafka.prod.example.com, etc.

Optional Annotations

Annotations are completely optional. If you set annotations: {} or omit the annotations section entirely, no annotations will be applied to the ingress resources. This provides maximum flexibility for different deployment scenarios.

KRaft Node Roles Configuration

This chart supports KRaft mode only (ZooKeeper is deprecated). You can configure flexible node roles as per the Strimzi KRaft documentation:

Supported Node Roles

  1. Controller: Manages cluster metadata and leader elections
  2. Broker: Handles client requests and data storage
  3. Dual-role: Both controller and broker functions (suitable for dev/test)

Configuration Examples

Development/Testing - Dual-role nodes:

kafkaCluster:
  nodePools:
    - name: "kafka-dual-role"
      replicas: 3  # Minimum for KRaft quorum
      roles:
        - broker
        - controller

Production - Dedicated roles (recommended):

kafkaCluster:
  nodePools:
    # Dedicated controllers for metadata management
    - name: "kafka-controllers"
      replicas: 3  # Odd number for quorum (3 or 5)
      roles:
        - controller
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
    
    # Dedicated brokers for client traffic
    - name: "kafka-brokers"
      replicas: 6  # Scale based on throughput needs
      roles:
        - broker
      resources:
        requests:
          memory: "8Gi"
          cpu: "2000m"

Environment-Specific Patterns

  • Non-Production: Dual-role nodes for cost efficiency
  • Staging: Mixed setup to test production patterns
  • Production: Dedicated roles for optimal performance and isolation

Global Node Selection, Affinity and Tolerations

The chart supports flexible node selection through both simple nodeSelector and advanced affinity configurations, plus tolerations that apply to all Kafka components (brokers, entity operator, Kafka exporter, Cruise Control) unless explicitly overridden at the component level.

🎯 Global Override System

Key Benefits:

  • βœ… DRY Principle: Configure once, apply everywhere
  • βœ… Consistent Scheduling: All components use same node selection by default
  • βœ… Easy Management: Change global settings to affect all components
  • βœ… Selective Override: Override only specific components when needed

How it Works:

  1. Global Configuration: Set global.nodeSelector, global.affinity, global.tolerations
  2. Automatic Inheritance: All Kafka components inherit these settings
  3. Component Override: Override at component level only when different behavior is needed

✨ Before vs After Example

❌ Before (Repetitive):

# Repeated in every component - hard to maintain!
kafkaCluster:
  nodePools:
    - template:
        pod:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: eks.amazonaws.com/nodegroup
                        operator: In
                        values: ["kafka-nodes"]
entityOperator:
  template:
    pod:
      affinity: # Same config repeated again!
        nodeAffinity: # ... 15 lines of duplicate config
cruiseControl:
  template:
    pod:
      affinity: # Same config repeated again!
        nodeAffinity: # ... 15 lines of duplicate config

βœ… After (Clean & DRY):

# Configure once globally
global:
  nodeSelector:
    eks.amazonaws.com/nodegroup: "kafka-nodes"

# All components automatically inherit - no repetition!
kafkaCluster:
  nodePools:
    - template:
        pod:
          affinity: {} # Inherits from global
entityOperator:
  template:
    pod:
      affinity: {} # Inherits from global
cruiseControl:
  template:
    pod:
      affinity: {} # Inherits from global

# Override only when needed
kafkaExporter:
  template:
    pod:
      nodeSelector:
        special-node: "monitoring" # Override for this component only

Node Selection Options

Simple Node Selection (Recommended): Use nodeSelector for straightforward node targeting:

global:
  nodeSelector:
    eks.amazonaws.com/nodegroup: "kafka-nodegroup"
    node-type: "kafka-dedicated"
    kubernetes.io/arch: "amd64"

Advanced Node Selection: Use affinity.nodeAffinity for complex node selection logic:

global:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: In
                values:
                  - "kafka-nodegroup-1"
                  - "kafka-nodegroup-2"

Global Configuration

global:
  # Global Affinity Configuration (applied to all components unless overridden)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: In
                values:
                  - "kafka-nodegroup"
              - key: node-type
                operator: In
                values:
                  - "kafka-dedicated"
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: strimzi.io/cluster
                operator: In
                values:
                  - "{{ .Values.global.clusterName }}"
          topologyKey: kubernetes.io/hostname
  
  # Global Tolerations Configuration (applied to all components unless overridden)
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "kafka"
      effect: "NoSchedule"
    - key: "identifier"
      operator: "Equal"
      value: "kafka-nodegroup-taint"
      effect: "NoSchedule"

Component-Level Overrides

Individual components can override global affinity and tolerations:

kafkaCluster:
  nodePools:
    - template:
        pod:
          # Override global affinity for this node pool
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: node-type
                        operator: In
                        values:
                          - "kafka-broker-only"
          # Override global tolerations for this node pool
          tolerations:
            - key: "broker-dedicated"
              operator: "Equal"
              value: "true"
              effect: "NoSchedule"
  
  entityOperator:
    template:
      pod:
        # Entity operator can have different affinity/tolerations
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: node-type
                      operator: In
                      values:
                        - "kafka-management"

Environment-Specific Examples

Non-Production: Basic node affinity, no pod anti-affinity

global:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: In
                values:
                  - "dev-nodegroup"
  tolerations:
    - key: "dev-taint"
      operator: "Equal"
      value: "kafka"
      effect: "NoSchedule"

Production: Strict node affinity and pod anti-affinity

global:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: In
                values:
                  - "kafka-dedicated-nodegroup"
              - key: node-type
                operator: In
                values:
                  - "kafka-dedicated"
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:  # Hard anti-affinity for production
        - labelSelector:
            matchExpressions:
              - key: strimzi.io/cluster
                operator: In
                values:
                  - "kafka-cluster-prod"
          topologyKey: kubernetes.io/hostname
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "kafka"
      effect: "NoSchedule"

Storage Configuration

kafkaCluster:
  nodePools:
    - storage:
        enabled: true
        type: jbod  # or persistent-claim
        volumes:
          - id: 1
            type: persistent-claim
            size: 100Gi  # Configurable per environment
            deleteClaim: false
            kraftMetadata: shared
            storageClass: gp3  # Optional: specify storage class

🎧 Flexible Listener Configuration

The chart now supports flexible list-based listeners - add as many listeners as needed with different configurations:

Basic Example

kafkaCluster:
  listeners:
    # Internal TLS listener (always recommended)
    - name: "tls"
      port: 9093
      type: internal
      tls: true
      authentication:
        type: tls
    
    # External ingress listener
    - name: "external"
      port: 9095
      type: ingress
      tls: true
      authentication:
        type: scram-sha-512
      configuration:
        class: "nginx"
        bootstrap:
          host: "kafka.example.com"
          annotations:
            external-dns.alpha.kubernetes.io/hostname: "kafka.example.com"
        brokers:
          hostTemplate: "kafka-{id}.example.com"
          annotations:
            external-dns.alpha.kubernetes.io/hostname: "kafka-{id}.example.com"
        tls:
          secretName: "kafka-tls-secret"
          brokerSecretName: "kafka-broker-tls"

Advanced Multi-Listener Example

kafkaCluster:
  listeners:
    # Internal TLS for inter-broker communication
    - name: "tls"
      port: 9093
      type: internal
      tls: true
      authentication:
        type: tls
    
    # Internal SCRAM for applications
    - name: "scram"
      port: 9094
      type: internal
      tls: true
      authentication:
        type: scram-sha-512
    
    # External ingress for web clients
    - name: "web-clients"
      port: 9095
      type: ingress
      tls: true
      authentication:
        type: scram-sha-512
      configuration:
        class: "nginx"
        bootstrap:
          host: "kafka-web.example.com"
        brokers:
          hostTemplate: "kafka-web-{id}.example.com"
    
    # External LoadBalancer for internal services
    - name: "internal-services"
      port: 9096
      type: loadbalancer
      tls: true
      authentication:
        type: tls
      configuration:
        loadBalancerSourceRanges:
          - "10.0.0.0/8"
          - "172.16.0.0/12"
    
    # NodePort for development access
    - name: "dev-access"
      port: 9097
      type: nodeport
      tls: false
      authentication:
        type: scram-sha-512
      configuration:
        nodePort: 32000

βœ… Benefits of List-Based Listeners

  • πŸ”§ Unlimited Flexibility: Add any number of listeners with different configurations
  • 🎯 Purpose-Specific: Create listeners for different client types (web, mobile, internal services)
  • πŸ”’ Security Granular: Different authentication methods per listener
  • 🌐 Multi-Environment: Different ingress hosts and TLS configurations per listener
  • πŸ“ˆ Scalable: Easy to add new listeners without restructuring existing ones

Node Affinity and Tolerations

Configure pod scheduling for EKS nodegroups:

kafkaCluster:
  nodePools:
    - template:
        pod:
          affinity:
            enabled: true
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: eks.amazonaws.com/nodegroup
                        operator: In
                        values:
                          - "kafka-nodegroup"
            podAntiAffinity:
              enabled: true  # Recommended for production
          tolerations:
            enabled: true
            tolerations:
              - key: "kafka-dedicated"
                operator: "Equal"
                value: "true"
                effect: "NoSchedule"

Horizontal Pod Autoscaler (HPA)

⚠️ CRITICAL LIMITATION: HPA MUST NOT target Kafka broker StatefulSets managed by the Strimzi operator.

❌ What HPA CANNOT Do

  • Cannot scale Kafka brokers: Broker scaling is exclusively managed by the Strimzi operator
  • Cannot target StatefulSets: HPA cannot manage StatefulSets created by the operator
  • Cannot handle data rebalancing: Scaling brokers requires data rebalancing coordination

βœ… What HPA CAN Do

HPA can only be used for ancillary components not managed by the Strimzi operator:

# βœ… VALID: HPA for Kafka Connect (separate deployment)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-connect-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-connect-cluster-connect  # Kafka Connect deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

# βœ… VALID: HPA for MirrorMaker2 (separate deployment)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mirrormaker2-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-mm2-cluster-mirrormaker2
  minReplicas: 1
  maxReplicas: 5

πŸ”§ Proper Broker Scaling

To scale Kafka brokers, update the replicas configuration and apply Helm upgrade:

# values-prod.yaml
kafkaCluster:
  replicas: 5  # Scale from 3 to 5 brokers

# Apply the change
helm upgrade my-kafka . -f values-prod.yaml

The Strimzi operator will:

  1. Create new broker pods
  2. Update cluster metadata
  3. Trigger automatic rebalancing (if Cruise Control is enabled)
  4. Ensure data distribution across new brokers

πŸ“Š Monitoring Broker Capacity

Instead of HPA, monitor these metrics for manual scaling decisions:

# Monitor broker CPU/Memory usage
kubectl top pods -l strimzi.io/cluster=my-kafka

# Check disk usage per broker
kubectl exec my-kafka-dual-role-0 -c kafka -- df -h /var/lib/kafka

# Monitor partition distribution
kubectl exec my-kafka-dual-role-0 -c kafka -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --describe --under-replicated-partitions

Metrics Configuration

Enable comprehensive monitoring with JMX Prometheus Exporter:

kafkaCluster:
  metricsConfig:
    enabled: true
    type: jmxPrometheusExporter
    configMapName: kafka-metrics
    configMapKey: kafka-metrics-config.yml
  
  kafkaExporter:
    enabled: true  # Additional Kafka-specific metrics
  
  cruiseControl:
    enabled: true
    metricsConfig:
      enabled: true
      type: jmxPrometheusExporter
      configMapName: cruise-control-metrics
      configMapKey: metrics-config.yml

Kafka Users and Topics

Define users and topics declaratively:

kafkaUsers:
  - name: app-user
    authentication:
      type: scram-sha-512
    authorization:
      type: simple
      acls:
        - resource:
            type: topic
            name: "app-*"
            patternType: prefix
          operations: [Describe, Read, Write]
          host: "*"

kafkaTopics:
  - name: user-events
    partitions: 12
    replicas: 3
    config:
      retention.ms: 604800000  # 7 days
      segment.bytes: 1073741824
      compression.type: lz4

Kafka Connect

Deploy Kafka Connect clusters with custom connectors:

kafkaConnects:
  - name: analytics-connect
    replicas: 3
    version: "3.9.0"
    resources:
      requests:
        memory: 4Gi
        cpu: 2
      limits:
        memory: 4Gi
        cpu: 4
    bootstrapServers: "kafka.example.com:443"
    build:
      output:
        type: docker
        image: "registry.example.com/kafka-connect:latest"
      plugins:
        - name: opensearch-sink
          artifacts:
            - type: zip
              url: "https://github.com/Aiven-Open/opensearch-connector-for-apache-kafka/releases/download/v3.1.1/opensearch-connector-for-apache-kafka-3.1.1.zip"

🎯 Cruise Control Configuration

Cruise Control provides intelligent rebalancing and optimization for Kafka clusters. This section covers goal templates aligned with Strimzi defaults and operational guidance.

Default Goal Templates

Production Rebalancing Goals (Recommended)

kafkaCluster:
  cruiseControl:
    enabled: true
    config:
      # Strimzi default goals - optimized for production stability
      default.goals: >
        RackAwareGoal,
        ReplicaCapacityGoal,
        DiskCapacityGoal,
        NetworkInboundCapacityGoal,
        NetworkOutboundCapacityGoal,
        CpuCapacityGoal,
        ReplicaDistributionGoal,
        PotentialNwOutGoal,
        DiskUsageDistributionGoal,
        NetworkInboundUsageDistributionGoal,
        NetworkOutboundUsageDistributionGoal,
        CpuUsageDistributionGoal,
        LeaderReplicaDistributionGoal,
        LeaderBytesInDistributionGoal
      
      # Hard goals that cannot be violated
      hard.goals: >
        RackAwareGoal,
        ReplicaCapacityGoal,
        DiskCapacityGoal,
        NetworkInboundCapacityGoal,
        NetworkOutboundCapacityGoal,
        CpuCapacityGoal
      
      # Self-healing configuration
      self.healing.goals: >
        RackAwareGoal,
        ReplicaCapacityGoal,
        DiskCapacityGoal
      
      # Anomaly detection
      anomaly.detection.goals: >
        RackAwareGoal,
        ReplicaCapacityGoal,
        DiskCapacityGoal,
        NetworkInboundCapacityGoal,
        NetworkOutboundCapacityGoal,
        CpuCapacityGoal

# KafkaRebalance templates for different scenarios
kafkaRebalances:
  # Full cluster rebalance (use sparingly)
  - name: "full-rebalance"
    mode: "full"
    goals:
      - RackAwareGoal
      - ReplicaCapacityGoal
      - DiskCapacityGoal
      - NetworkInboundCapacityGoal
      - NetworkOutboundCapacityGoal
      - CpuCapacityGoal
      - ReplicaDistributionGoal
      - PotentialNwOutGoal
      - DiskUsageDistributionGoal
      - NetworkInboundUsageDistributionGoal
      - NetworkOutboundUsageDistributionGoal
      - CpuUsageDistributionGoal
      - LeaderReplicaDistributionGoal
      - LeaderBytesInDistributionGoal
  
  # Add brokers rebalance (when scaling up)
  - name: "add-brokers-rebalance"
    mode: "add-brokers"
    goals:
      - RackAwareGoal
      - ReplicaCapacityGoal
      - DiskCapacityGoal
      - ReplicaDistributionGoal
      - DiskUsageDistributionGoal
  
  # Remove brokers rebalance (when scaling down)
  - name: "remove-brokers-rebalance"
    mode: "remove-brokers"
    goals:
      - RackAwareGoal
      - ReplicaCapacityGoal
      - DiskCapacityGoal
      - ReplicaDistributionGoal

Conservative Rebalancing Goals (Minimal Impact)

kafkaCluster:
  cruiseControl:
    enabled: true
    config:
      # Conservative goals for minimal cluster disruption
      default.goals: >
        RackAwareGoal,
        ReplicaCapacityGoal,
        DiskCapacityGoal,
        ReplicaDistributionGoal
      
      hard.goals: >
        RackAwareGoal,
        ReplicaCapacityGoal,
        DiskCapacityGoal

Operational Guidance

When to Disable Cruise Control

⚠️ CRITICAL: Disable Cruise Control during these disruptive operations:

  1. Kafka Version Upgrades

    kafkaCluster:
      cruiseControl:
        enabled: false  # Disable during upgrade
  2. Broker Configuration Changes

    • JVM settings modifications
    • Storage configuration changes
    • Network configuration updates
  3. Cluster Maintenance

    • Node maintenance windows
    • Kubernetes cluster upgrades
    • Storage system maintenance
  4. Emergency Situations

    • Broker failures requiring immediate attention
    • Network partitions or connectivity issues
    • Data corruption incidents

Re-enabling After Maintenance

# 1. Ensure all brokers are healthy
kubectl get pods -l strimzi.io/cluster=my-kafka

# 2. Check for under-replicated partitions
kubectl exec my-kafka-dual-role-0 -c kafka -- bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 --describe --under-replicated-partitions

# 3. Re-enable Cruise Control
helm upgrade my-kafka . -f values-prod.yaml  # With cruiseControl.enabled: true

# 4. Wait for Cruise Control to start
kubectl get pods -l strimzi.io/name=my-kafka-cruise-control

Goal Explanations

Goal Purpose Impact Recommended Use
RackAwareGoal Ensures replicas are distributed across racks/AZs High Always include (hard goal)
ReplicaCapacityGoal Prevents brokers from exceeding replica limits High Always include (hard goal)
DiskCapacityGoal Prevents disk space exhaustion High Always include (hard goal)
NetworkInboundCapacityGoal Balances network inbound traffic Medium Production clusters
NetworkOutboundCapacityGoal Balances network outbound traffic Medium Production clusters
CpuCapacityGoal Balances CPU utilization Medium Production clusters
ReplicaDistributionGoal Evenly distributes replicas Low General optimization
LeaderReplicaDistributionGoal Evenly distributes leader replicas Low Performance optimization
DiskUsageDistributionGoal Balances disk usage Low Storage optimization

Monitoring Cruise Control

Key Metrics to Monitor

# Cruise Control status
kubectl get pods -l strimzi.io/name=my-kafka-cruise-control

# Active rebalances
kubectl get kafkarebalance

# Cruise Control logs
kubectl logs deployment/my-kafka-cruise-control

# Anomaly detector status
kubectl exec my-kafka-cruise-control-xxx -- curl -s localhost:9090/kafkacruisecontrol/state

Prometheus Metrics (if enabled)

Monitor these Cruise Control metrics:

  • kafka_cruisecontrol_anomaly_detector_mean_time_between_anomalies_ms
  • kafka_cruisecontrol_executor_execution_stopped
  • kafka_cruisecontrol_monitor_sampling_rate

Troubleshooting

Common Issues

  1. Cruise Control Pod Not Starting

    # Check logs for configuration errors
    kubectl logs deployment/my-kafka-cruise-control
    
    # Verify Kafka cluster is healthy
    kubectl get kafka my-kafka -o yaml
  2. Rebalance Stuck in Pending State

    # Check rebalance status
    kubectl describe kafkarebalance my-rebalance
    
    # Check Cruise Control logs
    kubectl logs deployment/my-kafka-cruise-control | grep -i rebalance
  3. Goals Cannot Be Satisfied

    # Review goal configuration
    kubectl get kafka my-kafka -o yaml | grep -A 20 cruiseControl
    
    # Check cluster capacity
    kubectl exec my-kafka-cruise-control-xxx -- curl -s \
      "localhost:9090/kafkacruisecontrol/load?json=true"

References

πŸ‘€ KafkaUser Resources & ACL Examples

This section provides comprehensive examples of KafkaUser resources with SCRAM authentication and role-based ACL configurations.

Basic KafkaUser with SCRAM Authentication

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: app-producer
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Producer permissions for specific topics
      - resource:
          type: topic
          name: orders
        operations: [Write, Describe]
      - resource:
          type: topic
          name: payments
        operations: [Write, Describe]
      
      # Schema Registry permissions (if using Confluent Schema Registry)
      - resource:
          type: topic
          name: _schemas
        operations: [Read, Write, Describe]
      
      # Consumer group for monitoring/health checks
      - resource:
          type: group
          name: app-producer-monitoring
        operations: [Read]

Consumer Application User

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: app-consumer
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Consumer permissions for specific topics
      - resource:
          type: topic
          name: orders
        operations: [Read, Describe]
      - resource:
          type: topic
          name: payments
        operations: [Read, Describe]
      
      # Consumer group permissions
      - resource:
          type: group
          name: order-processing-service
        operations: [Read]
      - resource:
          type: group
          name: payment-processing-service
        operations: [Read]
      
      # Offset management
      - resource:
          type: topic
          name: __consumer_offsets
        operations: [Read]

Admin User (Full Cluster Access)

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: kafka-admin
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Full cluster administration
      - resource:
          type: cluster
        operations: [All]
      
      # All topic operations
      - resource:
          type: topic
          name: "*"
        operations: [All]
      
      # All consumer group operations
      - resource:
          type: group
          name: "*"
        operations: [All]
      
      # Transaction operations (for exactly-once semantics)
      - resource:
          type: transactionalId
          name: "*"
        operations: [All]

Role-Based ACL Bundles

Data Engineering Role

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: data-engineer
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Read access to all data topics
      - resource:
          type: topic
          name: data.*
          patternType: prefix
        operations: [Read, Describe]
      
      # Write access to analytics topics
      - resource:
          type: topic
          name: analytics.*
          patternType: prefix
        operations: [Write, Create, Describe]
      
      # Consumer groups for data processing
      - resource:
          type: group
          name: data-engineering.*
          patternType: prefix
        operations: [Read]
      
      # Schema Registry access
      - resource:
          type: topic
          name: _schemas
        operations: [Read, Write, Describe]

Monitoring Role

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: monitoring-user
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Read-only access to all topics for monitoring
      - resource:
          type: topic
          name: "*"
        operations: [Read, Describe]
      
      # Consumer groups for monitoring tools
      - resource:
          type: group
          name: monitoring.*
          patternType: prefix
        operations: [Read]
      
      # Cluster metadata access
      - resource:
          type: cluster
        operations: [Describe]

Development Role (Limited Access)

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: developer
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Access only to development topics
      - resource:
          type: topic
          name: dev.*
          patternType: prefix
        operations: [Read, Write, Create, Describe]
      
      # Test topics access
      - resource:
          type: topic
          name: test.*
          patternType: prefix
        operations: [Read, Write, Create, Delete, Describe]
      
      # Development consumer groups
      - resource:
          type: group
          name: dev.*
          patternType: prefix
        operations: [Read, Write, Create, Delete]
      
      # Limited cluster access
      - resource:
          type: cluster
        operations: [Describe]

Prefix Pattern Implications

⚠️ Important Considerations for Prefix Patterns

Prefix patterns (patternType: prefix) have significant security implications:

  1. Overly Broad Access

    # ❌ DANGEROUS: Grants access to ALL topics starting with "app"
    - resource:
        type: topic
        name: app
        patternType: prefix
      operations: [Read, Write]
    
    # βœ… BETTER: More specific prefix
    - resource:
        type: topic
        name: app.orders.
        patternType: prefix
      operations: [Read, Write]
  2. Unintended Topic Access

    # ❌ PROBLEM: "user" prefix matches "user-data", "users", "user-profiles", etc.
    - resource:
        type: topic
        name: user
        patternType: prefix
    
    # βœ… SOLUTION: Use specific naming with delimiters
    - resource:
        type: topic
        name: user.
        patternType: prefix  # Matches "user.profile", "user.settings", etc.
  3. Best Practices for Prefix Patterns

    • Use clear naming conventions with delimiters (., -, _)
    • Test prefix patterns in development environments
    • Regularly audit ACL permissions
    • Prefer specific topic names over broad prefixes when possible

SCRAM Secret Management

Retrieving SCRAM Credentials

# Get the generated password for a KafkaUser
kubectl get secret app-producer -o jsonpath='{.data.password}' | base64 -d

# Get the SASL JAAS configuration
kubectl get secret app-producer -o jsonpath='{.data.sasl\.jaas\.config}' | base64 -d

# Complete connection details
kubectl get secret app-producer -o yaml

Using SCRAM Credentials in Applications

Java Application Example:

# application.properties
bootstrap.servers=my-kafka-kafka-bootstrap:9092
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="app-producer" \
  password="<password-from-secret>";
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=<truststore-password>

Python Application Example:

from kafka import KafkaProducer
import ssl

producer = KafkaProducer(
    bootstrap_servers=['my-kafka-kafka-bootstrap:9092'],
    security_protocol='SASL_SSL',
    sasl_mechanism='SCRAM-SHA-512',
    sasl_plain_username='app-producer',
    sasl_plain_password='<password-from-secret>',
    ssl_context=ssl.create_default_context(),
    ssl_check_hostname=False,
    ssl_cafile='/path/to/ca.crt'
)

ACL Operations Reference

Operation Description Typical Use Case
Read Read messages from topics Consumers
Write Write messages to topics Producers
Create Create topics/consumer groups Admin operations
Delete Delete topics/consumer groups Admin operations
Alter Modify topic/group configurations Admin operations
Describe Get metadata about resources Monitoring, clients
ClusterAction Cluster-level operations Admin operations
All All operations Full admin access

Least Privilege Examples

Microservice Producer (Minimal Permissions)

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: order-service-producer
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Only write to order events topic
      - resource:
          type: topic
          name: order.events
        operations: [Write, Describe]
      
      # Transactional ID for exactly-once semantics
      - resource:
          type: transactionalId
          name: order-service-tx
        operations: [Write, Describe]

Microservice Consumer (Minimal Permissions)

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: notification-service-consumer
  labels:
    strimzi.io/cluster: my-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      # Only read from order events
      - resource:
          type: topic
          name: order.events
        operations: [Read, Describe]
      
      # Specific consumer group
      - resource:
          type: group
          name: notification-service
        operations: [Read]

Troubleshooting ACLs

Common ACL Issues

  1. Access Denied Errors

    # Check user's ACLs
    kubectl describe kafkauser app-producer
    
    # Verify topic exists and user has access
    kubectl exec my-kafka-dual-role-0 -c kafka -- bin/kafka-acls.sh \
      --bootstrap-server localhost:9092 \
      --list --principal User:app-producer
  2. Consumer Group Authorization Failures

    # Verify consumer group ACLs
    kubectl exec my-kafka-dual-role-0 -c kafka -- bin/kafka-consumer-groups.sh \
      --bootstrap-server localhost:9092 --list
  3. Testing ACL Permissions

    # Test producer permissions
    kubectl exec my-kafka-dual-role-0 -c kafka -- bin/kafka-console-producer.sh \
      --bootstrap-server localhost:9092 \
      --topic test-topic \
      --producer.config /tmp/client.properties
    
    # Test consumer permissions
    kubectl exec my-kafka-dual-role-0 -c kafka -- bin/kafka-console-consumer.sh \
      --bootstrap-server localhost:9092 \
      --topic test-topic \
      --consumer.config /tmp/client.properties \
      --from-beginning

References

πŸ” Certificate Management: Cert-Manager vs Strimzi CA

This section provides guidance on choosing between cert-manager and Strimzi's built-in Certificate Authority (CA) for TLS certificate management, including rotation flows and environment-specific recommendations.

Comparison Matrix

Feature Strimzi CA Cert-Manager Recommendation
Setup Complexity Simple (built-in) Moderate (external dependency) Strimzi CA for simple setups
Certificate Rotation Automatic Automatic Both support auto-rotation
External Trust Manual trust distribution Industry-standard CAs Cert-Manager for external clients
Multi-Cluster Per-cluster CA Centralized management Cert-Manager for multi-cluster
Compliance Self-signed certificates Trusted CA certificates Cert-Manager for compliance
Operational Overhead Low Medium Strimzi CA for internal-only
Client Configuration Custom truststore required Standard CA trust Cert-Manager for ease of use

Strimzi CA Configuration (Default)

Basic Strimzi CA Setup

kafkaCluster:
  listeners:
    - name: "tls"
      port: 9093
      type: internal
      tls: true
      authentication:
        type: tls
      # Uses Strimzi-generated CA by default

  # Optional: Customize CA certificate validity
  clusterCa:
    renewalDays: 30        # Renew 30 days before expiration
    validityDays: 365      # Certificate valid for 1 year
    generateCertificateAuthority: true
  
  clientsCa:
    renewalDays: 30
    validityDays: 365
    generateCertificateAuthority: true

Strimzi CA Certificate Rotation

Automatic Rotation (Recommended):

kafkaCluster:
  clusterCa:
    generateCertificateAuthority: true
    renewalDays: 30
    validityDays: 365
    # Automatic rotation enabled by default
  
  clientsCa:
    generateCertificateAuthority: true
    renewalDays: 30
    validityDays: 365

Manual Rotation Process:

# 1. Check current certificate status
kubectl get secret my-kafka-cluster-ca-cert -o yaml

# 2. Trigger manual rotation (if needed)
kubectl annotate kafka my-kafka strimzi.io/force-renew=true

# 3. Monitor rotation progress
kubectl get kafka my-kafka -o yaml | grep -A 10 status

# 4. Verify new certificates
kubectl get secret my-kafka-cluster-ca-cert -o jsonpath='{.data.ca\.crt}' | base64 -d | openssl x509 -text -noout

Cert-Manager Integration

Prerequisites

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

# Verify installation
kubectl get pods -n cert-manager

ClusterIssuer Configuration

Let's Encrypt Production:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
    - dns01:
        route53:
          region: us-east-1
          accessKeyID: AKIAIOSFODNN7EXAMPLE
          secretAccessKeySecretRef:
            name: route53-credentials
            key: secret-access-key

Private CA Issuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: private-ca-issuer
spec:
  ca:
    secretName: private-ca-key-pair

Kafka with Cert-Manager Certificates

kafkaCluster:
  listeners:
    - name: "external"
      port: 9095
      type: ingress
      tls: true
      authentication:
        type: tls
      configuration:
        class: "nginx"
        bootstrap:
          host: "kafka.example.com"
          annotations:
            cert-manager.io/cluster-issuer: "letsencrypt-prod"
          tls:
            enabled: true
            secretName: "kafka-bootstrap-tls"  # Managed by cert-manager
        brokers:
          generateDynamic: true
          maxBrokers: 3
          hostPattern: "kafka-{broker}.example.com"
          annotations:
            cert-manager.io/cluster-issuer: "letsencrypt-prod"
          tls:
            enabled: true
            secretName: "kafka-brokers-tls"   # Managed by cert-manager

  # Disable Strimzi CA for external listeners
  clusterCa:
    generateCertificateAuthority: false
    secretName: "external-cluster-ca"  # Provide your own CA

Certificate Resources

Bootstrap Certificate:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kafka-bootstrap-cert
  namespace: kafka-system
spec:
  secretName: kafka-bootstrap-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - kafka.example.com
  - kafka-bootstrap.example.com

Broker Certificates (Template):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kafka-broker-0-cert
  namespace: kafka-system
spec:
  secretName: kafka-broker-0-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - kafka-0.example.com
  - kafka-broker-0.example.com

Certificate Rotation Flows

Strimzi CA Rotation Flow

graph TD
    A[Certificate Expiry Approaching] --> B[Strimzi Operator Detects]
    B --> C[Generate New CA Certificate]
    C --> D[Update Cluster CA Secret]
    D --> E[Rolling Restart Brokers]
    E --> F[Update Client Certificates]
    F --> G[Rotation Complete]
    
    G --> H[Clients Auto-Reconnect]
    H --> I[Verify New Certificates]
Loading

Monitoring Strimzi CA Rotation:

# Check CA certificate expiry
kubectl get secret my-kafka-cluster-ca-cert -o jsonpath='{.data.ca\.crt}' | \
  base64 -d | openssl x509 -enddate -noout

# Monitor rotation events
kubectl get events --field-selector reason=CaCertRenewed

# Check operator logs
kubectl logs deployment/strimzi-cluster-operator -n strimzi-system

Cert-Manager Rotation Flow

graph TD
    A[Certificate Expiry Approaching] --> B[Cert-Manager Detects]
    B --> C[Request New Certificate from CA]
    C --> D[Update Certificate Secret]
    D --> E[Ingress Controller Reloads]
    E --> F[Kafka Brokers Reload TLS]
    F --> G[Rotation Complete]
    
    G --> H[Clients Use New Certificates]
    H --> I[Verify Certificate Chain]
Loading

Monitoring Cert-Manager Rotation:

# Check certificate status
kubectl get certificates -n kafka-system

# Check certificate expiry
kubectl describe certificate kafka-bootstrap-cert -n kafka-system

# Monitor cert-manager logs
kubectl logs deployment/cert-manager -n cert-manager

# Check certificate events
kubectl get events --field-selector involvedObject.kind=Certificate

Environment-Specific Recommendations

Development Environment

Recommendation: Strimzi CA

kafkaCluster:
  clusterCa:
    generateCertificateAuthority: true
    validityDays: 90  # Shorter validity for dev
  clientsCa:
    generateCertificateAuthority: true
    validityDays: 90

Rationale:

  • βœ… Simple setup, no external dependencies
  • βœ… Fast iteration and testing
  • βœ… Self-contained environment
  • ❌ Requires custom truststore configuration

Staging Environment

Recommendation: Cert-Manager with Staging CA

kafkaCluster:
  listeners:
    - name: "external"
      type: ingress
      configuration:
        bootstrap:
          annotations:
            cert-manager.io/cluster-issuer: "letsencrypt-staging"

Rationale:

  • βœ… Tests production-like certificate management
  • βœ… Validates cert-manager integration
  • βœ… Uses staging CA (higher rate limits)
  • βœ… Prepares for production deployment

Production Environment

Recommendation: Cert-Manager with Production CA

kafkaCluster:
  listeners:
    - name: "external"
      type: ingress
      configuration:
        bootstrap:
          annotations:
            cert-manager.io/cluster-issuer: "letsencrypt-prod"
            # Additional production annotations
            nginx.ingress.kubernetes.io/ssl-redirect: "true"
            nginx.ingress.kubernetes.io/force-ssl-redirect: "true"

Rationale:

  • βœ… Trusted certificates for external clients
  • βœ… Compliance with security policies
  • βœ… Automatic renewal and rotation
  • βœ… Industry-standard certificate management
  • ❌ Additional operational complexity

Hybrid Approach (Recommended for Large Deployments)

kafkaCluster:
  listeners:
    # Internal listeners use Strimzi CA
    - name: "tls-internal"
      port: 9093
      type: internal
      tls: true
      # Uses Strimzi CA (default)
    
    # External listeners use cert-manager
    - name: "external"
      port: 9095
      type: ingress
      tls: true
      configuration:
        bootstrap:
          annotations:
            cert-manager.io/cluster-issuer: "letsencrypt-prod"
          tls:
            enabled: true
            secretName: "kafka-external-tls"  # Managed by cert-manager

  # Keep Strimzi CA for internal communication
  clusterCa:
    generateCertificateAuthority: true
  clientsCa:
    generateCertificateAuthority: true

Troubleshooting Certificate Issues

Common Strimzi CA Issues

  1. Certificate Expiry

    # Check certificate validity
    kubectl get secret my-kafka-cluster-ca-cert -o jsonpath='{.data.ca\.crt}' | \
      base64 -d | openssl x509 -dates -noout
    
    # Force certificate renewal
    kubectl annotate kafka my-kafka strimzi.io/force-renew=true
  2. Client Trust Issues

    # Extract CA certificate for client truststore
    kubectl get secret my-kafka-cluster-ca-cert -o jsonpath='{.data.ca\.crt}' | \
      base64 -d > kafka-ca.crt
    
    # Create Java truststore
    keytool -import -trustcacerts -alias kafka-ca -file kafka-ca.crt \
      -keystore kafka-truststore.jks -storepass changeit

Common Cert-Manager Issues

  1. Certificate Not Issued

    # Check certificate status
    kubectl describe certificate kafka-bootstrap-cert
    
    # Check cert-manager logs
    kubectl logs deployment/cert-manager -n cert-manager
    
    # Check ACME challenge status
    kubectl get challenges
  2. DNS Validation Failures

    # Check DNS propagation
    dig kafka.example.com
    
    # Verify DNS01 solver configuration
    kubectl describe clusterissuer letsencrypt-prod

Best Practices

Security Best Practices

  1. Certificate Rotation

    • Set appropriate renewal periods (30 days before expiry)
    • Monitor certificate expiry dates
    • Test rotation procedures regularly
  2. Key Management

    • Protect private keys with appropriate RBAC
    • Use separate CAs for different environments
    • Implement certificate pinning where appropriate
  3. Monitoring

    • Set up alerts for certificate expiry
    • Monitor certificate rotation events
    • Validate certificate chains regularly

Operational Best Practices

  1. Documentation

    • Document certificate management procedures
    • Maintain certificate inventory
    • Document client configuration requirements
  2. Testing

    • Test certificate rotation in non-production
    • Validate client reconnection behavior
    • Test certificate validation failures
  3. Automation

    • Automate certificate deployment
    • Implement certificate monitoring
    • Automate client truststore updates

References

🌍 Multi-Environment Deployment

Environment-Specific Configurations

The chart includes pre-configured values for different environments:

Non-Production (values-nonprod.yaml)

  • 3 replicas
  • 4Gi memory per broker
  • 100Gi storage per broker
  • 1-hour log retention
  • HPA disabled
  • Basic monitoring

Staging (values-staging.yaml)

  • 3 replicas (HPA: 3-8)
  • 6Gi memory per broker
  • 200Gi storage per broker
  • 24-hour log retention
  • HPA enabled
  • Pod anti-affinity enabled
  • Enhanced monitoring

Production (values-prod.yaml)

  • 5 replicas (HPA: 5-15)
  • 8Gi memory per broker
  • 500Gi storage per broker
  • 7-day log retention
  • HPA enabled with conservative scaling
  • Pod anti-affinity enforced
  • Comprehensive monitoring
  • Multiple Connect clusters
  • Enhanced security settings

Deployment Examples

# Deploy to non-production (cluster name: kafka-nonprod)
helm install kafka-nonprod . \
  -f values-nonprod.yaml \
  -n om-kafka \
  --create-namespace

# Deploy to staging with custom overrides (cluster name: kafka-staging)
helm install kafka-staging . \
  -f values-staging.yaml \
  --set kafkaCluster.replicas=4 \
  --set hpa.maxReplicas=10 \
  -n om-kafka-staging \
  --create-namespace

# Deploy to production with additional security (cluster name: kafka-prod)
helm install kafka-prod . \
  -f values-prod.yaml \
  --set kafkaCluster.config.auto.create.topics.enable=false \
  --set kafkaCluster.authorization.type=simple \
  -n om-kafka-prod \
  --create-namespace

πŸ§ͺ Testing

Helm Tests

Run built-in connectivity tests:

# Test the deployment
helm test kafka-nonprod -n om-kafka

# View test logs
kubectl logs -n om-kafka kafka-nonprod-test-kafka-connection

Manual Testing

  1. Create a test producer:

    kubectl run kafka-producer -ti --image=quay.io/strimzi/kafka:0.41.0-kafka-3.9.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --bootstrap-server om-kafka-cluster-kafka-bootstrap:9092 --topic test-topic
  2. Create a test consumer:

    kubectl run kafka-consumer -ti --image=quay.io/strimzi/kafka:0.41.0-kafka-3.9.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server om-kafka-cluster-kafka-bootstrap:9092 --topic test-topic --from-beginning

πŸ“Š Monitoring

Metrics Collection

The chart automatically configures JMX Prometheus Exporter for:

  • Kafka Brokers: Core Kafka metrics, JVM metrics, and custom business metrics
  • Cruise Control: Rebalancing and optimization metrics
  • Kafka Connect: Connector performance and health metrics
  • Kafka Exporter: Additional Kafka-specific metrics

Grafana Dashboards

Import the provided Grafana dashboards:

# Import Kafka cluster health dashboard
kubectl apply -f grafana/kafka-cluster-health.json

# Import Zookeeper dashboard (if using ZooKeeper mode)
kubectl apply -f grafana/zookeeper.json

Prometheus Configuration

Add the following to your Prometheus configuration:

scrape_configs:
  - job_name: 'kafka'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)

πŸ”§ Troubleshooting

Common Issues

  1. Pods stuck in Pending state:

    • Check node affinity and tolerations
    • Verify nodegroup labels and taints
    • Ensure sufficient resources in the cluster
  2. External listeners not accessible:

    • Verify ingress controller is running
    • Check DNS resolution for broker hosts
    • Validate TLS certificates
  3. Storage issues:

    • Verify storage class exists
    • Check PVC creation and binding
    • Ensure sufficient storage quota
  4. Authentication failures:

    • Verify user secrets are created
    • Check ACL configurations
    • Validate certificate trust chains

Debug Commands

# Check Kafka cluster status
kubectl get kafka -n om-kafka

# View Kafka cluster details
kubectl describe kafka om-kafka-cluster -n om-kafka

# Check pod logs
kubectl logs -n om-kafka om-kafka-cluster-kafka-0

# View Strimzi operator logs
kubectl logs -n strimzi-operator deployment/strimzi-cluster-operator

# Check external DNS records
kubectl logs -n external-dns deployment/external-dns

Performance Tuning

For production workloads, consider these optimizations:

  1. JVM Tuning:

    jvmOptions:
      xms: "4g"
      xmx: "4g"
      additionalOptions:
        - "-XX:+UseG1GC"
        - "-XX:MaxGCPauseMillis=20"
        - "-XX:InitiatingHeapOccupancyPercent=35"
  2. Kafka Configuration:

    config:
      num.io.threads: 16
      num.network.threads: 8
      socket.send.buffer.bytes: 102400
      socket.receive.buffer.bytes: 102400
      socket.request.max.bytes: 104857600
      num.replica.fetchers: 4
  3. Storage Optimization:

    storage:
      type: jbod
      volumes:
        - type: persistent-claim
          size: 1000Gi
          class: io1  # High IOPS storage class

πŸ”„ Upgrading

Kafka Version Upgrade

  1. Update the chart values:

    kafkaCluster:
      version: "3.9.0"  # New version
  2. Apply the upgrade:

    helm upgrade kafka-nonprod . -f values-nonprod.yaml -n om-kafka
  3. Monitor the rolling update:

    kubectl get pods -n om-kafka -w

Chart Upgrade

# Upgrade to a new chart version
helm upgrade kafka-nonprod . -f values-nonprod.yaml -n om-kafka

# Rollback if needed
helm rollback kafka-nonprod 1 -n om-kafka

πŸ—‘οΈ Uninstallation

# Uninstall the Helm release
helm uninstall kafka-nonprod -n om-kafka

# Clean up persistent volumes (if needed)
kubectl delete pvc -l strimzi.io/cluster=om-kafka-cluster -n om-kafka

# Remove the namespace
kubectl delete namespace om-kafka

πŸ“š Additional Resources

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ“Š Metrics Configuration

The chart provides flexible metrics configuration through conditional ConfigMaps that are created only when needed, eliminating unnecessary overhead and keeping your values.yaml clean.

🎯 Dynamic ConfigMap Creation

Instead of hardcoding 160+ lines of metrics configuration in values.yaml, ConfigMaps are now created dynamically based on your metrics requirements:

# Optional standalone metrics configurations
kafkaMetrics:
  enabled: true                    # Create Kafka JMX metrics ConfigMap
  configMapName: kafka-metrics
  configMapKey: kafka-metrics-config.yml

cruiseControlMetrics:
  enabled: true                    # Create Cruise Control metrics ConfigMap
  configMapName: cruise-control-metrics
  configMapKey: cruise-control-metrics-config.yml

βœ… Benefits

  • 🎯 Conditional Creation: ConfigMaps are created only when enabled: true
  • πŸ“¦ Reduced Overhead: No hardcoded metrics configuration in values.yaml
  • πŸ”§ Flexible Configuration: Enable only the metrics you need
  • πŸ“Š Default Configurations: Production-ready JMX Prometheus Exporter patterns included
  • πŸš€ Easy Management: Simple enable/disable per metrics type

πŸ“ˆ Available Metrics Types

Metrics Type Purpose Default Enabled
kafkaMetrics Kafka broker JMX metrics with comprehensive patterns βœ…
cruiseControlMetrics Cruise Control rebalancing metrics βœ…

Note: Kafka Exporter has built-in metrics collection and doesn't require a separate ConfigMap.

πŸ”— Integration with Components

The metrics ConfigMaps automatically integrate with their respective components:

kafkaCluster:
  metricsConfig:
    enabled: true
    configMapName: kafka-metrics        # References kafkaMetrics ConfigMap
    configMapKey: kafka-metrics-config.yml

  cruiseControl:
    metricsConfig:
      enabled: true
      configMapName: cruise-control-metrics  # References cruiseControlMetrics ConfigMap
      configMapKey: cruise-control-metrics-config.yml

πŸ”„ CI/CD & Quality Assurance

This chart maintains high quality standards through comprehensive automated testing and security scanning:

πŸ›‘οΈ Security Analysis

  • CodeQL Analysis: Automated security vulnerability scanning
  • Checkov Security Scan: Infrastructure-as-Code security best practices validation
  • SARIF Integration: Security findings uploaded to GitHub Security tab

πŸ§ͺ Automated Testing

  • Template Validation: All Helm templates validated across multiple configurations
  • Dependency Management: Automated Strimzi operator dependency updates
  • Multi-Scenario Testing:
    • Default configuration rendering
    • Rack awareness enabled scenarios
    • Node selector and affinity configurations
    • Complex scheduling combinations

πŸ“¦ Release Management

  • Automated Versioning: Semantic versioning with automated bumps
  • Chart Publishing: Automated chart packaging and publishing to GitHub Packages
  • Release Notes: Auto-generated release documentation

πŸ“Š Quality Metrics

  • Template Coverage: 100% of templates tested in CI
  • Security Rating: A+ security score with zero known vulnerabilities
  • Maintainability: Clean, well-documented, and modular code structure

All workflows run on every pull request and main branch push, ensuring consistent quality and security standards.

πŸ“„ License

This Helm chart is licensed under the Apache License 2.0. See the LICENSE file for details.

πŸ†˜ Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review Strimzi documentation
  3. Open an issue in the repository
  4. Contact the maintainers

Note: This chart is designed for production use with comprehensive configuration options. Always test in a non-production environment before deploying to production.

About

Deployments of Kafka, Kafka Connect using TLS encryption, authentication and authorization

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors 2

  •  
  •  

Languages