CAPI VIP Allocator

Automatic Virtual IP allocation for Cluster API clusters, driven by a reconcile controller and custom ClusterClass variables.

Important: This operator only allocates IP addresses from IPAM pools. To install the VIP on control plane nodes, you also need kube-vip or a similar solution (see kube-vip integration).

Features

  • Automatic allocation - a free IP is taken from an IPAM pool matched by ClusterClass labels
  • Automatic cleanup - the IP is released via ownerReferences when the Cluster is deleted
  • Zero configuration - no VIP needs to be specified in the Cluster manifest
  • No race conditions - the reconcile controller runs before topology reconcile
  • Custom variable - the VIP is available in ClusterClass patches as {{ .clusterVip }}
  • Production ready - TLS, health checks, leader election

How it works (v0.5.0)

Architecture: Reconcile Controller + Custom Variable + kube-vip

┌────────────────────────────────────────────────────────────┐
│ MANAGEMENT CLUSTER                                         │
│                                                            │
│ 1. User creates Cluster (no VIP specified)                 │
│    Cluster.spec.controlPlaneEndpoint.host = ""             │
│    Cluster.spec.topology.variables[clusterVip] = ""        │
│                                                            │
│ 2. Reconcile Controller (capi-vip-allocator)               │
│    ├─ Finds GlobalInClusterIPPool (by labels)              │
│    ├─ Creates IPAddressClaim                               │
│    ├─ Waits for IPAM to allocate free IP                   │
│    └─ Patches Cluster:                                     │
│       ├─ spec.controlPlaneEndpoint.host = "10.2.0.21"      │
│       └─ spec.topology.variables[clusterVip] = "10.2.0.21" │
│                                                            │
│ 3. Topology Controller (CAPI)                              │
│    ├─ Reads clusterVip variable                            │
│    ├─ Applies ClusterClass inline patches                  │
│    └─ Creates InfrastructureCluster & ControlPlane         │
│                                                            │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│ WORKLOAD CLUSTER                                           │
│                                                            │
│ 4. kube-vip DaemonSet                                      │
│    ├─ Deployed via ClusterClass patch (HelmChart)          │
│    ├─ Reads address from HelmChart: {{ .clusterVip }}      │
│    ├─ INSTALLS VIP on control plane node interface         │
│    └─ Provides HA for multi control plane clusters         │
│                                                            │
└────────────────────────────────────────────────────────────┘

Result: API server accessible via VIP, Rancher auto-import works! ✅

Two components work together (quick check below):

  • capi-vip-allocator: Allocates free IP from IPAM pool
  • kube-vip: Installs VIP on control plane node interface
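
A quick way to confirm both pieces are in place (the pod label and the RKE2 kubeconfig path come from later sections of this README):

# Management cluster: is the allocator running?
kubectl get pods -n capi-system -l control-plane=capi-vip-allocator-controller-manager

# Workload cluster, from a control plane node: is kube-vip running?
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods -n kube-system | grep kube-vip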

Quick Start

Prerequisites

  1. CAPI with ClusterTopology feature enabled (enabled by default in CAPI v1.5+)

  2. IPAM provider installed (e.g., in-cluster IPAM)

# Install in-cluster IPAM provider
clusterctl init --ipam in-cluster

# Or via CAPIProvider (Rancher Turtles)
kubectl apply -f - <<EOF
apiVersion: turtles-capi.cattle.io/v1alpha1
kind: CAPIProvider
metadata:
  name: ipam-in-cluster
  namespace: capi-system
spec:
  type: ipam
  version: v0.1.0
EOF
  3. cert-manager installed (optional, only needed if Runtime Extensions are enabled)

Note: v0.5.0+ uses a reconcile controller (not Runtime Extensions) by default. cert-manager is only needed for TLS certificates if you enable Runtime Extensions with --enable-runtime-extension=true.

Installation

Using CAPIProvider (Rancher Turtles):

apiVersion: turtles-capi.cattle.io/v1alpha1
kind: CAPIProvider
metadata:
  name: capi-vip-allocator
  namespace: capi-system
spec:
  type: addon
  version: v0.5.0
  fetchConfig:
    url: https://github.com/gorizond/capi-vip-allocator/releases/download/v0.5.0/capi-vip-allocator.yaml

Or directly:

kubectl apply -f https://github.com/gorizond/capi-vip-allocator/releases/download/v0.5.0/capi-vip-allocator.yaml


Create IP Pool

apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: GlobalInClusterIPPool
metadata:
  name: control-plane-vip-pool
  labels:
    vip.capi.gorizond.io/cluster-class: my-cluster-class
    vip.capi.gorizond.io/role: control-plane
spec:
  addresses:
    - "10.0.0.10-10.0.0.20"
  gateway: "10.0.0.1"
  prefix: 24
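
Once the pool is applied, the IPAM provider reports capacity in status.ipAddresses (the same field the Troubleshooting section checks). Assuming the manifest above is saved as pool.yaml:

kubectl apply -f pool.yaml
kubectl get globalinclusterippool control-plane-vip-pool -o jsonpath='{.status.ipAddresses}'
# Example for the 11-address range above: {"free":11,"total":11,"used":0}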

Label matching:

  • Exact match: cluster-class: my-cluster-class
  • Comma-separated values: cluster-class: "class1,class2,class3"
  • Annotation-based matching (for names longer than 63 chars): cluster-class: "true" plus an annotation
  • Comma-separated values work for the role label too: role: "control-plane,ingress" (shared pool)

Example with multiple cluster classes (label-based):

apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: GlobalInClusterIPPool
metadata:
  name: shared-vip-pool
  labels:
    # This pool works for multiple cluster classes
    vip.capi.gorizond.io/cluster-class: "rke2-proxmox,rke2-vsphere,k3s-proxmox"
    # And for multiple roles
    vip.capi.gorizond.io/role: "control-plane,ingress"
spec:
  addresses:
    - "10.0.0.10-10.0.0.50"
  gateway: "10.0.0.1"
  prefix: 24

Example with annotation (for long cluster class names):

apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: GlobalInClusterIPPool
metadata:
  name: long-names-vip-pool
  labels:
    # Set to "true" to enable annotation-based matching
    vip.capi.gorizond.io/cluster-class: "true"
    vip.capi.gorizond.io/role: "control-plane"
  annotations:
    # Annotation supports names longer than 63 characters
    vip.capi.gorizond.io/cluster-class: "very-long-cluster-class-name-that-exceeds-kubernetes-label-63-character-limit,another-very-long-name,rke2-proxmox-with-some-very-long-suffix-that-makes-it-too-long-for-label"
spec:
  addresses:
    - "10.0.0.10-10.0.0.20"
  gateway: "10.0.0.1"
  prefix: 24

Why annotation-based matching?

  • Kubernetes labels are limited to 63 characters
  • Annotations have no such limit (up to 256KB)
  • Useful for auto-generated long ClusterClass names
  • Maintains backward compatibility (label-based matching still works)

Create Cluster

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  # DO NOT specify controlPlaneEndpoint! It will be set automatically
  topology:
    class: my-cluster-class
    version: v1.31.0
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: default-worker
          name: workers
          replicas: 2
    # DO NOT specify clusterVip variable! It will be added automatically
    variables:
      - name: cni
        value: calico

Done! Within 5-10 seconds:

  • Cluster.spec.controlPlaneEndpoint.host will be set (e.g., 10.0.0.15)
  • Cluster.spec.topology.variables[clusterVip] will be added (e.g., 10.0.0.15)

Verification

# Check control plane VIP allocation (should appear in ~5-10 seconds)
kubectl get cluster my-cluster -o jsonpath='{.spec.controlPlaneEndpoint.host}'
# Output: 10.0.0.15

# Check ingress VIP annotation (if enabled)
kubectl get cluster my-cluster -o jsonpath='{.metadata.annotations.vip\.capi\.gorizond\.io/ingress-vip}'
# Output: 10.0.0.101

# Check ingress VIP label (also set automatically)
kubectl get cluster my-cluster -o jsonpath='{.metadata.labels.vip\.capi\.gorizond\.io/ingress-vip}'
# Output: 10.0.0.101

# Check IPAddressClaims
kubectl get ipaddressclaim -n YOUR_NAMESPACE
# vip-cp-my-cluster        (control plane VIP)
# vip-ingress-my-cluster   (ingress VIP)

# Check operator logs
kubectl logs -n capi-system -l control-plane=capi-vip-allocator-controller-manager -f

Configuration

IP Pool Selection

The operator finds pools using labels, with an optional annotation for long names; a kubectl sketch of the lookup follows the two modes below:

Standard mode (label-based):

  • vip.capi.gorizond.io/cluster-class: <clusterClassName> (exact or comma-separated)
  • vip.capi.gorizond.io/role: control-plane

Annotation mode (for long cluster class names > 63 chars):

  • vip.capi.gorizond.io/cluster-class: "true" (label)
  • vip.capi.gorizond.io/cluster-class: <clusterClassName1>,<clusterClassName2>,... (annotation)
  • vip.capi.gorizond.io/role: control-plane (label)
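
A rough kubectl equivalent of the lookup (a label selector only covers the exact-match case; comma-separated and annotation values are matched by the operator itself):

# Standard mode: exact match on both labels
kubectl get globalinclusterippool \
  -l vip.capi.gorizond.io/cluster-class=my-cluster-class,vip.capi.gorizond.io/role=control-plane

# Annotation mode: list candidate pools, then inspect the annotation
kubectl get globalinclusterippool -l vip.capi.gorizond.io/cluster-class=true \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.metadata.annotations.vip\.capi\.gorizond\.io/cluster-class}{"\n"}{end}'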

Manual VIP Override

Specify VIP manually to skip automatic allocation:

spec:
  controlPlaneEndpoint:
    host: "10.0.0.100"
    port: 6443
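
The same effect can be achieved on an existing Cluster with a merge patch (a sketch; apply it before the controller reconciles, otherwise a VIP may already be allocated):

kubectl patch cluster my-cluster --type merge \
  -p '{"spec":{"controlPlaneEndpoint":{"host":"10.0.0.100","port":6443}}}'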

Configuration Options

Deployment args (v0.5.0+):

  • --enable-reconciler - Enable the reconcile controller (default: true; REQUIRED for VIP allocation)
  • --enable-runtime-extension - Enable the deprecated Runtime Extension mode (default: false)
  • --runtime-extension-port - Runtime Extension server port (default: 9443)
  • --leader-elect - Enable leader election
  • --default-port - Default control plane port (default: 6443)

Important: v0.5.0 uses the reconcile controller architecture. Runtime Extensions are deprecated and disabled by default.
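
These flags are ordinary container args and can be changed in place. A sketch, assuming the Deployment is named after the pod label used elsewhere in this README:

# Hypothetical: append a flag to the manager container args
kubectl -n capi-system patch deployment capi-vip-allocator-controller-manager \
  --type json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--leader-elect=true"}]'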

ClusterClass Integration

Step 1: Define clusterVip variable

The reconcile controller writes the VIP to a custom variable named clusterVip. You MUST define it in your ClusterClass:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: my-cluster-class
spec:
  variables:
    # REQUIRED: Define clusterVip variable for reconcile controller
    - name: clusterVip
      required: false
      schema:
        openAPIV3Schema:
          default: ""
          description: "Control plane VIP address (automatically set by capi-vip-allocator)"
          type: string
    
    # Your other variables...
    - name: cni
      schema:
        openAPIV3Schema:
          type: string
          default: calico

Step 2: Use {{ .clusterVip }} in patches

Use the custom variable in ClusterClass patches to propagate VIP to infrastructure resources:

Patch InfrastructureCluster

patches:
  - name: set-vip-on-infrastructure-cluster
    definitions:
      - jsonPatches:
          - op: replace
            path: /spec/template/spec/controlPlaneEndpoint/host
            valueFrom:
              template: "{{ .clusterVip }}"  # ✅ Uses custom variable
        selector:
          apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
          kind: ProxmoxClusterTemplate  # or your provider
          matchResources:
            infrastructureCluster: true

Patch ControlPlane (for TLS SANs and registration)

  - name: set-control-plane-tls-san
    definitions:
      - jsonPatches:
          - op: add
            path: /spec/template/spec/serverConfig/tlsSAN/-
            valueFrom:
              template: "{{ .clusterVip }}"  # ✅ Uses custom variable
        selector:
          apiVersion: controlplane.cluster.x-k8s.io/v1beta1
          kind: RKE2ControlPlaneTemplate

  - name: set-control-plane-registration
    definitions:
      - jsonPatches:
          - op: replace
            path: /spec/template/spec/registrationAddress
            valueFrom:
              template: "{{ .clusterVip }}"  # ✅ Uses custom variable
        selector:
          apiVersion: controlplane.cluster.x-k8s.io/v1beta1
          kind: RKE2ControlPlaneTemplate

Step 3: kube-vip Integration

Critical: capi-vip-allocator only allocates IP addresses. You need kube-vip to actually install the VIP on control plane nodes!

Add kube-vip HelmChart via ClusterClass patch:

patches:
  - name: install-kube-vip
    definitions:
      - jsonPatches:
          - op: add
            path: /spec/template/spec/files/-
            valueFrom:
              template: |
                path: /var/lib/rancher/rke2/server/manifests/kube-vip-apiserver.yaml
                permissions: "0644"
                owner: root:root
                content: |
                  apiVersion: helm.cattle.io/v1
                  kind: HelmChart
                  metadata:
                    name: kube-vip-apiserver
                    namespace: kube-system
                  spec:
                    version: 0.8.2
                    chart: kube-vip
                    repo: https://kube-vip.github.io/helm-charts
                    bootstrap: true
                    valuesContent: |-
                      nameOverride: kube-vip-apiserver
                      config:
                        address: '{{ .clusterVip }}'  # ✅ Uses allocated VIP
                      env:
                        vip_interface: ""  # Auto-detect
                        vip_arp: "true"
                        lb_enable: "false"
                        cp_enable: "true"
                        svc_enable: "false"
                        vip_leaderelection: "true"
                      nodeSelector:
                        node-role.kubernetes.io/control-plane: "true"
                      tolerations:
                        - key: "node-role.kubernetes.io/control-plane"
                          operator: "Exists"
                          effect: "NoSchedule"
        selector:
          apiVersion: controlplane.cluster.x-k8s.io/v1beta1
          kind: RKE2ControlPlaneTemplate  # or KubeadmControlPlaneTemplate
          matchResources:
            controlPlane: true

What this does (a quick node-side check follows this list):

  1. Creates HelmChart manifest in /var/lib/rancher/rke2/server/manifests/
  2. RKE2 auto-applies manifests from that directory during bootstrap (kubeadm needs an equivalent mechanism)
  3. kube-vip DaemonSet starts on control plane nodes
  4. kube-vip installs VIP on node network interface
  5. API server becomes accessible via VIP ✅
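
To confirm the chain end to end from a control plane node (file path and HelmChart name are taken from the patch above):

# File written by the bootstrap provider
cat /var/lib/rancher/rke2/server/manifests/kube-vip-apiserver.yaml

# HelmChart resource picked up by RKE2's manifest auto-deploy
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n kube-system get helmchart kube-vip-apiserver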

Complete ClusterClass Example

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: rke2-proxmox-class
spec:
  # 1. Define clusterVip variable (REQUIRED!)
  variables:
    - name: clusterVip
      required: false
      schema:
        openAPIV3Schema:
          default: ""
          description: "Control plane VIP (auto-allocated by capi-vip-allocator)"
          type: string
  
  # 2. Use {{ .clusterVip }} in patches
  patches:
    # Patch InfrastructureCluster
    - name: set-vip-on-proxmox-cluster
      definitions:
        - jsonPatches:
            - op: replace
              path: /spec/template/spec/controlPlaneEndpoint/host
              valueFrom:
                template: "{{ .clusterVip }}"
          selector:
            apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
            kind: ProxmoxClusterTemplate
            matchResources:
              infrastructureCluster: true
    
    # Patch ControlPlane for TLS SANs
    - name: set-rke2-tlssan
      definitions:
        - jsonPatches:
            - op: add
              path: /spec/template/spec/serverConfig/tlsSAN/-
              valueFrom:
                template: "{{ .clusterVip }}"
          selector:
            apiVersion: controlplane.cluster.x-k8s.io/v1beta1
            kind: RKE2ControlPlaneTemplate
            matchResources:
              controlPlane: true
    
    # Install kube-vip (REQUIRED to install VIP on node!)
    - name: install-kube-vip
      definitions:
        - jsonPatches:
            - op: add
              path: /spec/template/spec/files/-
              valueFrom:
                template: |
                  path: /var/lib/rancher/rke2/server/manifests/kube-vip-apiserver.yaml
                  permissions: "0644"
                  content: |
                    apiVersion: helm.cattle.io/v1
                    kind: HelmChart
                    metadata:
                      name: kube-vip-apiserver
                      namespace: kube-system
                    spec:
                      version: 0.8.2
                      chart: kube-vip
                      repo: https://kube-vip.github.io/helm-charts
                      bootstrap: true
                      valuesContent: |-
                        config:
                          address: '{{ .clusterVip }}'
                        env:
                          vip_interface: ""
                          vip_arp: "true"
                          cp_enable: "true"
          selector:
            apiVersion: controlplane.cluster.x-k8s.io/v1beta1
            kind: RKE2ControlPlaneTemplate
            matchResources:
              controlPlane: true

Why Custom Variable Instead of Builtin?

❌ Cannot use {{ .builtin.cluster.spec.controlPlaneEndpoint.host }}:

  • This builtin variable doesn't exist in CAPI

❌ Cannot use {{ .builtin.controlPlane.endpoint.host }}:

  • Circular dependency: ControlPlane object is created AFTER InfrastructureCluster
  • Value is empty during InfrastructureCluster creation

✅ Custom variable {{ .clusterVip }} works:

  • Reconcile controller sets it BEFORE topology reconcile
  • Available immediately for all patches
  • No circular dependency!

Troubleshooting

VIP not allocated

Check the following:

  1. Operator is running:

    kubectl get pods -n capi-system -l control-plane=capi-vip-allocator-controller-manager
  2. IP pool exists with correct labels:

    kubectl get globalinclusterippool -l vip.capi.gorizond.io/cluster-class=YOUR_CLASS
  3. Pool has free IPs:

    kubectl get globalinclusterippool POOL_NAME -o jsonpath='{.status.ipAddresses}'
    # Should show: {"free":9,"total":11,"used":2}
  4. Check operator logs:

    kubectl logs -n capi-system -l control-plane=capi-vip-allocator-controller-manager -f
    
    # Should see:
    # INFO controllers.Cluster controlPlaneEndpoint not set, controller will allocate VIP
    # INFO controllers.Cluster control-plane VIP assigned ip=10.0.0.15
  5. Check IPAddressClaim:

    kubectl get ipaddressclaim -n YOUR_NAMESPACE
    kubectl get ipaddress -n YOUR_NAMESPACE

ClusterClass missing clusterVip variable

Error: ClusterClass variable 'clusterVip' not found

Solution: Add clusterVip variable to ClusterClass (see Step 1).

InfrastructureCluster validation error

Error: failed to create ProxmoxCluster.infrastructure.cluster.x-k8s.io: 
FieldValueInvalid: spec.controlplaneEndpoint: 
Invalid value: "<no value>": provided endpoint address is not a valid IP or FQDN

Cause: ClusterClass patch uses wrong variable or builtin.

Solution: Use {{ .clusterVip }} in your patch (see Step 2).

VIP allocated but not accessible

# Check if VIP is set in Cluster
kubectl get cluster my-cluster -o jsonpath='{.spec.controlPlaneEndpoint.host}'
# Returns: 10.0.0.15 ✅

# But API not accessible via VIP
curl -k https://10.0.0.15:6443/version
# Connection refused ❌

Cause: kube-vip is not installed in the workload cluster.

Solution: Add kube-vip HelmChart patch to ClusterClass (see kube-vip integration).

Verify kube-vip is running:

# SSH to control plane node
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get pods -n kube-system | grep kube-vip
# Should show: kube-vip-apiserver-xxxxx  1/1  Running

# Check VIP is installed on interface
ip addr show ens18
# Should show: inet 10.0.0.15/32 scope global ens18

Architecture (v0.5.0)

Components

  • Reconcile Controller - Watches Cluster resources with topology, allocates VIP before topology reconcile
  • IPAM Integration - Creates/manages IPAddressClaim resources
  • Custom Variable - Writes VIP to Cluster.spec.topology.variables[clusterVip]
  • ownerReferences - Automatic cleanup when the Cluster is deleted (quick check below)
  • Runtime Extension (optional, deprecated) - Kept for backward compatibility
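
Cleanup relies on a standard ownerReference from the IPAddressClaim to the Cluster, which can be inspected directly (claim name taken from the Verification section):

kubectl get ipaddressclaim vip-cp-my-cluster -n YOUR_NAMESPACE \
  -o jsonpath='{.metadata.ownerReferences[0].kind}{"/"}{.metadata.ownerReferences[0].name}'
# Expected: Cluster/my-cluster; deleting the Cluster garbage-collects the claim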

Resource Flow

User creates Cluster
  ↓
Reconcile Controller watches
  ├─ Finds GlobalInClusterIPPool (by ClusterClass labels)
  ├─ Creates IPAddressClaim (with ownerReference)
  ├─ Waits for IPAM to allocate IPAddress
  └─ Patches Cluster:
     ├─ spec.controlPlaneEndpoint.host = VIP
     └─ spec.topology.variables[clusterVip] = VIP
  ↓
Topology Controller reconciles
  ├─ Reads clusterVip variable
  ├─ Applies ClusterClass inline patches
  └─ Creates InfrastructureCluster with VIP
  ↓
ControlPlane bootstrap
  ├─ Applies HelmChart manifest (kube-vip)
  └─ kube-vip installs VIP on node interface
  ↓
API server accessible via VIP ✅

Why v0.5.0 Architecture?

v0.2.x - v0.4.x attempts failed:

  • v0.2.x: Reconcile controller ran async → race condition
  • v0.3.x: BeforeClusterCreate hook → cannot modify Cluster object
  • v0.4.x: GeneratePatches external patch → limited to spec.template.spec fields
  • Builtin variables: Circular dependency (ControlPlane created after InfrastructureCluster)

✅ v0.5.0 solution:

  • Reconcile controller runs before topology reconcile
  • Custom variable clusterVip breaks circular dependency
  • kube-vip handles VIP installation (not the operator)

Development

# Build binary
make build

# Run tests
make test

# Build Docker image
make docker-build TAG=dev

# Run locally (v0.5.0)
go run ./cmd/capi-vip-allocator \
  --enable-reconciler=true \
  --enable-runtime-extension=false \
  --default-port=6443

Ingress VIP Integration

New in v0.6.0: Automatic allocation of a dedicated VIP for ingress/loadbalancer nodes!

Default Behavior

Ingress VIP is allocated BY DEFAULT if the ClusterClass defines an ingressVip variable:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
  # No annotation needed! Ingress VIP allocated automatically
spec:
  topology:
    class: my-cluster-class
    workers:
      machineDeployments:
        - class: loadbalancer-workers  # ← Worker pool for ingress
          name: ingress
          replicas: 2

Result: Both Control Plane and Ingress VIPs allocated automatically! ✨

Disable Ingress VIP

To disable ingress VIP allocation:

metadata:
  annotations:
    vip.capi.gorizond.io/ingress-enabled: "false"  # ← Explicitly disable
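
The same annotation can be set on an existing Cluster:

kubectl annotate cluster my-cluster vip.capi.gorizond.io/ingress-enabled=false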

Create Ingress IP Pool

apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: GlobalInClusterIPPool
metadata:
  name: ingress-vip-pool
  labels:
    vip.capi.gorizond.io/cluster-class: my-cluster-class
    vip.capi.gorizond.io/role: ingress  # ← Ingress role (not control-plane!)
spec:
  addresses:
    - "10.0.0.100-10.0.0.110"  # Separate range for ingress VIPs
  gateway: "10.0.0.1"
  prefix: 24

ClusterClass Configuration

spec:
  variables:
    # Define ingressVip variable (same as clusterVip)
    - name: ingressVip
      required: false
      schema:
        openAPIV3Schema:
          default: ""
          description: "Ingress VIP (auto-allocated when annotation enabled)"
          type: string

kube-vip for Loadbalancer Nodes

Deploy kube-vip on loadbalancer workers via Fleet GitRepo:

# In proxmox-addons (or similar GitRepo)
defaultNamespace: kube-system
helm:
  version: 0.8.2
  chart: kube-vip
  repo: https://kube-vip.github.io/helm-charts
  values:
    nameOverride: kube-vip-ingress
    config:
      # Read ingressVip from Cluster annotation (recommended)
      address: ${ get .ClusterAnnotations "vip.capi.gorizond.io/ingress-vip" }
      # Alternative: use label
      # address: ${ get .ClusterLabels "vip.capi.gorizond.io/ingress-vip" }
      # Alternative: use variable
      # address: ${ .ClusterValues.Cluster.spec.topology.variables.ingressVip }
    env:
      vip_interface: ""
      vip_arp: "true"
      lb_enable: "true"   # Enable LoadBalancer
      cp_enable: "false"  # Disable control plane
      svc_enable: "true"  # Enable Services
    nodeSelector:
      workload-type: loadbalancer  # Only loadbalancer nodes

Result

After cluster creation (both VIPs allocated by default):

# Two VIPs allocated automatically:
spec:
  controlPlaneEndpoint:
    host: "10.0.0.15"  # Control plane VIP ✅
  topology:
    variables:
      - name: clusterVip
        value: "10.0.0.15"  # Control plane ✅
      - name: ingressVip
        value: "10.0.0.101"  # Ingress (allocated by default!) ✨

Two kube-vip DaemonSets running:

  1. kube-vip-apiserver on control plane nodes (10.0.0.15)
  2. kube-vip-ingress on loadbalancer workers (10.0.0.101)

Both VIPs allocated without any configuration! 🎉

Prometheus Metrics

CAPI VIP Allocator exposes Prometheus metrics at :8080/metrics by default.

Available Metrics

Allocation Metrics

  • capi_vip_allocator_allocations_total (counter)

    • Total number of successful VIP allocations
    • Labels: role (control-plane/ingress), cluster_class
  • capi_vip_allocator_allocation_errors_total (counter)

    • Total number of VIP allocation errors
    • Labels: role, cluster_class, reason
    • Reasons: claim_creation_failed, ip_resolution_failed, cluster_patch_failed
  • capi_vip_allocator_allocation_duration_seconds (histogram)

    • Duration of VIP allocation operations
    • Labels: role, cluster_class
    • Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10

Pool Metrics

  • capi_vip_allocator_pools_available (gauge)

    • Number of available GlobalInClusterIPPools
    • Labels: cluster_class, role
  • capi_vip_allocator_pool_addresses_total (gauge)

    • Total IP addresses in pool
    • Labels: pool_name
  • capi_vip_allocator_pool_addresses_free (gauge)

    • Free IP addresses in pool
    • Labels: pool_name
  • capi_vip_allocator_pool_addresses_used (gauge)

    • Used IP addresses in pool
    • Labels: pool_name

Claim Metrics

  • capi_vip_allocator_claims_total (gauge)

    • Total number of active IPAddressClaims
    • Labels: role, namespace
  • capi_vip_allocator_claims_ready (gauge)

    • Number of IPAddressClaims with allocated IP
    • Labels: role, namespace
  • capi_vip_allocator_claims_pending (gauge)

    • Number of IPAddressClaims waiting for IP allocation
    • Labels: role, namespace

Reconcile Metrics

  • capi_vip_allocator_reconcile_total (counter)

    • Total number of cluster reconcile operations
    • Labels: cluster_class, result
    • Results: success, error, skipped, requeued
  • capi_vip_allocator_reconcile_duration_seconds (histogram)

    • Duration of cluster reconcile operations
    • Labels: cluster_class
    • Buckets: 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10

Example Queries

Allocation rate by role

rate(capi_vip_allocator_allocations_total[5m])

Error rate by reason

rate(capi_vip_allocator_allocation_errors_total[5m])

Average allocation duration

rate(capi_vip_allocator_allocation_duration_seconds_sum[5m]) 
  / 
rate(capi_vip_allocator_allocation_duration_seconds_count[5m])

Pool utilization percentage

(capi_vip_allocator_pool_addresses_used / capi_vip_allocator_pool_addresses_total) * 100

Pending claims (SLO: should be < 5)

sum(capi_vip_allocator_claims_pending)

ServiceMonitor Example

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: capi-vip-allocator
  namespace: capi-system
spec:
  selector:
    matchLabels:
      control-plane: capi-vip-allocator-controller-manager
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
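
Without a full monitoring stack, a quick manual scrape confirms the endpoint (the Deployment name is an assumption, inferred from the pod label above):

kubectl -n capi-system port-forward deploy/capi-vip-allocator-controller-manager 8080:8080 &
curl -s http://localhost:8080/metrics | grep '^capi_vip_allocator'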

Grafana Dashboard

Key metrics to monitor:

  1. Allocation Success Rate

    • capi_vip_allocator_allocations_total vs capi_vip_allocator_allocation_errors_total
  2. Allocation Latency (p50, p95, p99)

    • histogram_quantile(0.95, rate(capi_vip_allocator_allocation_duration_seconds_bucket[5m]))
  3. Pool Capacity

    • Free IPs per pool
    • Alerts when pool utilization > 80%
  4. Pending Claims

    • Alert when claims are pending for > 60s (both thresholds are sketched as a PrometheusRule below)
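
A hedged PrometheusRule sketch for the two alert thresholds above, built only from the documented metrics (group and alert names are arbitrary):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: capi-vip-allocator-alerts
  namespace: capi-system
spec:
  groups:
    - name: capi-vip-allocator
      rules:
        # Pool utilization above 80%
        - alert: VipPoolNearlyExhausted
          expr: (capi_vip_allocator_pool_addresses_used / capi_vip_allocator_pool_addresses_total) * 100 > 80
          for: 5m
          labels:
            severity: warning
        # Claims pending for longer than 60s
        - alert: VipClaimsPending
          expr: sum(capi_vip_allocator_claims_pending) > 0
          for: 60s
          labels:
            severity: warning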

Roadmap

  • Control-plane VIP allocation via reconcile controller
  • Custom variable integration with ClusterClass
  • kube-vip integration example
  • Ingress VIP support (annotation-based) ✨
  • Prometheus metrics ✅
  • Events and Conditions
  • Multi-namespace IP pools
  • Helm chart

License

Apache License 2.0
