
Infinite Reconciliation Loop Due to ActionPlan False Positive on Resource Quantity String Representation #1904

@DCkQ6


Summary

The operator enters an infinite reconciliation loop when ClickHouseKeeperInstallation resources use non-normalized memory limit values (e.g., 2048Mi). The root cause is that the ActionPlan comparison incorrectly flags a diff in the .s field (string representation) of resource.Quantity between the ancestor CR and the newly normalized CR, even though the quantities are semantically equivalent (2048Mi == 2Gi). Although the StatefulSet comparison correctly identifies the objects as equal, the ActionPlan triggers reconciliation before StatefulSet reconciliation is even attempted. Using normalized formats (e.g., 2Gi) avoids the issue.

Description

Problem

When a ClickHouseKeeperInstallation resource specifies memory limits using values like 2048Mi, the operator enters a reconciliation loop in which it continuously attempts to reconcile the StatefulSet. The loop never terminates, and the resource remains in InProgress status.

Root Cause

The issue stems from inconsistent normalization of Kubernetes resource.Quantity values between:

  1. Ancestor CR: The previously normalized CR stored in status has resource quantities with their original format (e.g., 2048Mi with .s field populated)
  2. New normalized CR: The newly normalized CR has resource quantities normalized differently (e.g., 2Gi or with .s field empty/different)

The reconciliation loop is triggered by the ActionPlan comparison (pkg/apis/clickhouse.altinity.com/v1/action_plan.go), which uses messagediff.DeepDiff() to compare the ancestor CR spec with the new normalized CR spec. The resource.Quantity type has an internal .s field (string representation) that differs between the two CRs even though the quantities are semantically equivalent (e.g., 2048Mi == 2Gi).

Evidence from logs:

  1. ActionPlan detects a false-positive diff:

    I0112 14:51:18.312544 worker-reconciler-chk.go:55] ActionPlan start buildCR
    Diff start -------------------------
    modified spec items num: 1
    diff item [0]:'.Templates.PodTemplates[0].Spec.Containers[0].Resources.Limits["memory"].s' = '""'
    Diff end -------------------------
    I0112 14:51:18.312563 worker-reconciler-chk.go:59] ActionPlan has actions - continue reconcile

  2. StatefulSet comparison correctly identifies the objects as equal (multiple occurrences in each loop):

    I0112 14:51:17.981362 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
    I0112 14:51:17.981391 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0

    I0112 14:51:18.022212 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
    I0112 14:51:18.022242 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0

    I0112 14:51:18.083866 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-1
    I0112 14:51:18.083895 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-1

    I0112 14:51:18.162545 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-2
    I0112 14:51:18.162575 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-2

    I0112 14:51:18.297583 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
    I0112 14:51:18.297612 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
This clearly demonstrates that:

  • StatefulSet comparison works correctly: All StatefulSet comparisons show "cur and new objects are equal", meaning the StatefulSet reconciliation logic correctly identifies that no update is needed.
  • ActionPlan comparison is the problem: The ActionPlan diff detects a change in the .s field of resource.Quantity, triggering reconciliation even though the StatefulSets are semantically equal.
  • Reconciliation loop: Despite StatefulSets being equal, the ActionPlan keeps detecting changes, causing continuous reconciliation attempts.

Affected Code Paths

  1. Resource Normalization: pkg/model/common/normalizer/templates/pod.go:normalizeResourceList() - Normalizes resource quantities during template processing
  2. ActionPlan Comparison: pkg/apis/clickhouse.altinity.com/v1/action_plan.go:MakeActionPlan() - Compares ancestor CR with new normalized CR using messagediff.DeepDiff()
  3. Path Exclusion Logic: pkg/apis/clickhouse.altinity.com/v1/action_plan.go:isExcludedPathSegment() - Determines which diff paths to ignore (currently does not exclude the .s field of resource quantities)
  4. Reconciliation Trigger: pkg/controller/chk/worker-reconciler-chk.go:buildCR() - Builds ActionPlan and triggers reconciliation if changes detected
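One possible direction, sketched here under assumptions, is to ignore diff paths that only touch the string field of a quantity. The operator's isExcludedPathSegment() works on path segments and its exact signature is not shown in this issue, so the hypothetical helpers isQuantityStringPath() and filterDiffPaths() below operate on the full path string as printed in the ActionPlan log instead. This is an illustration, not the operator's code.

```go
package main

import (
	"fmt"
	"strings"
)

// isQuantityStringPath is a HYPOTHETICAL helper (not part of the operator).
// It reports whether a diff path points at the `.s` field of a quantity in a
// container's resource Limits or Requests, e.g.
//   .Templates.PodTemplates[0].Spec.Containers[0].Resources.Limits["memory"].s
func isQuantityStringPath(path string) bool {
	if !strings.HasSuffix(path, "].s") {
		return false
	}
	return strings.Contains(path, ".Resources.Limits[") ||
		strings.Contains(path, ".Resources.Requests[")
}

// filterDiffPaths drops quantity string-field paths and keeps everything else.
func filterDiffPaths(paths []string) []string {
	var kept []string
	for _, p := range paths {
		if !isQuantityStringPath(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	paths := []string{
		`.Templates.PodTemplates[0].Spec.Containers[0].Resources.Limits["memory"].s`,
		`.Templates.PodTemplates[0].Spec.Containers[0].Image`,
	}
	// Only the Image path survives the filter.
	fmt.Println(filterDiffPaths(paths))
}
```

Matching on `.s` only covers the case seen in the log. Because .i and .d can also differ between semantically equal quantities, comparing the quantities by value is likely the more robust fix.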

Steps to Reproduce

  1. Create a ClickHouseKeeperInstallation with memory limits specified in non-normalized format:

    apiVersion: "clickhouse-keeper.altinity.com/v1"
    kind: "ClickHouseKeeperInstallation"
    metadata:
      name: test-keeper
    spec:
      templates:
        podTemplates:
          - name: default
            spec:
              containers:
                - name: clickhouse-keeper
                  resources:
                    limits:
                      memory: "2048Mi"  # Non-normalized format
  2. Apply the resource and observe the operator logs.

  3. The operator will continuously attempt to reconcile the StatefulSet.

  4. Check the StatefulSet generation: it remains unchanged despite the repeated reconciliation attempts.

Expected Behavior

The operator should recognize that 2048Mi and 2Gi are equivalent values and not attempt to update the StatefulSet when only the format differs.
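The equivalence is simple arithmetic with binary (IEC) suffixes: 2048Mi = 2048 × 2^20 bytes = 2^31 bytes = 2 × 2^30 bytes = 2Gi. The sketch below checks this with a hypothetical, simplified parser (toBytes) that accepts only integers with an optional Ki/Mi/Gi/Ti suffix. resource.ParseQuantity in apimachinery handles far more (decimal suffixes, fractions, exponents), and in operator code the Cmp method of the parsed quantities would be the natural comparison.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binarySuffixes maps the binary (IEC) suffixes to their multipliers.
var binarySuffixes = []struct {
	suffix string
	mult   int64
}{
	{"Ki", 1 << 10},
	{"Mi", 1 << 20},
	{"Gi", 1 << 30},
	{"Ti", 1 << 40},
}

// toBytes is a HYPOTHETICAL, simplified parser: integer plus optional binary
// suffix only. Overflow of n*mult is not checked.
func toBytes(q string) (int64, error) {
	mult := int64(1)
	num := q
	for _, b := range binarySuffixes {
		if strings.HasSuffix(q, b.suffix) {
			mult = b.mult
			num = strings.TrimSuffix(q, b.suffix)
			break
		}
	}
	n, err := strconv.ParseInt(num, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("unsupported quantity %q: %w", q, err)
	}
	return n * mult, nil
}

// equivalent reports whether two quantity strings denote the same byte count.
func equivalent(a, b string) (bool, error) {
	x, err := toBytes(a)
	if err != nil {
		return false, err
	}
	y, err := toBytes(b)
	if err != nil {
		return false, err
	}
	return x == y, nil
}

func main() {
	ok, err := equivalent("2048Mi", "2Gi")
	fmt.Println(ok, err)
}
```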

Actual Behavior

The operator continuously re-runs reconciliation of the StatefulSet, creating an endless reconciliation loop; the resource never leaves InProgress status.

Workaround

Use normalized resource quantity formats (e.g., 2Gi instead of 2048Mi):

resources:
  limits:
    memory: "2Gi"  # Normalized format

Environment

  • Operator Version: 0.25.6
  • Kubernetes Version: 1.32.6
  • ClickHouse Keeper Version: 24.3.5.48

Additional Context

  • The normalization function normalizeResourceList() in pkg/model/common/normalizer/templates/pod.go uses JSON marshal/unmarshal to normalize quantities, which may produce different .s field values than the original CR
  • The resource.Quantity type has internal fields (.s, .i, .d) that can differ even when quantities are semantically equal
  • The ActionPlan uses messagediff.DeepDiff() which performs deep comparison including all internal fields
  • The issue affects ClickHouseKeeperInstallation and potentially also ClickHouseInstallation resources
  • Note: StatefulSet comparison works correctly (logs show "cur and new objects are equal"), but the ActionPlan comparison triggers reconciliation before StatefulSet reconciliation is even attempted

Related Files

  • pkg/model/common/normalizer/templates/pod.go - Resource normalization
  • pkg/controller/common/statefulset/statefulset-reconciler.go - StatefulSet reconciliation
  • pkg/model/common/tags/labeler/auxiliary.go - Version fingerprint generation
  • pkg/util/fingerprint.go - Fingerprint generation
  • pkg/util/hash.go - Serialization for fingerprinting
