Description
Summary
The operator enters an infinite reconciliation loop when ClickHouseKeeperInstallation resources use non-normalized memory limit values (e.g., 2048Mi). The root cause is that ActionPlan comparison incorrectly flags a diff in the resource.Quantity.s field (string representation) between the ancestor CR and the newly normalized CR, even though the quantities are semantically equivalent (2048Mi == 2Gi). While StatefulSet comparison correctly identifies objects as equal, ActionPlan triggers reconciliation before StatefulSet reconciliation is attempted. Using normalized formats (e.g., 2Gi) avoids the issue.
Description
Problem
When a ClickHouseKeeperInstallation resource specifies memory limits using values like 2048Mi, the operator enters a reconciliation loop in which it continuously attempts to update the StatefulSet. The resource remains in InProgress status and the loop never terminates.
Root Cause
The issue stems from inconsistent normalization of Kubernetes resource.Quantity values between:
- Ancestor CR: the previously normalized CR stored in status, whose resource quantities keep their original format (e.g., 2048Mi with the .s field populated)
- New normalized CR: the newly normalized CR, whose resource quantities are normalized differently (e.g., 2Gi, or with the .s field empty/different)
The reconciliation loop is triggered by the ActionPlan comparison (pkg/apis/clickhouse.altinity.com/v1/action_plan.go), which uses messagediff.DeepDiff() to compare the ancestor CR spec with the new normalized CR spec. The resource.Quantity type has an internal .s field (string representation) that differs between the two CRs even though the quantities are semantically equivalent (e.g., 2048Mi == 2Gi).
Evidence from logs:
- ActionPlan detects a false-positive diff:
```
I0112 14:51:18.312544 worker-reconciler-chk.go:55] ActionPlan start buildCR
Diff start -------------------------
modified spec items num: 1
diff item [0]:'.Templates.PodTemplates[0].Spec.Containers[0].Resources.Limits["memory"].s' = '""'
Diff end -------------------------
I0112 14:51:18.312563 worker-reconciler-chk.go:59] ActionPlan has actions - continue reconcile
```
- StatefulSet comparison correctly identifies objects as equal (multiple occurrences in each loop):
```
I0112 14:51:17.981362 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
I0112 14:51:17.981391 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
I0112 14:51:18.022212 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
I0112 14:51:18.022242 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
I0112 14:51:18.083866 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-1
I0112 14:51:18.083895 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-1
I0112 14:51:18.162545 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-2
I0112 14:51:18.162575 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-2
I0112 14:51:18.297583 statefulset-reconciler.go:103] Have StatefulSet available, try to perform label-based comparison for sts: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
I0112 14:51:18.297612 object-status.go:47] GetObjectStatusFromMetas():cur and new objects are equal based on object version label. Update of the object is not required. Object: piwikpro-clickhouse/azure-clickhouse-keeper-default-0-0
```
This clearly demonstrates that:
- StatefulSet comparison works correctly: All StatefulSet comparisons show "cur and new objects are equal", meaning the StatefulSet reconciliation logic correctly identifies that no update is needed.
- ActionPlan comparison is the problem: the ActionPlan diff detects a change in the .s field of resource.Quantity, triggering reconciliation even though the StatefulSets are semantically equal.
- Reconciliation loop: despite the StatefulSets being equal, the ActionPlan keeps detecting changes, causing continuous reconciliation attempts.
Affected Code Paths
- Resource normalization: pkg/model/common/normalizer/templates/pod.go:normalizeResourceList() normalizes resource quantities during template processing.
- ActionPlan comparison: pkg/apis/clickhouse.altinity.com/v1/action_plan.go:MakeActionPlan() compares the ancestor CR with the new normalized CR using messagediff.DeepDiff().
- Path exclusion logic: pkg/apis/clickhouse.altinity.com/v1/action_plan.go:isExcludedPathSegment() determines which diff paths to ignore (it currently does not exclude the resource quantity .s field).
- Reconciliation trigger: pkg/controller/chk/worker-reconciler-chk.go:buildCR() builds the ActionPlan and triggers reconciliation if changes are detected.
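One direction consistent with the path exclusion entry is to ignore diffs on the cached .s field of container resource quantities. This is a hypothetical sketch: isQuantityCachePath is an illustrative name, it operates on whole path strings in the format printed by the ActionPlan log, and the operator's real isExcludedPathSegment() may work on individual path segments instead.

```go
package main

import (
	"fmt"
	"strings"
)

// isQuantityCachePath reports whether a diff path points at the cached
// string field (.s) of a container resource limit or request.
// Hypothetical helper; not the operator's isExcludedPathSegment().
func isQuantityCachePath(path string) bool {
	if !strings.HasSuffix(path, ".s") {
		return false
	}
	return strings.Contains(path, ".Resources.Limits[") ||
		strings.Contains(path, ".Resources.Requests[")
}

func main() {
	paths := []string{
		// Path taken from the ActionPlan log in this issue.
		`.Templates.PodTemplates[0].Spec.Containers[0].Resources.Limits["memory"].s`,
		// A real spec change that must still trigger reconciliation.
		`.Templates.PodTemplates[0].Spec.Containers[0].Image`,
	}
	for _, p := range paths {
		fmt.Printf("excluded=%v %s\n", isQuantityCachePath(p), p)
	}
}
```

Because .i and .d can also differ for semantically equal quantities, comparing the parsed values with resource.Quantity's Cmp method may be more robust than path filtering; this sketch only covers the .s case seen in the log.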
Steps to Reproduce
- Create a ClickHouseKeeperInstallation with memory limits specified in a non-normalized format:
```yaml
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: test-keeper
spec:
  templates:
    podTemplates:
      - name: default
        spec:
          containers:
            - name: clickhouse-keeper
              resources:
                limits:
                  memory: "2048Mi" # Non-normalized format
```
- Apply the resource and observe the operator logs.
- The operator continuously attempts to reconcile the StatefulSet.
- Check the StatefulSet generation: it remains unchanged despite the update attempts.
Expected Behavior
The operator should recognize that 2048Mi and 2Gi are equivalent values and not attempt to update the StatefulSet when only the format differs.
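The equivalence is plain arithmetic: 2048Mi = 2048 * 2^20 bytes = 2^31 bytes = 2 * 2^30 bytes = 2Gi. The Go sketch below compares integer binary-SI quantities by byte value and renders the largest suffix that divides evenly. It is a simplified illustration (no decimal SI suffixes, fractions, exponents, or overflow checks), not Kubernetes' own canonicalization algorithm; parseBytes and canonical are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binarySuffixes maps binary-SI suffixes to byte multipliers, largest first.
var binarySuffixes = []struct {
	suffix string
	mult   int64
}{
	{"Ei", 1 << 60},
	{"Pi", 1 << 50},
	{"Ti", 1 << 40},
	{"Gi", 1 << 30},
	{"Mi", 1 << 20},
	{"Ki", 1 << 10},
}

// parseBytes converts an integer binary-SI quantity such as "2048Mi" to bytes.
func parseBytes(q string) (int64, error) {
	for _, u := range binarySuffixes {
		if strings.HasSuffix(q, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(q, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.mult, nil
		}
	}
	return strconv.ParseInt(q, 10, 64) // no suffix: plain bytes
}

// canonical renders a byte count with the largest suffix that divides it evenly.
func canonical(b int64) string {
	for _, u := range binarySuffixes {
		if b != 0 && b%u.mult == 0 {
			return strconv.FormatInt(b/u.mult, 10) + u.suffix
		}
	}
	return strconv.FormatInt(b, 10)
}

func main() {
	a, err := parseBytes("2048Mi")
	if err != nil {
		panic(err)
	}
	b, err := parseBytes("2Gi")
	if err != nil {
		panic(err)
	}
	fmt.Println(a, b, a == b) // 2147483648 2147483648 true
	fmt.Println(canonical(a)) // 2Gi
}
```

Comparing on the parsed value rather than on the text (or on a cached string) is what makes 2048Mi and 2Gi compare as equal.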
Actual Behavior
The operator continuously attempts to update the StatefulSet, creating a reconciliation loop.
Workaround
Use normalized resource quantity formats (e.g., 2Gi instead of 2048Mi):
```yaml
resources:
  limits:
    memory: "2Gi" # Normalized format
```
Environment
- Operator Version: 0.25.6
- Kubernetes Version: 1.32.6
- ClickHouse Keeper Version: 24.3.5.48
Additional Context
- The normalization function normalizeResourceList() in pkg/model/common/normalizer/templates/pod.go uses JSON marshal/unmarshal to normalize quantities, which may produce different .s field values than the original CR.
- The resource.Quantity type has internal fields (.s, .i, .d) that can differ even when quantities are semantically equal.
- The ActionPlan uses messagediff.DeepDiff(), which performs a deep comparison that includes all internal fields.
- The issue affects ClickHouseKeeperInstallation and potentially also ClickHouseInstallation resources.
- Note: StatefulSet comparison works correctly (logs show "cur and new objects are equal"), but the ActionPlan comparison triggers reconciliation before StatefulSet reconciliation is even attempted.
Related Files
- pkg/model/common/normalizer/templates/pod.go: resource normalization
- pkg/controller/common/statefulset/statefulset-reconciler.go: StatefulSet reconciliation
- pkg/model/common/tags/labeler/auxiliary.go: version fingerprint generation
- pkg/util/fingerprint.go: fingerprint generation
- pkg/util/hash.go: serialization for fingerprinting