fix(operator): Make rollout order deterministic for rolling deploy when multiple StatefulSets share same NodeType #18973

Open
aruraghuwanshi wants to merge 2 commits into apache:master from aruraghuwanshi:make-rollout-order-deterministic-for-rollingDeploy
@aruraghuwanshi
Contributor

Description

With the current code, if more than one StatefulSet or Deployment belongs to the same NodeType (e.g. historicals-hot and historicals-cold), the rollout order flaps non-deterministically when rollingDeploy is enabled. Within each node type, specs are appended while iterating over the m.Spec.Nodes map, and Go does not guarantee map iteration order, so the order can change between calls and across reconciles (e.g. sometimes historicals-hot first, sometimes historicals-cold first).

When getNodeSpecsByOrder is called multiple times during a single rollout, the order returned for specs within the same NodeType can therefore change between calls. That leads to erroneous behavior: multiple StatefulSets/Deployments of that NodeType may undergo rollout at the same time instead of one completing before the next.
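The root cause can be reproduced outside the operator. The following is a minimal sketch, not the operator's code: `nodeSpec` and `collectKeysUnordered` are hypothetical names standing in for the real node spec type and the collection loop inside getNodeSpecsByOrder. It only illustrates that a slice built by ranging over a Go map has no stable order.

```go
package main

import "fmt"

// nodeSpec is a hypothetical, simplified stand-in for the operator's node spec.
type nodeSpec struct {
	NodeType string
}

// collectKeysUnordered gathers the keys of all specs of the given node type
// by ranging over the map. Go does not specify map iteration order, so the
// returned slice may be ordered differently on each call.
func collectKeysUnordered(nodes map[string]nodeSpec, nodeType string) []string {
	var keys []string
	for key, spec := range nodes {
		if spec.NodeType == nodeType {
			keys = append(keys, key)
		}
	}
	return keys
}

func main() {
	nodes := map[string]nodeSpec{
		"historicals-hot":  {NodeType: "historical"},
		"historicals-cold": {NodeType: "historical"},
		"brokers":          {NodeType: "broker"},
	}
	// Repeated calls may list the two historical keys in different orders.
	for i := 0; i < 5; i++ {
		fmt.Println(collectKeysUnordered(nodes, "historical"))
	}
}
```

If two reconciles see different orders, each may treat a different StatefulSet as "first", which is how two rollouts of the same NodeType can end up in progress together.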

This change enforces ordering and consistency by sorting specs by ServiceGroup.key within each node type before appending them to the ordered list. Rollout order is now stable and deterministic across reconciles: one StatefulSet/Deployment within the same NodeType is fully rolled out before the operator moves on to the next.

Deterministic ordering in getNodeSpecsByOrder

  • Added a sort step per node type: after collecting specs by node type, we sort each slice by ServiceGroup.key (ascending) using Go’s sort.Slice before appending to the final ordered list.
  • Order across node types remains defined by druidServicesOrder (historical → overlord → middleManager → indexer → broker → coordinator → router). Within each node type, order is now deterministic by node spec key (e.g. historicals-cold, historicals-hot).
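The two rules above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's actual diff: `serviceGroup`, `servicesOrder`, and `orderSpecs` are hypothetical simplifications of ServiceGroup, druidServicesOrder, and getNodeSpecsByOrder, and the input map from node spec key to node type stands in for m.Spec.Nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// serviceGroup pairs a node spec key with its node type.
// Hypothetical simplification of the operator's ServiceGroup.
type serviceGroup struct {
	key      string
	nodeType string
}

// servicesOrder mirrors the cross-type order listed in the description.
var servicesOrder = []string{
	"historical", "overlord", "middleManager", "indexer",
	"broker", "coordinator", "router",
}

// orderSpecs groups specs by node type, sorts each group by key (ascending),
// and appends the groups in servicesOrder. Node types that are not listed in
// servicesOrder are dropped in this sketch.
func orderSpecs(nodes map[string]string) []serviceGroup {
	byType := make(map[string][]serviceGroup)
	for key, nodeType := range nodes {
		byType[nodeType] = append(byType[nodeType], serviceGroup{key: key, nodeType: nodeType})
	}

	ordered := make([]serviceGroup, 0, len(nodes))
	for _, nodeType := range servicesOrder {
		group := byType[nodeType]
		// Keys are unique map keys, so the sorted order is fully determined.
		sort.Slice(group, func(i, j int) bool { return group[i].key < group[j].key })
		ordered = append(ordered, group...)
	}
	return ordered
}

func main() {
	nodes := map[string]string{
		"historicals-hot":  "historical",
		"historicals-cold": "historical",
		"brokers":          "broker",
		"routers":          "router",
	}
	for _, sg := range orderSpecs(nodes) {
		fmt.Println(sg.nodeType, sg.key)
	}
}
```

Because the sort key is the map key, which is unique, the result does not depend on the order in which the map was iterated, so repeated calls return the same sequence.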

Release note

Druid Operator: When rollingDeploy is enabled, rollout order for multiple StatefulSets/Deployments of the same NodeType (e.g. historicals-hot and historicals-cold) is now deterministic and stable. One such resource is fully rolled out before the next, avoiding concurrent rollouts within the same NodeType.


Key changed/added classes in this PR
  • druid-operator/controllers/druid/ordering.go: getNodeSpecsByOrder
  • druid-operator/controllers/druid/ordering_determinism_test.go: unit tests for deterministic ordering

This PR has:

  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Within each node type, specs were appended from map iteration over
m.Spec.Nodes, so rollout order was non-deterministic (e.g. historicalst1
vs historicalst2). Sort specs by ServiceGroup.key before appending so
rollout order is stable across reconciles.
@aruraghuwanshi
Contributor Author

@AdheipSingh fyi
