Description
The External Arbiter Operator currently generates a monmap for the external monitor using monmaptool --create, based on environment-driven inputs like ROOK_CEPH_MON_HOST and ROOK_CEPH_MON_INITIAL_MEMBERS.
However, the project’s goal and the documented manual workflow for external mons both expect the monmap to be obtained from the running Ceph cluster (e.g. via ceph mon getmap) and then provided to the external arbiter pod. This matches Ceph’s “adding mon” bootstrap flow and avoids “inventing” cluster identity locally.
Analysis
Using monmaptool --create to join an existing cluster exposes us to the following risks:
- Cluster identity mismatch (critical)
  Unless an fsid is passed explicitly with --fsid, monmaptool --create produces a monmap with a newly generated cluster UUID. That is correct when creating a new cluster but wrong when joining an existing one. The external mon should bootstrap using the monmap that belongs to the live cluster.
- Stale / incomplete membership sources
  ROOK_CEPH_MON_HOST and initial member lists can be stale or incomplete during partial outages, split-brain-like conditions, or after mon changes. A locally generated map can diverge from the live epoch and membership.
- Feature / compatibility gaps
  The running cluster monmap may carry feature bits and expectations (e.g., minimum mon release / feature gates) that are not guaranteed to be reproduced with default monmaptool --create flags.
- Hard-to-debug join failures
  When a mon is rejected during mkfs/join, diagnosing the cause becomes harder if the operator created an artificial map instead of using the source-of-truth monmap from quorum.
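To make the identity risk concrete, a candidate monmap can be checked against the live cluster before it is used. The following is a minimal Python sketch, assuming the monmap has been dumped with monmaptool --print and that the dump contains "epoch <n>" and "fsid <uuid>" lines; the helper names (MonmapSummary, parse_monmap_print, identity_matches) are hypothetical and not part of the operator.

```python
import re
from typing import NamedTuple


class MonmapSummary(NamedTuple):
    epoch: int
    fsid: str


def parse_monmap_print(text: str) -> MonmapSummary:
    """Extract epoch and fsid from a `monmaptool --print <file>` style dump.

    Assumption: the dump has one line starting with "epoch" and one
    starting with "fsid", as in the Ceph documentation examples.
    """
    epoch = re.search(r"^epoch\s+(\d+)\s*$", text, re.MULTILINE)
    fsid = re.search(r"^fsid\s+([0-9a-fA-F-]{36})\s*$", text, re.MULTILINE)
    if epoch is None or fsid is None:
        raise ValueError("could not find epoch/fsid lines in monmap dump")
    return MonmapSummary(int(epoch.group(1)), fsid.group(1).lower())


def identity_matches(candidate: MonmapSummary, cluster_fsid: str) -> bool:
    """Return True only if the candidate monmap carries the live cluster's fsid."""
    return candidate.fsid == cluster_fsid.strip().lower()
```

The cluster_fsid value would come from the running cluster (for example the output of ceph fsid). A map produced by monmaptool --create without an explicit fsid would be expected to fail this check.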
Proposed Solution
Update the operator reconciliation flow to fetch the current monmap from the Rook cluster and pass it to the external arbiter pod, instead of generating a new one.
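The fetch step, with the toolbox tried first and a mon pod as fallback, might look like the following Python sketch. It shells out to kubectl purely for illustration; a real operator would more likely use the Kubernetes exec API directly. The function names and the injectable run parameter are hypothetical, and any namespace or pod name passed in is an assumption about the deployment.

```python
import subprocess
from typing import Callable, List


def getmap_argv(namespace: str, target: str, path: str = "/tmp/monmap") -> List[str]:
    """Command that asks the cluster to write its current monmap to `path`."""
    return ["kubectl", "exec", "-n", namespace, target, "--",
            "ceph", "mon", "getmap", "-o", path]


def readback_argv(namespace: str, target: str, path: str = "/tmp/monmap") -> List[str]:
    """Command that streams the binary monmap back over exec stdout."""
    return ["kubectl", "exec", "-n", namespace, target, "--", "cat", path]


def fetch_monmap(namespace: str, targets: List[str],
                 run: Callable = subprocess.run) -> bytes:
    """Try each exec target in order and return the first non-empty monmap.

    `targets` is ordered by preference, e.g. the toolbox deployment first
    and a mon pod second (both names are deployment-specific assumptions).
    """
    last_error: Exception = RuntimeError("no exec targets given")
    for target in targets:
        try:
            run(getmap_argv(namespace, target), check=True, capture_output=True)
            result = run(readback_argv(namespace, target), check=True,
                         capture_output=True)
        except (subprocess.CalledProcessError, OSError) as exc:
            last_error = exc
            continue
        if result.stdout:
            return result.stdout
        last_error = RuntimeError(f"empty monmap returned from {target}")
    raise RuntimeError(f"could not fetch monmap: {last_error}")
```

Passing run as a parameter keeps the fallback logic testable without a cluster. A call such as fetch_monmap("rook-ceph", ["deploy/rook-ceph-tools", "pod/rook-ceph-mon-a"]) assumes those names exist in the target environment.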
Implementation Details
- Retrieve monmap from the Rook cluster
  - Preferred: exec into the Rook toolbox (rook-ceph-tools) and run:
    ceph mon getmap -o /tmp/monmap
  - Fallback: exec into a running Rook mon pod and run the same command.
  - Read back the binary /tmp/monmap content via exec/streaming.
- Store monmap in the external arbiter cluster
  - Create/update a Secret (preferred; its data field holds base64-encoded bytes) or a ConfigMap using binaryData, since the monmap is binary.
  - Keep content-addressed behavior:
    - compute hash of the monmap bytes
    - only update the resource if the hash changed
- Mount monmap into the external arbiter pod
  - Mount the Secret/ConfigMap as a file into the arbiter pod.
  - Replace init-monmap logic:
    - remove monmaptool --create
    - copy the mounted monmap into the mon data dir for ceph-mon --mkfs / bootstrap.
- Rollout semantics
  - If the monmap resource changes (hash changes), trigger a deterministic rollout:
    - annotate arbiter Deployment/Pod template with monmap-checksum: <sha256>
  - Avoid over-refresh: monmap is primarily needed for bootstrap/re-bootstrap; don’t churn pods unless necessary.
- RBAC / permissions
  - Ensure the operator has pods/exec in the Rook namespace for toolbox/mon pod access.
  - Ensure the operator has permissions to create/update Secret/ConfigMap + Deployment in the external arbiter cluster.
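The content-addressed storage and rollout steps above can be sketched as follows. This is a minimal Python illustration using plain dictionaries in place of a Kubernetes client; the function names, the "monmap" data key, and the choice to also record the checksum as an annotation on the Secret are assumptions, while the monmap-checksum annotation key comes from the list above.

```python
import base64
import hashlib
from typing import Optional

CHECKSUM_ANNOTATION = "monmap-checksum"  # annotation key named in the issue


def monmap_checksum(monmap: bytes) -> str:
    """SHA-256 hex digest of the raw monmap bytes."""
    return hashlib.sha256(monmap).hexdigest()


def secret_manifest(name: str, namespace: str, monmap: bytes) -> dict:
    """Build a Secret whose data field carries the base64-encoded monmap."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumption: the checksum is stored on the Secret as well, so
            # the next reconcile can compare without decoding the payload.
            "annotations": {CHECKSUM_ANNOTATION: monmap_checksum(monmap)},
        },
        "data": {"monmap": base64.b64encode(monmap).decode("ascii")},
    }


def needs_update(existing: Optional[dict], monmap: bytes) -> bool:
    """True if no Secret exists yet or its recorded checksum differs."""
    if existing is None:
        return True
    annotations = existing.get("metadata", {}).get("annotations", {})
    return annotations.get(CHECKSUM_ANNOTATION) != monmap_checksum(monmap)


def pod_template_patch(monmap: bytes) -> dict:
    """Deployment patch that changes the pod template only when the hash changes."""
    return {"spec": {"template": {"metadata": {
        "annotations": {CHECKSUM_ANNOTATION: monmap_checksum(monmap)}}}}}
```

Because the pod template annotation is derived only from the monmap bytes, reapplying the same patch leaves the template unchanged and should not trigger a rollout, which matches the "avoid over-refresh" requirement.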
Acceptance Criteria
- Operator retrieves the current binary monmap from the running Rook Ceph cluster (toolbox or mon pod).
- Operator stores this monmap in the external arbiter cluster as a Secret (data) or ConfigMap (binaryData) that is mounted into the arbiter pod as a file.
- External arbiter pod boots using the retrieved monmap (no local generation).
- monmaptool --create is removed from the init container logic.
- Monmap updates are applied only when content changes, and cause a controlled rollout when required.
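As one possible shape for the replaced init step, the Python sketch below stages the mounted monmap into a writable directory and builds a ceph-mon --mkfs invocation that consumes it through --monmap and --keyring. The helper names, the mon ID, and all paths are hypothetical; the actual init container may use a shell script and may copy the file elsewhere.

```python
import shutil
from pathlib import Path
from typing import List


def stage_monmap(mounted: Path, workdir: Path) -> Path:
    """Copy the mounted (typically read-only) monmap to a writable location."""
    if not mounted.is_file() or mounted.stat().st_size == 0:
        raise FileNotFoundError(f"monmap not mounted or empty: {mounted}")
    workdir.mkdir(parents=True, exist_ok=True)
    staged = workdir / "monmap"
    shutil.copyfile(mounted, staged)
    return staged


def mkfs_argv(mon_id: str, monmap: Path, keyring: Path) -> List[str]:
    """Bootstrap command that uses the retrieved monmap; no monmaptool call."""
    return ["ceph-mon", "--mkfs", "-i", mon_id,
            "--monmap", str(monmap), "--keyring", str(keyring)]
```

Failing fast when the mounted file is missing or empty avoids silently falling back to a locally generated map.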