
Commit 9b7568f

OCPBUGS-60098: podman-etcd: avoid leaving member list on last active agent
When stopping an etcd instance, the agent should not leave the member list if it is the last active agent in the cluster, because leaving the member list in this scenario can cause WAL corruption. This change counts the active resources before attempting to leave the member list. If no other active resources are found, the agent logs a message and skips the leave operation. NOTE: the check on `standalone_node` might not be enough if both agents stop at roughly the same time, because neither of them may have had time to set the attribute.
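The decision the commit adds can be sketched as a small pure function. This is a minimal illustration, not the agent's code: `leave_decision` is a hypothetical name, and it takes the standalone flag and the active-resource count as arguments instead of calling the agent's `is_standalone` and `get_truly_active_resources_count` helpers.

```shell
# Hypothetical helper mirroring the if/elif/else chain added in podman_stop().
#   $1: "true" if this node is standalone (stands in for is_standalone)
#   $2: number of other etcd resources still active
# Prints "stay" (skip leaving the member list) or "leave".
leave_decision() {
	standalone="$1"
	active_count="$2"
	if [ "$standalone" = "true" ]; then
		# Agent considers itself the last member: keep membership.
		echo "stay"
	elif [ "$active_count" -lt 1 ]; then
		# No other agent is active: skip the leave (WAL corruption risk).
		echo "stay"
	else
		echo "leave"
	fi
}
```

For example, `leave_decision false 0` prints `stay`, while `leave_decision false 1` prints `leave`. The second branch is the one this commit introduces: it still applies when the standalone attribute was never set, which is the race described in the NOTE.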
1 parent 677e3ad commit 9b7568f


1 file changed (+4, -0 lines)


heartbeat/podman-etcd

Lines changed: 4 additions & 0 deletions
```diff
@@ -2046,8 +2046,12 @@ podman_stop()
 		ocf_log err "error leaving members list: could not get member-id"
 	else
 		# TODO: is it worth/possible to check the current status instead than relying on cached attributes?
+		active_resources_count=$(get_truly_active_resources_count)
+		ocf_log info "found '$active_resources_count' active etcd resources (active: '$OCF_RESKEY_CRM_meta_notify_active_resource', stop: '$OCF_RESKEY_CRM_meta_notify_stop_resource')"
 		if is_standalone; then
 			ocf_log info "last member. Not leaving the member list"
+		elif [ "$active_resources_count" -lt 1 ]; then
+			ocf_log info "No active agents left. Not leaving the member list"
 		else
 			ocf_log info "leaving members list as member with ID $member_id"
 			endpoint="$(ip_url $(attribute_node_ip get)):2379"
```
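The body of `get_truly_active_resources_count` is not part of this diff. The log line suggests it relies on the Pacemaker clone notification variables `OCF_RESKEY_CRM_meta_notify_active_resource` and `OCF_RESKEY_CRM_meta_notify_stop_resource`, which hold space-separated lists of resource instances. A plausible reading, stated here as an assumption, is "instances listed as active minus those also being stopped." The sketch below implements that reading. `count_truly_active` is a hypothetical name, and the lists are passed as arguments so the logic can be tried outside Pacemaker.

```shell
# Hypothetical sketch: count resource instances that are active and NOT
# also in the stop list (assumed meaning of "truly active").
#   $1: space-separated active list (e.g. notify_active_resource)
#   $2: space-separated stop list   (e.g. notify_stop_resource)
count_truly_active() {
	active_list="$1"
	stop_list="$2"
	count=0
	for res in $active_list; do
		stopping=false
		for s in $stop_list; do
			if [ "$res" = "$s" ]; then
				stopping=true
				break
			fi
		done
		# Only instances that are not being stopped count as active.
		if [ "$stopping" = "false" ]; then
			count=$((count + 1))
		fi
	done
	echo "$count"
}
```

With this sketch, `count_truly_active "etcd:0 etcd:1" "etcd:0"` prints `1`. When both instances stop together, `count_truly_active "etcd:0 etcd:1" "etcd:0 etcd:1"` prints `0`, so the new `-lt 1` branch would skip the leave even if neither node has set the standalone attribute.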
