MLE-21148 Add Retry for Group Config #321

Open: wants to merge 2 commits into base: release/2.1.0
29 changes: 24 additions & 5 deletions charts/templates/configmap-scripts.yaml
@@ -10,8 +10,6 @@ metadata:
 data:
   copy-certs.sh: |
     #!/bin/bash
-    MARKLOGIC_ADMIN_USERNAME="$(< /run/secrets/ml-secrets/username)"
-    MARKLOGIC_ADMIN_PASSWORD="$(< /run/secrets/ml-secrets/username)"
     log () {
       local TIMESTAMP=$(date +"%Y-%m-%d %T.%3N")
       echo "${TIMESTAMP} $@"
@@ -177,6 +175,25 @@ data:
       echo $message >> /tmp/script.log
     }

+    # Function to retry a command based on the return code
+    # $1: The number of retries
+    # $2: The command to run
+    retry() {
Copilot AI May 14, 2025

[nitpick] Consider adding a comment above the retry function to explain its parameters and behavior, which would help maintainers quickly understand its purpose.

+      local retries=$1
+      shift
+      local count=0
+      until "$@"; do
+        exit_code=$?
+        count=$((count + 1))
+        if [ $count -ge $retries ]; then
+          echo "Command failed after $retries attempts."
+          return $exit_code
+        fi
+        echo "Attempt $count failed. Retrying..."
+        sleep 5
Copilot AI May 14, 2025

[nitpick] Consider making the retry delay configurable instead of hard-coding a 5-second sleep to allow easier adjustments in different environments.

Suggested change:
-        sleep 5
+        sleep ${RETRY_DELAY:-5}
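
A minimal sketch of how the suggested change could behave, assuming the `RETRY_DELAY` environment variable from the suggestion is adopted (the PR as submitted hard-codes `sleep 5`). The function body otherwise mirrors the diff, except that variables are quoted and `exit_code` is declared local; the demo call at the end is illustrative only.

```shell
#!/bin/bash
# Sketch only: RETRY_DELAY is the hypothetical variable from the suggestion
# above; it is not defined anywhere in the chart as submitted.
retry() {
  local retries=$1
  shift
  local count=0
  local exit_code
  until "$@"; do
    exit_code=$?
    count=$((count + 1))
    if [ "$count" -ge "$retries" ]; then
      echo "Command failed after $retries attempts."
      return "$exit_code"
    fi
    echo "Attempt $count failed. Retrying..."
    # Falls back to the PR's 5-second delay when RETRY_DELAY is unset or empty.
    sleep "${RETRY_DELAY:-5}"
  done
}

# Demo: a zero delay keeps the example fast; `false` always exits with 1.
RETRY_DELAY=0
retry 3 false || echo "retry returned $?"
```

Because the default lives in the parameter expansion, existing deployments that never set `RETRY_DELAY` would keep the current 5-second behavior.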

+      done
+    }
+
     ###############################################################
     # Function to get the current host protocol
     # $1: The host name
@@ -560,9 +577,11 @@ data:
             info "group \"${current_group}\" updated and a restart of all hosts in the group was triggered"
           else
             info "unexpected response when updating group \"${current_group}\": ${response_code}"
+            return 1
           fi
         else
           info "failed to get current group, response code: ${response_code}"
+          return 1
         fi

         if [[ "$MARKLOGIC_CLUSTER_TYPE" == "non-bootstrap" ]]; then
@@ -585,7 +604,7 @@ data:
       else
         info "not bootstrap host. Skip group configuration"
       fi
-
+      return 0
     }

     function configure_tls {
@@ -814,10 +833,10 @@ data:
       if [[ "${MARKLOGIC_CLUSTER_TYPE}" == "bootstrap" ]]; then
         log "Info: bootstrap host is ready"
         init_security_db
-        configure_group
+        retry 5 configure_group
       else
         log "Info: bootstrap host is ready"
-        configure_group
+        retry 5 configure_group
         join_cluster $HOST_FQDN
       fi
       configure_path_based_routing
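
The `return 1` and `return 0` lines added to `configure_group` are what let `retry` tell a failed attempt from a successful one. A rough sketch of that interaction follows, assuming a hypothetical `configure_group_stub` that fails twice before succeeding; the real function calls the MarkLogic REST API, which is not reproduced here. The delay is set to zero for the demo, whereas the PR sleeps 5 seconds.

```shell
#!/bin/bash
# Sketch only: configure_group_stub and ATTEMPTS are hypothetical stand-ins
# for the real configure_group in configmap-scripts.yaml.
ATTEMPTS=0
configure_group_stub() {
  ATTEMPTS=$((ATTEMPTS + 1))
  if [ "$ATTEMPTS" -lt 3 ]; then
    echo "unexpected response (simulated), attempt $ATTEMPTS"
    return 1   # a non-zero status is what tells retry to try again
  fi
  echo "group updated (simulated), attempt $ATTEMPTS"
  return 0     # an explicit success status ends the retry loop
}

# Same control flow as the retry function in this PR.
retry() {
  local retries=$1
  shift
  local count=0
  local exit_code
  until "$@"; do
    exit_code=$?
    count=$((count + 1))
    if [ "$count" -ge "$retries" ]; then
      echo "Command failed after $retries attempts."
      return "$exit_code"
    fi
    echo "Attempt $count failed. Retrying..."
    sleep 0    # the PR sleeps 5 seconds here
  done
}

if retry 5 configure_group_stub; then
  echo "configure_group_stub succeeded after $ATTEMPTS attempts"
fi
```

Without explicit return statements, a shell function returns the status of its last command, so a logged failure followed by a successful `info` call would look like success to `retry`.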