
[elasticsearch] add master pre stop hook using voting_config_exclusion #108

Closed

Conversation

kimxogus
Contributor

Hopefully this will fix #63 for ES 7 master nodes.

@elasticmachine
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@kimxogus kimxogus changed the title Elasticsearch/master pre stop hook [elasticsearch] add master pre stop hook using voting_config_exclusion Apr 26, 2019
Contributor

@Crazybus Crazybus left a comment


I left some comments on how this can be improved. Even with these improvements I still feel that this is a very risky solution compared to my suggestion in #63 (comment). Because of the risks it adds, I don't feel comfortable continuing with this approach unless you can convince me otherwise. If you see issues with my proposal, let's discuss them in the issue first to see if there is a solution that would be safe to use until this is fixed in Elasticsearch.

By automating the removal of masters, the cluster can end up in a state where it loses one or more masters in an automated fashion. The _cluster/health check that is used for the readinessProbe will not detect that the number of master-eligible nodes has dropped below quorum. That means Kubernetes will see the cluster as healthy and gladly continue with the rolling upgrade or any other maintenance task that requires restarting the pods.
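For illustration, the readiness check roughly boils down to a cluster health request like the one below (a simplified sketch; the actual protocol, port, and health parameters are templated from the chart's values, so treat these specifics as assumptions):

# Simplified sketch of the readiness check. _cluster/health reports shard
# allocation status, not how many master-eligible nodes remain, so it can
# keep reporting a healthy status while masters are removed one by one.
curl -s -k --fail \
  "http://127.0.0.1:9200/_cluster/health?wait_for_status=green&timeout=1s"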

The same thing will happen if the while true loop gets stuck for whatever reason (for example, if the other masters have been lost and a new master can't be elected). Once the grace period is hit, the pod will be restarted and quorum will be lost.

The solution I suggested in #63 (comment) gets around this by never actually excluding any masters from voting. Instead, a sidecar container makes sure that the pod's IP address is kept around until a new master has been elected, without the need to do anything risky. The implementation would just be an "if current master is not me" check in a while loop. The only issue that needs to be solved is making sure that the pod's IP address is kept around long enough for a new master election to happen. Doing anything else just adds more things that could go wrong.
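A rough sketch of that "current master is not me" loop (the protocol, port, and node name handling here are placeholders, not the final implementation):

# Keep the pod alive until the elected master is a different node.
NODE_NAME=$(awk 'BEGIN {print ENVIRON["node.name"]}')
while true ; do
  # Ask the cluster which node is currently the elected master.
  CURRENT_MASTER=$(curl -s -k --fail "http://127.0.0.1:9200/_cat/master?h=node" || true)
  if [[ -n "${CURRENT_MASTER}" && "${CURRENT_MASTER}" != "${NODE_NAME}" ]]; then
    echo "Master is now ${CURRENT_MASTER}, safe to stop ${NODE_NAME}"
    break
  fi
  sleep 1
done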

curl -X${method} -s -k --fail ${BASIC_AUTH} {{ .Values.protocol }}://127.0.0.1:{{ .Values.httpPort }}${path}
}

NODE_NAME=$(awk 'BEGIN {print ENVIRON["node.name"]}')
Contributor


I was originally confused about why awk was needed here, but it looks like bash can't access environment variables with dots in their names.
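For anyone else wondering, a quick hypothetical illustration:

# Kubernetes can inject an environment variable literally named "node.name",
# but that is not a valid bash identifier, so bash cannot expand it:
env 'node.name=elasticsearch-master-0' bash -c 'echo "${node.name}"'
# fails with a "bad substitution" error

# awk exposes the full environment through ENVIRON, dots and all:
env 'node.name=elasticsearch-master-0' awk 'BEGIN {print ENVIRON["node.name"]}'
# elasticsearch-master-0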


http "/_cluster/voting_config_exclusions/${NODE_NAME}" "POST"

while true ; do
Contributor


An infinite loop seems to be ok here because pod termination waits for the preStop hook to finish, up to the graceful timeout: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
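For reference, the time budget the hook gets is the pod's termination grace period (30 seconds by default). It can be inspected on a running pod with something like the following (pod name is illustrative):

kubectl get pod elasticsearch-master-0 \
  -o jsonpath='{.spec.terminationGracePeriodSeconds}'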

NODE_NAME=$(awk 'BEGIN {print ENVIRON["node.name"]}')
echo "Exclude the node ${NODE_NAME} from voting config"

http "/_cluster/voting_config_exclusions/${NODE_NAME}" "POST"
Contributor


I think this should be inside the while true loop. Otherwise the set -e is going to cause the preStop script to exit here if this curl request fails.
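One possible shape for that, as a sketch (the same pattern would apply to the DELETE call further down):

# Retry the exclusion request instead of letting a single failed curl
# (combined with set -e) abort the whole preStop hook.
while true ; do
  echo "Exclude the node ${NODE_NAME} from voting config"
  if http "/_cluster/voting_config_exclusions/${NODE_NAME}" "POST" ; then
    break
  fi
  echo "Failed to add voting config exclusion, retrying"
  sleep 1
done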

# Node won't be actually removed from cluster,
# so node should be deleted from voting config exclusions
echo "Node ${NODE_NAME} is now being deleted from voting config exclusions"
http "/_cluster/voting_config_exclusions?wait_for_removal=false" "DELETE"
Contributor


This should also be inside the while true loop. If this fails then this node won't be able to rejoin the cluster after starting up again.


while true ; do
  echo -e "Wait for new master node to be elected"
  if [[ "$(http "/_cat/master")" != *"${NODE_NAME}"* ]]; then
Contributor


This if statement can be improved to be safer. Right now it will match if the curl command fails in any way.

[[ "" != "elasticsearch-master-0" ]] ; echo $?
0

A safer option would be to only break if an elasticsearch-master node is returned and it is not ${node.name}. Otherwise this will produce a lot of false positives.
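Something along these lines, as a sketch (assuming the master nodes keep the default elasticsearch-master name prefix):

# Only break once _cat/master returns another elasticsearch-master node,
# instead of breaking on an empty response when curl fails.
# "|| true" keeps set -e from aborting the hook when the request fails.
CURRENT_MASTER=$(http "/_cat/master" || true)
if [[ "${CURRENT_MASTER}" == *"elasticsearch-master"* && "${CURRENT_MASTER}" != *"${NODE_NAME}"* ]]; then
  break
fi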

@kimxogus
Contributor Author

kimxogus commented May 2, 2019

@Crazybus Thanks for the detailed review! Your comments helped me a lot in understanding and improving this PR.
I rewrote it based on #63 (comment) and opened a new PR: #119

@kimxogus kimxogus closed this May 2, 2019
@Crazybus
Contributor

Crazybus commented May 2, 2019

@Crazybus Thanks for the detailed review! Your comments helped me a lot in understanding and improving this PR.
I rewrote it based on #63 (comment) and opened a new PR: #119

Thank you for saying this. It means a lot. It's hard work maintaining open source software, even harder when you need to say "no" to something that someone has clearly spent a lot of time and effort on (twice in a row even). I really appreciate the time you have invested so far and I'm glad that we can work together to improve this helm chart :)


Successfully merging this pull request may close these issues.

Slow re-election when elected master pod is deleted