Skip to content
This repository has been archived by the owner on May 16, 2023. It is now read-only.

[elasticsearch] make service configurable #123

Merged
merged 12 commits into from
Jun 12, 2019
2 changes: 1 addition & 1 deletion elasticsearch/templates/statefulset.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -241,7 +241,7 @@ spec:

cleanup () {
while true ; do
local master="$(http "/_cat/master?h=node")"
local master="$(http "/_cat/master?h=node" || echo "")"
kimxogus marked this conversation as resolved.
Show resolved Hide resolved
if [[ $master && $master != "${NODE_NAME}" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this? It just seems less safe than what was already in there. Yes it will fail if there is an empty string, however if the API returns anything else (or some kind of weird error) then it is going to be a non-empty string and this will pass.

Did you find an issue with the current logic that this fixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I migrated es cluster from a cluster(all nodes have master, ingest and data role) to this chart. Previous master was not prefixed by uname, so I had to wait several minutes during rollout caused by configuration change when nodes from both old and new cluster exists before exclude old nodes from cluster shard allocation.
I think this is unnecessary because --fail option is passed to curl(https://github.com/elastic/helm-charts/blob/master/elasticsearch/templates/statefulset.yaml#L239), script will exit if an error is returned from API($master won't have a value).
As h=node query param is passed, $master string will only contain node name of master node if it's not empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was my explanation not enough? Then I can restore this because this only happens in specific migration case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unnecessary because --fail option is passed to curl(https://github.com/elastic/helm-charts/blob/master/elasticsearch/templates/statefulset.yaml#L239), script will exit if an error is returned from API($master won't have a value).

Thanks for noticing this. This combination of set -e and curl --fail will mean that this script might exit early if the call fails. Ideally we want this loop to keep on running until we can see that a new master exists that isn't the current pod. Exiting early is not what we want to happen here.

My concern is whether or not the API will return an error when there is no master or not. If the API returns a weird message like "no master yet" with a 200 then it's possible for the script to exit too early. Looking at the code or Elasticsearch it looks like it will actually return a dash (-) if there is no master found.

Then I can restore this because this only happens in specific migration case.

Did it just mean that Kubernetes waited for the 120 second timeout while stopping each of the masters?

I think what actually makes a lot more sense is to check that it starts with {{ template "masterService" . }}. That way this will always point to the prefix for the right master even during migrations like in https://github.com/elastic/helm-charts/blob/master/elasticsearch/examples/migration/README.md

Was my explanation not enough?

Sorry for the slow reply. I'm currently on a work trip and only have a few quiet moments to sneak some work in. I'll be at Kubecon in Barcelona next week too so things will be slow for a little while. If you are at Kubecon come and say hi at the Elastic booth :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it just mean that Kubernetes waited for the 120 second timeout while stopping each of the masters?

Yes, I had to wait 120 seconds per each master node.

I think what actually makes a lot more sense is to check that it starts with {{ template "masterService" . }}. That way this will always point to the prefix for the right master even during migrations like in /elasticsearch/examples/migration/README.md@master

Changed to {{ template "masterService" . }}!

I'm currently on a work trip and only have a few quiet moments to sneak some work in. I'll be at Kubecon in Barcelona next week too so things will be slow for a little while. If you are at Kubecon come and say hi at the Elastic booth :)

I'm not going to this Kubecon, but I hope to visit and say hi someday :)

echo "This node is not master."
break
Expand Down