Multiple fixes related to kubernetes startup or DNS setup on host
eskimo committed May 19, 2022
1 parent d8b7d15 commit 97cc270
Showing 9 changed files with 342 additions and 133 deletions.
80 changes: 40 additions & 40 deletions services_setup/README.adoc
@@ -516,7 +516,7 @@ service is available. +
In the case of `SAME_NODE_OR_RANDOM`, eskimo tries to find the dependency service on the very same node as the one
running the declaring service, if that dependency service is available on that very same node. +
If no instance of the dependency service is running on that very same node, then any other random node running the
dependency service is used as dependency.
dependency service is used as dependency. (This is only possible for native node SystemD services)
* `RANDOM` : This is used to define a simple dependency on another service. In detail, `RANDOM` indicates that the
first service wants to know about at least one node where the dependency service is available. That other node can be
any other node of the cluster where the dependency service is installed.
@@ -525,76 +525,69 @@ indicates that the first service wants to know about at least one node where tha
That other node should be any node of the cluster where the second service is installed, yet with a *node number*
(internal eskimo node declaration order) greater than the current node where the first service is installed. +
This is useful to define a chain of dependencies where every node instance depends on another node instance in a
circular way (pretty nifty for instance for elasticsearch discovery configuration).
circular way - pretty nifty for instance for elasticsearch discovery configuration. (This is only possible for native
node SystemD services)
* `SAME_NODE` : This means that the dependency service is expected to be available on the same node as the first
service, otherwise eskimo will report an error during service installation.
service, otherwise eskimo will report an error during service installation. (This is only possible for native node
SystemD services)
* `ALL_NODES` : this means that every service defining this dependency will receive the full list of nodes running
the master service in a topology variable (a hypothetical declaration is sketched at the end of the examples below).

*The best way to understand this is to look at the examples in eskimo pre-packaged services declared in the bundled
`services.json`.*

For instance:

* Cerebro tries to use the co-located instance of elasticsearch if it is available, or any random one otherwise, by
using the following dependency declaration:
* Etcd wants to use the co-located instance of gluster. Since gluster is expected to be available from all nodes of the
eskimo cluster, this dependency is simply expressed as:

.cerebro dependency on elasticsearch
.etcd dependency on gluster
----
"dependencies": [
{
"masterElectionStrategy": "SAME_NODE_OR_RANDOM",
"masterService": "elasticsearch",
"masterElectionStrategy": "SAME_NODE",
"masterService": "gluster",
"numberOfMasters": 1,
"mandatory": true
"mandatory": false,
"restart": true
}
]
----

* elasticsearch instances on the different nodes search for each other in a round-robin fashion by declaring the
following dependencies (mandatory false is used to support single node deployments):
* kube-slave needs to reach the first node where kube-master is available (there is only one in Eskimo Community
Edition anyway), so the dependency is defined as follows:

.elasticsearch dependency on next elasticsearch instance
.kube-slave dependency on first kube-master
----
"dependencies": [
{
"masterElectionStrategy": "RANDOM_NODE_AFTER",
"masterService": "elasticsearch",
"numberOfMasters": 1,
"mandatory": false
}
],
----

* logstash needs both elasticsearch and gluster. In contrast to elasticsearch, gluster is required on every node in a
multi-node cluster setup. Hence the following dependencies declaration for gluster:

.gluster dependencies definition
----
"dependencies": [
{
"masterElectionStrategy": "SAME_NODE_OR_RANDOM",
"masterService": "elasticsearch",
"masterElectionStrategy": "FIRST_NODE",
"masterService": "kube-master",
"numberOfMasters": 1,
"mandatory": true
"mandatory": true,
"restart": true
},
{
"masterElectionStrategy": "SAME_NODE",
"masterService": "gluster",
"numberOfMasters": 1,
"mandatory": false
}
----

* kafka uses zookeeper on the first node (in the order of declaration of nodes in the eskimo cluster) on which zookeeper
is available:
* kafka-manager needs to reach any random instance of kafka running on the cluster, so the dependency is expressed
simply as:

.kafka dependency on zookeeper
.kafka-manager dependency on kafka
----
"dependencies": [
{
"masterElectionStrategy": "FIRST_NODE",
"masterService": "zookeeper",
"numberOfMasters": 1,
"mandatory": true
"mandatory": true,
"restart": true
},
{
"masterElectionStrategy": "RANDOM",
"masterService": "kafka",
"numberOfMasters": 1,
"mandatory": true,
"restart": false
}
----
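
* For `ALL_NODES`, no example is bundled above. As a purely hypothetical sketch (field values are illustrative and
not taken from the bundled `services.json`), a service needing the full list of gluster nodes in a topology variable
could declare:

.hypothetical ALL_NODES dependency on gluster
----
"dependencies": [
  {
    "masterElectionStrategy": "ALL_NODES",
    "masterService": "gluster",
    "numberOfMasters": 1,
    "mandatory": true,
    "restart": false
  }
]
----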

@@ -827,6 +820,13 @@ for kibana `{CONTEXT_PATH}/kibana`, e.g. `eskimo/kibana` or `kibana` if no conte










// marker for exclusion : line 830


24 changes: 23 additions & 1 deletion services_setup/base-eskimo/install-eskimo-base-system.sh
@@ -102,7 +102,7 @@ function enable_docker() {
# Deprecated, remove kubernetes.registry
cat > /tmp/daemon.json <<- "EOF"
{
"insecure-registries" : ["kubernetes.registry:5000", "kubernetes.registry:5000"]
"insecure-registries" : ["kubernetes.registry:5000"]
}
EOF
@@ -436,6 +436,28 @@ fi
echo " - Enabling docker"
enable_docker


# kubelet works for now with cgroupfs, need to ensure docker is working with cgroupfs as well
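# Heuristic (assumption): a cgroup2 entry in /proc/mounts combined with fewer than 4
# cgroup mounts suggests the host runs cgroup v2 / the systemd cgroup driver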
if [[ `grep cgroup /proc/mounts | grep cgroup2` != "" && `grep cgroup /proc/mounts | wc -l` -lt 4 ]]; then

# Docker is likely running on systemd cgroup driver or cgroup2, need to bring it back to cgroupfs (v1)
if [[ `grep native.cgroupdriver=cgroupfs /etc/docker/daemon.json` == "" ]]; then

sudo sed -i -n '1h;1!H;${;g;s/'\
'{\n'\
' "insecure-registries"'\
'/'\
'{\n'\
' "exec-opts": \["native.cgroupdriver=cgroupfs"\],\n'\
' "insecure-registries"'\
'/g;p;}' /etc/docker/daemon.json
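
# Illustration (assuming the daemon.json written in enable_docker above): after this
# rewrite, /etc/docker/daemon.json is expected to read:
# {
#     "exec-opts": ["native.cgroupdriver=cgroupfs"],
#     "insecure-registries" : ["kubernetes.registry:5000"]
# }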


sudo systemctl restart docker containerd
fi
fi


echo " - Disabling IPv6"

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1 >> /tmp/setup_log 2>&1
51 changes: 0 additions & 51 deletions services_setup/base-eskimo/install-kubernetes.sh
@@ -46,57 +46,6 @@ function fail_if_error(){
fi
}

# extract IP address
function get_ip_address(){
ip_from_ifconfig=`/sbin/ifconfig | grep $SELF_IP_ADDRESS`

if [[ `echo $ip_from_ifconfig | grep Mask:` != "" ]]; then
ip=`echo $ip_from_ifconfig | sed 's/.*inet addr:\([0-9\.]*\).*/\1/'`
elif [[ `echo $ip_from_ifconfig | grep netmask` != "" ]]; then
ip=`echo $ip_from_ifconfig | sed 's/.*inet \([0-9\.]*\).*/\1/'`
fi

export IP_ADDRESS=$ip
}

# compute CIDR suffix from network mask
function mask2cdr () {
# Assumes there's no "255." after a non-255 byte in the mask
local x=${1##*255.}
set -- 0^^^128^192^224^240^248^252^254^ $(( (${#1} - ${#x})*2 )) ${x%%.*}
x=${1%%$3*}
echo $(( $2 + (${#x}/4) ))
}

function get_ip_root() {
get_ip_address
ip_root=`echo $IP_ADDRESS | sed 's/^\([0-9\.]*\)*[0-9]\{2\}/\1/'`
echo $ip_root
}

function get_ip_CIDR() {
get_ip_address
ip_from_ifconfig=`/sbin/ifconfig | grep $IP_ADDRESS`

if [[ `echo $ip_from_ifconfig | grep Mask:` != "" ]]; then
netmask=`echo $ip_from_ifconfig | sed 's/.*Mask:\(.*\)/\1/'`
elif [[ `echo $ip_from_ifconfig | grep netmask` != "" ]]; then
netmask=`echo $ip_from_ifconfig | sed 's/.* netmask \([0-9\.]*\).*/\1/'`
fi
cdr=`mask2cdr $netmask`

ip_root=`get_ip_root`

echo "$ip_root"0/$cdr
}

function get_host_min() {
ip_root=`get_ip_root`
echo "$ip_root"1
}



echo "-- INSTALLING KUBERNETES ------------------------------------------------------"

if [ -z "$K8S_VERSION" ]; then
15 changes: 1 addition & 14 deletions services_setup/common/common.sh
@@ -607,17 +607,4 @@ function preinstall_unmount_gluster_share () {
fi
done
fi
}

# extract IP address
function get_ip_address(){
ip_from_ifconfig=`/sbin/ifconfig | grep $SELF_IP_ADDRESS`

if [[ `echo $ip_from_ifconfig | grep Mask:` != "" ]]; then
ip=`echo $ip_from_ifconfig | sed 's/.*inet addr:\([0-9\.]*\).*/\1/'`
elif [[ `echo $ip_from_ifconfig | grep netmask` != "" ]]; then
ip=`echo $ip_from_ifconfig | sed 's/.*inet \([0-9\.]*\).*/\1/'`
fi

export IP_ADDRESS=$ip
}
}
52 changes: 30 additions & 22 deletions services_setup/common/glusterMountChecker.sh
@@ -40,40 +40,48 @@ for SHARE in `cat /etc/fstab | grep glusterfs | cut -d ' ' -f 2`; do

rm -Rf /tmp/gluster_mount_checker_error

ls -la $SHARE >/dev/null 2>/tmp/gluster_mount_checker_error
if [[ $? != 0 ]]; then
# only check the share if it is supposed to be mounted
if [[ `grep $SHARE /etc/mtab | grep glusterfs` != "" ]]; then

if [[ `grep "Transport endpoint is not connected" /tmp/gluster_mount_checker_error` != "" \
|| `grep "Too many levels of symbolic links" /tmp/gluster_mount_checker_error` != "" \
|| `grep "No such device" /tmp/gluster_mount_checker_error` != "" ]]; then
# give it a try
ls -la $SHARE >/dev/null 2>/tmp/gluster_mount_checker_error

echo `date +"%Y-%m-%d %H:%M:%S"`" - There is an issue with $SHARE. Unmounting" \
>> /var/log/gluster/gluster-mount-checker.log
# unmount if it's not working
if [[ $? != 0 ]]; then

# 3 attempts
for i in 1 2 3; do
if [[ `grep "Transport endpoint is not connected" /tmp/gluster_mount_checker_error` != "" \
|| `grep "Too many levels of symbolic links" /tmp/gluster_mount_checker_error` != "" \
|| `grep "No such device" /tmp/gluster_mount_checker_error` != "" ]]; then

echo `date +"%Y-%m-%d %H:%M:%S"`" + Attempt $i" >> /var/log/gluster/gluster-mount-checker.log
/bin/umount $SHARE >> /var/log/gluster/gluster-mount-checker.log 2>&1
echo `date +"%Y-%m-%d %H:%M:%S"`" - There is an issue with $SHARE. Unmounting" \
>> /var/log/gluster/gluster-mount-checker.log

if [[ $? != 0 ]]; then
# 3 attempts
for i in 1 2 3; do

echo `date +"%Y-%m-%d %H:%M:%S"`" + Unmount FAILED \!" >> /var/log/gluster/gluster-mount-checker.log
echo `date +"%Y-%m-%d %H:%M:%S"`" + Attempt $i" >> /var/log/gluster/gluster-mount-checker.log
/bin/umount $SHARE >> /var/log/gluster/gluster-mount-checker.log 2>&1

else
if [[ $? != 0 ]]; then

# give a little time
sleep 2
echo `date +"%Y-%m-%d %H:%M:%S"`" + Unmount FAILED \!" >> /var/log/gluster/gluster-mount-checker.log

else

# give a little time
sleep 2

break
fi
break
fi

# give a little time
sleep 2
done
fi
# give a little time
sleep 2
done
fi
fi
fi

# try to mount / remount
if [[ `grep $SHARE /etc/mtab | grep glusterfs` == "" ]]; then

echo `date +"%Y-%m-%d %H:%M:%S"`" - $SHARE is not mounted, remounting" \
@@ -179,9 +179,9 @@ if [[ $MODE == "MASTER" || ( $MODE == "SLAVE" && "$MASTER_KUBE_MASTER_1" != "$SE
fi
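
# (assumption from context) restart whichever component manages DNS resolution on this
# host so that the Kubernetes DNS setup just written takes effect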

if [[ -f $systemd_units_dir/NetworkManager.service ]]; then
/bin/systemctl restart NetworkManager
sudo /bin/systemctl restart NetworkManager
else
/bin/systemctl restart dnsmasq
sudo /bin/systemctl restart dnsmasq
fi
if [[ $? != 0 ]]; then
echo "Failing to restart NetworkManager / dnsmasq"
@@ -193,6 +193,38 @@ if [[ $MODE == "MASTER" || ( $MODE == "SLAVE" && "$MASTER_KUBE_MASTER_1" != "$SE

echo " + Trying YET AGAIN to ping kubernetes.default.svc.$CLUSTER_DNS_DOMAIN"
/bin/ping -c 1 -W 5 -w 10 kubernetes.default.svc.$CLUSTER_DNS_DOMAIN > /var/log/kubernetes/start_k8s_master.log 2>&1
if [[ $? != 0 ]]; then

which resolvectl >/dev/null 2>&1
if [[ $? == 0 ]]; then

echo " + Now trying the resolvectl trick"
interface=`/sbin/ifconfig | grep -B 1 $SELF_IP_ADDRESS | grep flags | sed s/'^\([a-zA-Z0-9]\+\).*'/'\1'/`
if [[ "$interface" != "" ]]; then

echo " + Calling resolvectl dns $interface 127.0.0.1"
sudo resolvectl dns $interface 127.0.0.1

sleep 2

echo " + Trying AGAIN to ping kubernetes.default.svc.$CLUSTER_DNS_DOMAIN to see if the resolbetrick on external interface worked"
/bin/ping -c 1 -W 5 -w 10 kubernetes.default.svc.$CLUSTER_DNS_DOMAIN > /var/log/kubernetes/start_k8s_master.log 2>&1
if [[ $? != 0 ]]; then

echo " + Out of desperation trying resolvectl trick with eth0"
echo " + Calling resolvectl dns eth0 127.0.0.1"
sudo resolvectl dns eth0 127.0.0.1

sleep 2

fi
fi
fi
fi


echo " + Trying ONE LAST TIME to ping kubernetes.default.svc.$CLUSTER_DNS_DOMAIN"
/bin/ping -c 1 -W 5 -w 10 kubernetes.default.svc.$CLUSTER_DNS_DOMAIN > /var/log/kubernetes/start_k8s_master.log 2>&1
if [[ $? != 0 ]]; then

if [[ $MODE == "MASTER" ]]; then
@@ -225,7 +257,7 @@ if [[ $MODE == "MASTER" || ( $MODE == "SLAVE" && "$MASTER_KUBE_MASTER_1" != "$SE

let ping_cnt=ping_cnt+1
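
# a persistent counter (stored in /etc/k8s/dns-ping-cnt, reset below) guards against
# redeploying coredns in an endless loop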

echo " - checking redeploy coredns looping"
echo " - checking looping"
if [[ $ping_cnt -gt 5 ]]; then
echo " + Redeployed coredns 5 times in a row. Crashing !"
echo "0" > /etc/k8s/dns-ping-cnt
@@ -68,11 +68,13 @@ void parse(String content) {
String[] contentLines = content.split("\n");
for (int i = 0; i < contentLines.length; i++) {
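// Newer systemd versions appear to mark failed units with '×' instead of '●' in
// status output, so both markers are matched below.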

if (contentLines[i].startsWith("●") && contentLines[i].contains(".service")) {
if ( (contentLines[i].startsWith("●")
|| contentLines[i].startsWith("×") )
&& contentLines[i].contains(".service")) {

handleServiceFound (
contentLines[i].substring(
contentLines[i].indexOf('●') + 2,
(contentLines[i].indexOf('●') > -1 ? contentLines[i].indexOf('●') : contentLines[i].indexOf('×')) + 2,
contentLines[i].indexOf(".service")),
i,
contentLines);
