Feat/check target connectivity #208

Open: wants to merge 1 commit into base: main
31 changes: 31 additions & 0 deletions scripts/CEE/check-target-connectivity/README.md
@@ -0,0 +1,31 @@
# Check Target Connectivity from OpenShift Cluster Script

## Purpose

This script performs multiple checks to troubleshoot connectivity from an OpenShift cluster to a target host and port.

Performed checks:
- DNS resolution check via nslookup: `nslookup "$TARGET"`
- DNS resolution check via dig: `dig +short "$TARGET"`
- ICMP check via ping: `timeout 10 ping -c 3 "$TARGET"`
- Routing check via traceroute: `timeout 5 traceroute -m 10 -w 1 -q 1 "$TARGET"`
- Target port open check via nmap: `timeout 5 nmap -p "$PORT" "$TARGET" 2>&1 | grep -q "$PORT/tcp open"`

Notes:
- The ping, nmap, and dig checks each wait 5 seconds before starting, to minimize impact on the network.
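
The nmap check above decides "open" by grepping the scan output. As a minimal sketch, assuming nmap prints one `PORT/tcp STATE SERVICE` line per scanned port (the format the existing `grep` pattern already relies on), the match can be anchored to the start of the line. `port_reported_open` is a hypothetical helper name, not part of this PR.

```bash
# Hypothetical helper: succeeds (exit 0) when nmap-style output on stdin
# reports PORT/tcp in the "open" state.
# The pattern is anchored to the start of the line, so port 80 does not
# match a line that describes port 8080.
port_reported_open() {
  local port="$1"
  grep -Eq "^${port}/tcp[[:space:]]+open([[:space:]]|\$)"
}
```

It could replace the trailing `grep -q` in the pipeline, for example `timeout 5 nmap -p "$PORT" "$TARGET" 2>&1 | port_reported_open "$PORT"`.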

## Usage

Parameters:
- TARGET: Target host
- PORT: Target port

```bash
ocm backplane managedjob create CEE/check-target-connectivity -p TARGET={target} -p PORT={port}
```
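
Because `TARGET` and `PORT` are passed straight through to the job, it may help to validate them before creating it. The wrapper below is only a sketch under stated assumptions: `run_connectivity_check` and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the allowed character set for `TARGET` (letters, digits, dots, hyphens, colons) is an assumption, not a rule from this PR.

```bash
# Hypothetical wrapper: validate TARGET and PORT, then create the managed job.
run_connectivity_check() {
  local target="$1" port="$2"

  # PORT must be 1 to 5 digits (rejects empty, non-numeric, and oversized values).
  case "$port" in
    ''|*[!0-9]*|??????*)
      echo "Error: PORT must be a number." >&2
      return 1
      ;;
  esac
  # PORT must fall in the valid TCP range.
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "Error: PORT must be between 1 and 65535." >&2
    return 1
  fi

  # TARGET may contain only letters, digits, dots, hyphens, and colons (assumed set).
  case "$target" in
    ''|*[!A-Za-z0-9.:-]*)
      echo "Error: TARGET is empty or contains unexpected characters." >&2
      return 1
      ;;
  esac

  # With DRY_RUN=1 the command is printed instead of executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo ocm backplane managedjob create CEE/check-target-connectivity -p "TARGET=${target}" -p "PORT=${port}"
  else
    ocm backplane managedjob create CEE/check-target-connectivity -p "TARGET=${target}" -p "PORT=${port}"
  fi
}
```

Running `DRY_RUN=1 run_connectivity_check example.com 443` prints the command that would be issued, which makes it easy to review before running it against a cluster.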

## Important Notes

- The script uses the `oc` command-line tool; the user running it needs sufficient permissions to access the cluster.
- This script is read-only and does not modify any resources in the cluster.
- Ensure that the required tool (`oc`) is available in the environment where the script is executed.
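
The tool requirement above can be checked up front. A minimal sketch, assuming a hypothetical helper named `require_tools` (not part of this PR) that relies only on the shell built-in `command -v`:

```bash
# Hypothetical helper: report every missing tool on stderr and
# return non-zero if at least one is absent.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Required tool not found: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script could call `require_tools oc || exit 1` before doing any work, so a missing binary fails fast with a clear message.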
48 changes: 48 additions & 0 deletions scripts/CEE/check-target-connectivity/metadata.yaml
@@ -0,0 +1,48 @@
file: script.sh
name: check-target-connectivity
shortDescription: Performs multiple checks to validate target connectivity.
description: |
  Performs multiple checks to validate target connectivity.

  Performed checks:
  - DNS resolution check via nslookup: $ nslookup "$TARGET"
  - DNS resolution check via dig: $ dig +short "$TARGET"
  - ICMP check via ping: $ timeout 10 ping -c 3 "$TARGET"
  - Routing check via traceroute: $ timeout 5 traceroute -m 10 -w 1 -q 1 "$TARGET"
  - Target port open check via nmap: $ timeout 5 nmap -p "$PORT" "$TARGET" 2>&1 | grep -q "$PORT/tcp open"

  Notes:
  - The ping, nmap, and dig checks each wait 5 seconds before starting, to minimize impact on the network.

author: Alex Volkov
allowedGroups:
  - SREP
  - CEE
rbac:
  clusterRoleRules:
    - apiGroups:
        - ""
      resources:
        - "pods"
        - "pods/exec"
        - "pods/log"
      verbs:
        - "create"
        - "get"
        - "delete"
    - apiGroups:
        - "security.openshift.io"
      verbs:
        - "*"
Contributor

Why do we need all permissions on SCCs?

Author

@alvlkov alvlkov Jan 16, 2025

I'm open to reducing the verbs; I'm not sure what the minimal set is for:

securityContext:
      allowPrivilegeEscalation: false
      runAsNonRoot: true
      runAsUser: 1001
      capabilities:
        drop:
        - ALL
      seccompProfile:
        type: RuntimeDefault

      resources:
        - "securitycontextconstraints"
envs:
  - key: TARGET
    description: Target hostname
    optional: false
  - key: PORT
    description: Target port
    optional: false

language: bash
customerDataAccess: false
147 changes: 147 additions & 0 deletions scripts/CEE/check-target-connectivity/script.sh
@@ -0,0 +1,147 @@
#!/bin/bash

set -e
set -o nounset
set -o pipefail

# Configurable variables
PODNAME="check-target-connectivity"
NS="openshift-backplane-managed-scripts"

# Define the target (external service)
if [[ -z "${TARGET:-}" ]]; then
    echo 'Variable TARGET cannot be blank'
    exit 1
fi

if [[ -z "${PORT:-}" ]]; then
    echo 'Variable PORT cannot be blank'
    exit 1
fi

# Input sanity checks
if ! [[ "$PORT" =~ ^[0-9]+$ ]]; then
    echo "Error: Port must be a valid number."
    exit 1
fi

start_job(){
    CURRENTDATE=$(date +"%Y-%m-%d %T")
    echo "Job started at $CURRENTDATE"
    echo ".................................."
    echo
}

finish_job(){
    CURRENTDATE=$(date +"%Y-%m-%d %T")
    echo
    echo ".................................."
    echo "Job finished at $CURRENTDATE"
}

# Create the check pod
# shellcheck disable=SC1039
check_target_connectivity(){
    echo 'Starting check pod...'
    oc create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${PODNAME}
  namespace: ${NS}
spec:
  restartPolicy: Never
  containers:
    - name: check-target-connectivity
      image: quay.io/app-sre/srep-network-toolbox:latest
      imagePullPolicy: Always
      command:
        - '/bin/bash'
        - '-c'
        - |-
          #!/bin/bash
          set -e
          # TARGET and PORT below are expanded by the outer shell when the manifest is created.
          # Check if the target is resolvable
          echo "Checking if the target ($TARGET) is resolvable..."
          if command -v nslookup > /dev/null; then
            nslookup "$TARGET"
            echo ".................................."
          else
            echo "'nslookup' command not available, skipping resolution check."
          fi

          # Check if the target is reachable via ICMP using ping
          echo "Pinging the target ($TARGET)..."
          sleep 5
          timeout 10 ping -c 3 "$TARGET"
          echo ".................................."

          # Check the routing to the target via traceroute with limits
          echo "Checking routing to the target ($TARGET) via traceroute..."
          if command -v traceroute > /dev/null; then
            timeout 5 traceroute -m 10 -w 1 -q 1 "$TARGET"
          else
            echo "'traceroute' command not available, skipping routing check."
          fi
          echo ".................................."

          # Check if target port is OPEN via nmap
          echo "Checking if port $PORT on target ($TARGET) is open using nmap..."
          sleep 5
          # Run nmap to check if the port is open
          if timeout 5 nmap -p "$PORT" "$TARGET" 2>&1 | grep -q "$PORT/tcp open"; then
            echo "Port $PORT is open on the target."
          else
            echo "Port $PORT is NOT open on the target."
          fi

          # Check DNS resolution using dig
          echo "Checking DNS resolution for $TARGET using dig..."
          sleep 5
          if command -v dig > /dev/null; then
            dig +short "$TARGET"
          else
            echo "'dig' command not available, skipping DNS check."
          fi
      securityContext:
        # privileged belongs in the container securityContext, not the pod spec
        privileged: false
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        runAsUser: 1001
        capabilities:
          drop:
            - ALL
        seccompProfile:
          type: RuntimeDefault
EOF

    # Poll the pod phase every 30 seconds until it reports Succeeded or Failed
    while [ "$(oc -n "${NS}" get pod "${PODNAME}" -o jsonpath='{.status.phase}' 2>/dev/null)" != "Succeeded" ];
    do
        if [ "$(oc -n "${NS}" get pod "${PODNAME}" -o jsonpath='{.status.phase}' 2>/dev/null)" == "Failed" ];
        then
            echo "The target connectivity check pod has failed. The logs are:"
            # Do not error if check pod is still in initialising state
            oc -n "${NS}" logs "${PODNAME}" -c check-target-connectivity || true
            oc -n "${NS}" delete pod "${PODNAME}" >/dev/null 2>&1
            exit 1
        fi
        sleep 30
    done

    oc -n "${NS}" logs "${PODNAME}" -c check-target-connectivity
    oc -n "${NS}" delete pod "${PODNAME}" >/dev/null 2>&1

}

# Run the connectivity check, bracketed by start and finish timestamps
main(){
    start_job
    check_target_connectivity
    finish_job
}

main
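
# The polling loop in check_target_connectivity above has no upper bound: a pod
# that stays Pending would keep the job waiting indefinitely. The functions
# below are only a sketch of a bounded wait, not part of this PR. The names
# wait_for_pod_phase and pod_phase, the 600-second default, and the return
# codes are all assumptions introduced for illustration.

```bash
# Hypothetical helper: call the given phase-reporting command until it prints
# "Succeeded" or "Failed", or until max_wait seconds have elapsed.
# Returns 0 on Succeeded, 1 on Failed, 2 on timeout.
wait_for_pod_phase() {
  local get_phase="$1" max_wait="${2:-600}" interval="${3:-30}"
  local waited=0 phase
  while [ "$waited" -le "$max_wait" ]; do
    # A failing lookup (for example, pod not yet visible) counts as "no phase yet".
    phase="$("$get_phase" 2>/dev/null || true)"
    case "$phase" in
      Succeeded) return 0 ;;
      Failed)    return 1 ;;
    esac
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 2
}

# Phase lookup using the same oc query as the script; NS and PODNAME are the
# variables defined at the top of script.sh.
pod_phase() {
  oc -n "${NS}" get pod "${PODNAME}" -o jsonpath='{.status.phase}'
}
```

# Usage sketch: wait_for_pod_phase pod_phase 600 30, then branch on the return
# code to print logs, delete the pod, and exit as the existing loop does.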