Scripts Inventory

cleanup.Jenkinsfile: Declarative Pipeline Jenkinsfile that cleans up old builds using multiline sh steps. Every stage can be monitored visually. It is triggered every Saturday night and ends with a Jenkins restart. The multiline sh commands make the Jenkins project easier to read.
daily_restart.Jenkinsfile: Declarative Pipeline Jenkinsfile (multiline sh) that restarts Jenkins automatically every night except Saturday night, when cleanup.Jenkinsfile is triggered instead. The daily restart works around performance issues (Jenkins is a Java application).
confluence6-docker-build.Jenkinsfile: Declarative Jenkinsfile that builds a Docker image and uploads it to Openshift-DEV, Dockerhub and Openshift-PROD (stages are disabled via Conditional Build Steps). Tip: a Docker plugin for Jenkins can easily replace this Jenkinsfile. This script can be found here.

Configuration requirements

The cleanup.Jenkinsfile script is meant to run on Jenkins master nodes, where all the legacy build artifacts are archived ($JENKINS_HOME/jobs/YourProjectName/builds).

Jenkins-CLI Authentication

Authentication is preferably done with the -auth option, which takes a username:apitoken argument. Get your API token from [jenkins_url]/me/configure.

The password of the corresponding q-number can be used instead of the API token. The password is LDAP-based, while the API token is unique to each Jenkins master node.

Username and password (or username and API token) must be saved on each Jenkins master server in $JENKINS_HOME/.jenkins-cli, in the format "username:password" or "username:apitoken".

The chosen user needs the Overall/Administer permission in order to restart Jenkins. Otherwise the following error is returned:

ERROR: q-username is missing the Overall/Administer permission
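
A quick sanity check of the credentials file can catch format mistakes before a scheduled restart fails. The following is a minimal sketch, not the repository's own code: the function name check_cli_credentials is hypothetical, and it assumes the file holds exactly one "username:secret" line as described above.

```shell
#!/bin/sh
# Sketch: validate the Jenkins CLI credentials file before using it.
# check_cli_credentials is a hypothetical helper name.

check_cli_credentials() {
  cred_file=$1

  # The file must exist and be readable by the user running the job.
  if [ ! -r "$cred_file" ]; then
    echo "cannot read credentials file: $cred_file" >&2
    return 1
  fi

  # Count all lines, and the lines that look like "username:secret".
  total=$(grep -c '' "$cred_file")
  valid=$(grep -Ec '^[^:[:space:]]+:.+$' "$cred_file")

  # Exactly one line is expected, and it must be well formed.
  if [ "$total" -ne 1 ] || [ "$valid" -ne 1 ]; then
    echo "expected a single username:secret line in $cred_file" >&2
    return 1
  fi
}
```

Assuming the Jenkinsfiles pass the file to the CLI with the @FILE form of -auth, a manual test could then look like: check_cli_credentials "$JENKINS_HOME/.jenkins-cli" && java -jar jenkins-cli.jar -s [jenkins_url] -auth @"$JENKINS_HOME/.jenkins-cli" who-am-i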

Setup a new Pipeline Job in Jenkins

Jenkins job -> New Item -> "job name" Pipeline -> Pipeline SCM: git , Repository: "this repo", Script Path: cleanup.Jenkinsfile
Jenkins job -> New Item -> "job name" Pipeline -> Pipeline SCM: git , Repository: "this repo", Script Path: daily_restart.Jenkinsfile
Trigger a first manual build of the created job to load configuration from Jenkinsfile

Technical details

Where Jenkins stores build files

Jenkins Master nodes: jenkins master nodes archive all the legacy build artifacts in $JENKINS_HOME/jobs/YourProjectName/builds.
Jenkins Slave nodes: jenkins slaves keep the latest build in their corresponding Workspace in /home/cloud/jenkins_slave/workspace/YourJobName.

Jenkins reload-configuration after cleaning up old builds

Once old build folders have been deleted, Jenkins' index of builds needs to be updated, either by reloading its configuration (fast) or by restarting Jenkins (slow). The cleanup pipeline does not run "reload-configuration", because its last stage already performs an automated Jenkins restart. Jenkins needs to be restarted every night anyway, so it makes sense to include that night's restart in the cleanup job.

Jenkins restart VS safe-restart

[jenkins_url]/safeRestart – This will restart Jenkins after the current builds have completed.
[jenkins_url]/restart – This will force a restart. Builds will not wait to complete.

daily_restart.Jenkinsfile and cleanup.Jenkinsfile apply a hard restart of Jenkins Master when run in Main Jenkins Master Enterprise. A safe-restart was initially set up for all Jenkins Master nodes, but in Main Jenkins it could easily take 2-3 hours (sometimes requiring the Jenkins Admin to kill the remaining zombie jobs).

The current configuration is the following:

Main Jenkins Master: restart
Remaining Jenkins Master nodes: safeRestart
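
The split above can be expressed as a small lookup that returns the Jenkins CLI command for a node. This is only a sketch: the function name and the role label "main" are hypothetical, while restart and safe-restart are the CLI counterparts of the two URLs listed earlier.

```shell
#!/bin/sh
# Sketch: choose the Jenkins CLI restart command for a master node.
# The role label "main" is a hypothetical placeholder for Main Jenkins Master.

restart_command_for() {
  case $1 in
    main) echo "restart" ;;       # hard restart: running builds are not awaited
    *)    echo "safe-restart" ;;  # waits for the current builds to complete
  esac
}

# Possible use inside a pipeline sh step (NODE_ROLE is hypothetical):
#   java -jar jenkins-cli.jar -s [jenkins_url] \
#     -auth @"$JENKINS_HOME/.jenkins-cli" "$(restart_command_for "$NODE_ROLE")"
```

Keeping the decision in one function means the same Jenkinsfile can be reused on every master node, with only the role value changing.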

Unusual characters in foldernames/filenames and IFS env var

IFS env var is modified within this script in order to avoid errors when find command tries to navigate through a Project's folder with spaces in its name. By setting this up we make sure that all the existing folders related to Jenkins Projects are treated correctly and no errors arise when this script is run.
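
The effect of the IFS change can be seen in a short loop over find output. The sketch below is an illustration under assumptions, not the script itself: the function name is hypothetical, and printf is used as a portable stand-in for the echo -en form used in the cleanup script.

```shell
#!/bin/sh
# Sketch: iterate over project folders whose names may contain spaces.
# list_project_dirs is a hypothetical helper name.

# The function body runs in a subshell, so the IFS change does not leak out.
list_project_dirs() (
  jobs_dir=$1
  # Newline + backspace. The trailing \b keeps the command substitution
  # from stripping the newline (trailing newlines are always removed).
  IFS=$(printf '\n\b')
  # Unquoted on purpose: the find output is split on newlines only.
  for dir in $(find "$jobs_dir" -mindepth 1 -maxdepth 1 -type d | sort); do
    echo "project: $dir"
  done
)
```

With the default IFS, a folder named "my project" would be split into two words and the loop would see two non-existent paths. Names containing newlines or glob characters are still not handled by this approach, which is why the cleanup script also relies on -print0 and xargs -0.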

Summary of lines and parameters found in cleanup script

Item	Parameter	Details
IFS env var	IFS=$(echo -en "\n\b")	Splits words on newline (and backspace) only, so spaces in a folder's name do not break the name apart
find command	-xdev	Don't descend directories on other filesystems
find command	-mtime n	File's data was last modified n*24 hours ago
find command	-delete	Deletes matched files; fails on non-empty directories
find command	-mindepth 1 -maxdepth 1 -type d	Matches only directories directly inside the starting directory (no recursive search)
find command	-print0	Separates the entries of the file list it produces with the ASCII NUL character (safe for unusual characters in filenames)
xargs command	-0	Assumes that arguments are separated by ASCII NUL instead of whitespace
df command	cut -d" " -f2-	Removes the first output column, as the device name is too long to be displayed
xargs command	-t	Used in cleanup-script.sh to show a list of old build folders to be deleted. Used in Jenkinsfile to get traces of deleted data.
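
Put together, the find and xargs parameters from the table could be combined as follows. This is a sketch under assumptions rather than the actual cleanup script: the function name and the retention parameter are hypothetical, and GNU find/xargs are assumed (-r is a GNU xargs option).

```shell
#!/bin/sh
# Sketch: delete build folders older than N days in one job's builds directory.
# delete_old_builds and its "days" parameter are hypothetical.

delete_old_builds() {
  builds_dir=$1   # e.g. $JENKINS_HOME/jobs/YourProjectName/builds
  days=$2         # keep folders modified within the last N days

  # -xdev                    stay on the same filesystem
  # -mindepth 1 -maxdepth 1  only direct children of builds_dir
  # -type d                  build folders only
  # -mtime +N                last modified more than N*24 hours ago
  # -print0 | xargs -0       NUL-separated names, safe with spaces
  # xargs -t                 print each rm command to stderr as a trace
  # xargs -r                 do not run rm when nothing matched (GNU)
  find "$builds_dir" -xdev -mindepth 1 -maxdepth 1 -type d \
       -mtime "+$days" -print0 |
    xargs -0 -r -t rm -rf
}
```

rm -rf is used instead of -delete here because build folders are normally not empty, and -delete fails on non-empty directories.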

Troubleshooting an OpenShift POD running Jenkins Master node

This chapter explains how to deal with a troubled Jenkins Enterprise POD (unresponsive and/or with more restarts than the ones triggered by the daily restart).

Solutions to Known Errors Matrix Table

The table below lists solutions to well-known incidents:

Error	Solution or Workaround
Jenkins Enterprise is unresponsive or extremely slow	Force a POD restart by A) scaling the POD to 0 and then to 1 from the Openshift GUI, or by B) killing the Jenkins Java process (PID 7 in the examples of this document): kill 7
Restart of Jenkins Enterprise and its POD is taking too long > 1 hour	Deploy Jenkins Microservice: A) "oc rollout latest jenkins", or B) clicking on 'Deploy' button in Openshift GUI
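
The CLI variants of both workarounds can be wrapped so that they are printed first and only executed once reviewed. The sketch below assumes the deployment config is called jenkins, as in the examples of this document; the function names and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Sketch: oc-based workarounds with an optional dry-run mode.
# run, force_pod_restart, redeploy and DRY_RUN are hypothetical names.

# Execute a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Unresponsive Jenkins: scale the POD to 0 and then back to 1.
force_pod_restart() {
  dc_name=$1
  run oc scale --replicas=0 dc "$dc_name" &&
    run oc scale --replicas=1 dc "$dc_name"
}

# Restart stuck for more than 1 hour: deploy the microservice again.
redeploy() {
  run oc rollout latest "$1"
}
```

For example, DRY_RUN=1 force_pod_restart jenkins only prints the two oc scale commands. In a real run it may be necessary to wait for the old POD to terminate before scaling back up.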

Container State. Pod exitCode errors

Exit codes shown by oc get pods or kubectl get pods are Unix shell exit codes (echo $?).
Search for "bash error code 137", "bash error code 143", etc.
Signal 15 is SIGTERM (see "kill -l" for a complete list). It is the way most programs are gracefully terminated, and is relatively normal behaviour. It indicates that the system has delivered a SIGTERM to the process. This usually happens at the request of some other process (via kill()), but a process can also send it to itself (using raise()). The signal requests an orderly shutdown of the process or of the system itself. The real question is "Who/what is sending the SIGTERM?"

Exit Code Number	Meaning	Example	Comments
128 +n	Fatal error signal "n"	kill -n $PPID of script	$? returns 128 + n
137 (128 + 9)	Fatal error signal "9" (SIGKILL)	kill -9 $PPID of script	$? returns 137 (128 + 9)
143 (128 + 15)	Fatal error signal "15" (SIGTERM)	kill -15 $PPID of script	$? returns 143 (128 + 15)
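
The 128 + n rule from the table can be decoded with a few lines of shell. This is a minimal sketch with a hypothetical function name; it relies on kill -l accepting a signal number and printing the signal name without the SIG prefix.

```shell
#!/bin/sh
# Sketch: translate an exit code into the signal that caused it, if any.
# describe_exit_code is a hypothetical helper name.

describe_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l N prints the name of signal N, e.g. KILL for 9.
    echo "exit code $code: terminated by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit code $code: not caused by a signal"
  fi
}
```

For example, describe_exit_code 137 reports signal 9 (SIGKILL) and describe_exit_code 143 reports signal 15 (SIGTERM), matching the rows above.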

List of signals

user@jumphost:~> kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     16) SIGSTKFLT
17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU
25) SIGXFSZ     26) SIGVTALRM   27) SIGPROF     28) SIGWINCH
29) SIGIO       30) SIGPWR      31) SIGSYS      34) SIGRTMIN
35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3  38) SIGRTMIN+4
39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

No errors arise after a new POD deployment

No errors are seen when Jenkins Microservice has been deployed again via OpenShift GUI (deploy button) or via "oc rollout latest jenkins_id". Deployment number will be increased by 1 (like deployment #20, or "revision number 20" in 'oc rollout history'). Restart Count is 0:
```
Container jenkins
State:
Running since May 10, 2019 11:24:11 AM
Ready:
true
Restart Count:
0
```

Kubernetes/Openshift exit code 137

Container killed by an Out of Memory (OOM) kill: the Container/POD is using too much memory and is killed automatically (a new restart is triggered).
Java process killed manually with "kill -9 [PID]" (SIGKILL, as explained above).

Kubernetes/Openshift exit code 143

When scaling down number of pods, the following error is reported in the OpenShift console, even though the pod is stopped correctly:
```
The container spring-boot did not stop cleanly when terminated (exit code 143).
```
Root Cause: This issue is related to the exit codes returned from the JVM when it receives signals such as SIGINT, SIGTERM, etc. The JVM default is to return an exit code of 128+signal-id. E.g. for SIGTERM one would see an exit code of 143 (128+15). This causes Red Hat OpenShift Container Platform to display a warning message on pod scale down stating that the container did not stop cleanly (when in actual fact, it did).

This error also arises when Jenkins is restarted:

A manual or automated restart of Jenkins also implies a new Container/POD restart.
Container restartCount is increased
A Jenkins restart is triggered automatically by daily_restart.Jenkinsfile, found in this repo (Jenkins CLI).
A Jenkins restart can also be triggered from the Jenkins GUI, but this is not possible when Jenkins is overloaded and unresponsive.

Simple workaround to force a Jenkins Restart: In the following example we force a Container/POD restart by killing Jenkins Java process with a "kill PID" (SIGTERM signal, instead of SIGKILL signal with "kill -9"):

  user@jumpserver:~/> oc get pods | grep ^jenkins-2
  jenkins-20-1d4vm                     1/1       Running   0          45m
  user@jumpserver:~/> oc rsh jenkins-20-1d4vm
  sh-4.2$ ps ux
  USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  1019710+      1  0.0  0.0   4324   560 ?        Ss   11:24   0:00 /bin/tini -vv
  1019710+      7  109  0.5 61154992 5864592 ?    Sl   11:24  49:43 java -Djava.aw
  1019710+    938  0.0  0.0  13520  2076 ?        Ss+  11:27   0:00 /bin/sh
  1019710+   3037  0.0  0.0  13520  2000 ?        Ss+  11:31   0:00 /bin/sh
  1019710+   7723  0.0  0.0  13516  2064 ?        Ss+  11:44   0:00 /bin/sh
  1019710+  16464  2.0  0.0  13516  1936 ?        Ss   12:09   0:00 /bin/sh
  1019710+  16474  0.0  0.0  49060  1828 ?        R+   12:09   0:00 ps ux
  sh-4.2$ kill 7

Openshift oc cli examples

oc get pods
oc get pods -o wide
oc describe pod jenkins-as-a-service-73-07xwx
oc describe svc jaas-test
oc get svc
oc set env pods --all --list

ps -eo pid,lstart,cmd | fold -w 130 (checking Jenkins Java process with arguments and start time)

user@jumpserver:~> oc rsh jenkins-19-0n5jm
sh-4.2$ ps -eo pid,lstart,cmd | fold -w 130
  PID                  STARTED CMD
    1 Sat Apr 28 00:54:42 2019 /usr/bin/dumb-init -- /usr/libexec/s2i/run
    7 Sat Apr 28 00:54:42 2019 java -XX:+UseParallelGC -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:Adap
tiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -Duser.home=/var/lib/jenkins -XX:MaxMetaspaceSize=512m -Djavamelody.application-
name=JENKINS -Dfile.encoding=UTF8 -jar /usr/lib/jenkins/jenkins.war --prefix=/myjenkins
45744 Sun Apr 29 15:12:43 2018 /bin/sh
47707 Wed May  9 16:51:16 2018 /bin/sh
54481 Wed May  9 18:20:04 2018 /bin/sh
54656 Wed May  9 18:22:41 2018 ps -eo pid,lstart,cmd
54657 Wed May  9 18:22:41 2018 fold -w 130

oc get pod jenkins-19-0n5jm

POD Status: oc get pod jenkins-19-0n5jm -o yaml

In the following example, LastState:
- startedAt 2019-05-02T05:40:30Z (05:40:30 UTC -> 07:40:30 CEST)
- finishedAt 2019-05-03T04:38:08Z (04:38:08 UTC -> 06:38:08 CEST)
CurrentState startedAt: ...

  containerStatuses:
  - containerID: docker://9ad7f3a23d6328405090a380ed01305a9e21fefca91a49021db9b3848f208d09
    image: myip:5000/prod/jenkins
    imageID: docker-pullable://myip:5000/delivery-prod/jenkins@sha256:8f5d31f51cef106df8e68ed432e190f9f6bf0e9307d3f35490584cc4db808c0d
    lastState:
      terminated:
        containerID: docker://de0a32ddab499573fe7cd79745751b6eb2d1e3d84e4c0f529929985ed5c9947a
        exitCode: 137
        finishedAt: 2019-05-03T04:38:08Z
        reason: Error
        startedAt: 2019-05-02T05:40:30Z
    name: jenkins
    ready: true
    restartCount: 9
    state:
      ...

POD Status: oc get pod jenkins-20-1d4vm -o yaml

In the following example, LastState:
- startedAt: 2019-05-14T03:01:40Z (3:01:40 UTC -> 5:01:40 CEST , daily restart)
- finishedAt: 2019-05-14T12:06:03Z (12:06:03 UTC -> 14:06:03 CEST)
CurrentState:
- startedAt: 2019-05-14T12:06:09Z (12:06:09 UTC -> 14:06:09 CEST)

containerStatuses:
- containerID: docker://ed4e7b3c2e427d0f5e3940bafe21737173a00df3f09e6853f53b85a47e023c30
  image: myip:5000/delivery-prod/jenkins
  imageID: docker-pullable://myip:5000/delivery-prod/jenkins@sha256:8f5d31f51cef106df8e68ed432e190f9f6bf0e9307d3f35490584cc4db808c0d
  lastState:
    terminated:
      containerID: docker://92f7db3fbb442d1609abdad498051355266ef0072e97f226e2f9374932671044
      exitCode: 137
      finishedAt: 2019-05-14T12:06:03Z
      reason: Error
      startedAt: 2019-05-14T03:01:40Z
  name: jenkins
  ready: true
  restartCount: 5
  state:
    running:
      startedAt: 2019-05-14T12:06:09Z
hostIP: myip
phase: Running
podIP: ip-addr
startTime: 2019-05-10T09:24:04Z
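
When only the last exit code and the restart counter are of interest, they can be filtered out of the YAML output. The following is a rough sketch with a hypothetical function name; it simply takes the first exitCode and restartCount fields it finds, which is enough for single-container pods like the ones shown here.

```shell
#!/bin/sh
# Sketch: summarise exit code and restart count from pod YAML on stdin.
# pod_restart_summary is a hypothetical helper name.

pod_restart_summary() {
  awk '
    # Remember only the first occurrence of each field.
    $1 == "exitCode:"     && last_exit == "" { last_exit = $2 }
    $1 == "restartCount:" && restarts  == "" { restarts  = $2 }
    END { print "exitCode=" last_exit, "restartCount=" restarts }
  '
}
```

Usage: oc get pod jenkins-20-1d4vm -o yaml | pod_restart_summary. For the output above this prints exitCode=137 restartCount=5.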

oc get dc

user@jumphost:~/> oc get dc
NAME                        REVISION   DESIRED   CURRENT   TRIGGERED BY
cjoc                        11         1         1
jenkins                     19         1         1
jenkins-1                   27         0         0
jenkins-2                   20         1         1
jenkins-3                   1          0         0
jenkins-4                   0          1         0
jenkins-5                   9          1         1
jenkins-6                   14         1         1
jenkins-7                   9          1         1
nexus                       7          1         1
sonar6                      4          1         1
sonar6-postgresql           1          1         1         config,image(postgresql:9.5)

oc deploy --latest jenkins (deprecated, use "oc rollout latest jenkins" instead)

user@jumphost:~/> oc deploy --latest jenkins
Flag --latest has been deprecated, use 'oc rollout latest' instead
Started deployment #20
Use 'oc logs -f dc/jenkins' to track its progress.
user@jumphost:~/>

oc rollout latest jenkins: This is like clicking on "deploy" button on Openshift GUI when we want to deploy a POD. PROCEED WITH THIS IN CASE A RESTART OF JENKINS IS STUCK FOR MORE THAN 1 HOUR. Deployment number will be increased by 1 (like deployment #20, or "revision number 20" in 'oc rollout history').
oc logs -f dc/jenkins

oc rollout history dc/jenkins

user@jumphost:~/> oc rollout history dc/jenkins
deploymentconfigs "jenkins"
REVISION        STATUS          CAUSE
1               Complete        manual change
2               Complete        manual change
4               Complete        manual change
5               Complete        manual change
6               Complete        manual change
7               Complete        manual change
8               Complete        manual change
9               Failed          The deployment was cancelled by the user
10              Complete        manual change
11              Failed          manual change
12              Failed          The deployment was cancelled by the user
13              Failed          The deployment was cancelled by the user
14              Failed          The deployment was cancelled by the user
15              Failed          manual change
16              Failed          The deployment was cancelled by the user
17              Failed          manual change
18              Failed          manual change
19              Complete        manual change
20              Running         manual change

Force the termination of POD:

$ oc delete pod jenkins-10-6095v --force=true --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "jenkins-10-6095v" deleted

oc describe quota

oc describe quota quota-48gi -n delivery-prod ("oc describe quota" in default delivery-prod Project)

user@jumphost:~/> oc describe quota 
Name:           quota-48gi
Namespace:      delivery-prod
Resource        Used            Hard
--------        ----            ----
requests.cpu    22750m          30
requests.memory 36467264Ki      48Gi
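
The Used and Hard memory columns use different units (Ki and Gi), which makes the ratio hard to read at a glance. The sketch below converts both to KiB and prints an integer percentage; the function names are hypothetical and only the Ki, Mi and Gi suffixes are handled.

```shell
#!/bin/sh
# Sketch: how much of the memory quota is in use, as an integer percentage.
# to_kib and memory_quota_percent are hypothetical helper names.

# Convert a quantity with a Ki, Mi or Gi suffix to KiB.
to_kib() {
  case $1 in
    *Ki) echo "${1%Ki}" ;;
    *Mi) echo $(( ${1%Mi} * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 )) ;;
    *)   echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}

# memory_quota_percent USED HARD
memory_quota_percent() {
  used=$(to_kib "$1") || return 1
  hard=$(to_kib "$2") || return 1
  echo $(( $used * 100 / $hard ))
}
```

For the quota shown above, memory_quota_percent 36467264Ki 48Gi prints 72, i.e. roughly 72% of the 48Gi memory quota is already requested.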

oc status
oc status -v (warnings can be seen)
oc get all
oc logs -f po/jenkins-19-0n5jm | less -S
oc scale --replicas=0 dc jenkins (Change the number of pods in a deployment)
oc scale --replicas=3 dc jenkins (Change the number of pods in a deployment)
oc rollout history po/jenkins-19-0n5jm (viewing a deployment)
oc rollout history dc/jenkins --revision=19
oc get builds
oc edit: Edit a resource on the server
oc new-build: Create a new build configuration
oc start-build: Start a new build
oc cancel-build: Cancel running, pending, or new builds
oc import-image: Imports images from a Docker registry
oc tag: Tag existing images into image streams
oc debug jenkins-52-167c9 (Launch a new instance of a pod for debugging)
oc explain: Documentation of resources
oc explain deploy (Deployment enables declarative updates for Pods and ReplicaSets)
oc exec: Execute a command in a container
oc cp: Copy files and directories to and from containers.

Openshift GUI examples

We can see the following output in OpenShift GUI after a jenkins restart (which implies a POD restart, with this POD's "restartCount" increased by 1):

Status:
Running Deployment:
jenkins, #19
IP:
ipaddr.ip
Node:
nodeaddr.net (ipaddr)
Restart Policy:
Always
Container jenkins
State:
Running since May 4, 2019 6:37:11 AM
Last State
Terminated at May 4, 2019 6:37:01 AM with exit code 143 (Error)
Ready:
false
Restart Count:
10
Debug in Terminal

Jenkins Slaves Cleanup Scripts

The cleanup of Jenkins slave workspaces can easily be automated via a Jenkins job that triggers the following shell script:

mv /home/cloud/jenkins_slave/workspace /home/cloud/jenkins_slave/workspace.old && rm -Rf /home/cloud/jenkins_slave/workspace.old &
mv /global/apps/cdbuilt/build_server/maven/<myproject>/local-repo-maven-3 /global/apps/cdbuilt/build_server/maven/<myproject>/local-repo-maven-3.old && rm -Rf /global/apps/cdbuilt/build_server/maven/<myproject>/local-repo-maven-3.old &
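
The rename-then-delete pattern used in both lines can be wrapped in a reusable function. This is a sketch of a slight variant, not the script above: the function name is hypothetical, and the rename is done in the foreground so that the original path is guaranteed to be free when the function returns, while the slow deletion runs in the background.

```shell
#!/bin/sh
# Sketch: free a directory path immediately and delete its content in the
# background. purge_in_background is a hypothetical helper name.

purge_in_background() {
  target=$1

  # Nothing to do when the directory does not exist.
  [ -d "$target" ] || return 0

  # The rename is fast on the same filesystem, so new builds can recreate
  # the original path right away.
  mv "$target" "$target.old" || return 1

  # The slow recursive delete runs in the background.
  rm -rf "$target.old" &
}
```

Usage: purge_in_background /home/cloud/jenkins_slave/workspace. A following wait makes the calling script block until the deletion has finished, if that is needed.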

Alternative: Workspace cleanup plugin: https://plugins.jenkins.io/ws-cleanup

References

Jenkins.io: Jenkins CLI Authentication
Stackoverflow: How to restart jenkins manually
Cloudbees Support Guideline: Cleanup and disk space management
GNU.org: Deleting files with find
Dzone.com: Declarative Pipeline Refcard
Cloudbees: Declarative Pipeline Quick Reference
Kubernetes Docs
Kubernetes.io: Pod Lifecycle
Openshift: Application memory sizing
Openshift: Openshift Container Platform 3.9 CLI Reference
Openshift: Basic CLI Operations - Object types
Youtube: Introduction to OpenShift Enterprise CLI
redhat.com Video: OpenShift Enterprise 3.2 - Creating your first application from the CLI
Openshift: Exit Code 137
redhat.com: Openshift. Diagnosing an OOM Kill
redhat.com: Scaling down pods with a running Java process results in a warning message (exit code 143)
redhat.com: Bash exit codes with special meanings
redhat.com: What is 'Signal 15' ?
redhat.com: How to find out who sent the 'signal-15'
Stackoverflow: How to debug container images using openshift
Stackoverflow: How to restart pod in OpenShift?
Reddit.com: jenkinsci
