[Kubernetes cronjob] pg_isready only works interactively (pod user permissions maybe?) #361
Summary
TL;DR: `pg_isready` only seems to work when executed interactively in the pod, as opposed to when the pod itself runs. This happens after I had to manually add the `PGSSLMODE=require` env variable because it was throwing a `/root/.postgresql/postgresql.crt: Permission denied` error.
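For context, `pg_isready` reports its result only through its exit code: 0 = server accepting connections, 1 = server rejecting connections, 2 = no response, 3 = no attempt made (e.g. bad parameters). A minimal sketch of a non-interactive check that surfaces the code (`$POD` is a placeholder for the actual pod name):

```sh
# Run the readiness check non-interactively and print the exit code.
# The command string is single-quoted so the DB01_* variables expand
# inside the container, not on the local machine.
kubectl exec -n mastodon "$POD" -- sh -c \
  'pg_isready --host="$DB01_HOST" --port="$DB01_PORT" --dbname="$DB01_NAME" --username="$DB01_USER"; echo "exit code: $?"'
```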
Steps to reproduce
What I did:
- Created the k8s job.
- Followed the steps here.
- After setting just the env variables listed above (`MODE=MANUAL`, `MANUAL_RUN_FOREVER=FALSE`, `CONTAINER_ENABLE_SCHEDULING`, and `CONTAINER_ENABLE_MONITORING`), it wouldn't start on its own (it was waiting on user input).
- Adding `/etc/services.available/10-db-backup/run` to the `command` field of the container definition resulted in a "no such file or directory" error.
- I took inspiration from this comment, which had me set the command to `['/init', 'backup-now']`; that worked.
- At that point, with all the envs loaded, I was getting a failed connection to my Postgres server, with `/root/.postgresql/postgresql.crt: Permission denied` cited as the cause.
- I added `PGSSLMODE=require` to the env list, and the error went away.
- Now I am facing the issue where `pg_isready` won't see that the server is ready. To debug, I grabbed the command being executed from the debug logs and ran it interactively in the pod with `pg_isready --host=$DB01_HOST --port=$DB01_PORT --dbname=$DB01_NAME --username=$DB01_USER`, and it worked perfectly. (One way to compare the two execution contexts is sketched just after this list.)
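Since the same command behaves differently under the entrypoint than in an interactive shell, one hypothesis is that the two contexts see different environments (`HOME`, `PG*` variables, etc.). A sketch for comparing them, again with `$POD` standing in for the actual pod name:

```sh
# Environment the container's main process (PID 1) was started with:
kubectl exec -n mastodon "$POD" -- sh -c \
  'tr "\0" "\n" < /proc/1/environ | sort' > entrypoint.env

# Environment an exec'd shell sees:
kubectl exec -n mastodon "$POD" -- sh -c 'env | sort' > exec.env

# Differences in HOME, USER, or PG* variables are the prime suspects:
diff entrypoint.env exec.env
```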
I suspect that this is a permissions issue with how the commands are being executed, but I'm not entirely sure.
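To test the permissions theory, something like the following sketch (with `$POD` again a placeholder) would show which user an exec'd shell runs as and whether the path from the earlier certificate error is readable:

```sh
kubectl exec -n mastodon "$POD" -- sh -c '
  id                              # which uid/gid does this run as?
  echo "HOME=$HOME"               # libpq resolves ~/.postgresql from HOME
  ls -la /root/.postgresql 2>&1   # the path from the original error
'
```

If the scheduled run executes as a different (non-root) user than an interactive exec, both the certificate error and the differing `pg_isready` result would be consistent with that.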
I have the following k8s config:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-storage-backup
  namespace: mastodon
spec:
  schedule: "30 1 * * *"
  concurrencyPolicy: Forbid
  suspend: false
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          name: postgres-storage-backup
        spec:
          volumes:
            - name: postgres-completion
              configMap:
                name: postgres-completion
                defaultMode: 0500
          containers:
            - name: postgres-storage-backup
              image: tiredofit/db-backup:4.1.3
              imagePullPolicy: IfNotPresent
              command:
                - /init
                - backup-now
              volumeMounts:
                - name: postgres-completion
                  mountPath: "/script"
              env:
                - name: DEBUG_MODE
                  value: "TRUE"
                - name: PGSSLMODE
                  value: "require"
                - name: MODE
                  value: "MANUAL"
                - name: MANUAL_RUN_FOREVER
                  value: "FALSE"
                - name: CONTAINER_ENABLE_SCHEDULING
                  value: "FALSE"
                - name: CONTAINER_ENABLE_MONITORING
                  value: "FALSE"
                - name: DEFAULT_POST_SCRIPT
                  value: "/script/postgres.sh"
                - name: DEFAULT_BACKUP_LOCATION
                  value: "S3"
                - name: DEFAULT_S3_BUCKET
                  valueFrom:
                    configMapKeyRef:
                      name: storage-backup
                      key: postgres_bucket
                - name: DEFAULT_S3_KEY_ID
                  valueFrom:
                    configMapKeyRef:
                      name: storage-backup
                      key: DEFAULT_S3_KEY_ID
                - name: DEFAULT_S3_KEY_SECRET
                  valueFrom:
                    configMapKeyRef:
                      name: storage-backup
                      key: DEFAULT_S3_KEY_SECRET
                - name: DEFAULT_S3_REGION
                  valueFrom:
                    configMapKeyRef:
                      name: storage-backup
                      key: DEFAULT_S3_REGION
                - name: DEFAULT_S3_HOST
                  valueFrom:
                    configMapKeyRef:
                      name: storage-backup
                      key: DEFAULT_S3_HOST
                - name: DB01_TYPE
                  value: "pgsql"
                - name: DB01_HOST
                  valueFrom:
                    configMapKeyRef:
                      name: mastodon-env-tf
                      key: DB_HOST
                - name: DB01_PORT
                  valueFrom:
                    configMapKeyRef:
                      name: mastodon-env-tf
                      key: DB_PORT
                - name: DB01_NAME
                  valueFrom:
                    configMapKeyRef:
                      name: mastodon-env-tf
                      key: DB_NAME
                - name: DB01_USER
                  valueFrom:
                    configMapKeyRef:
                      name: mastodon-env-tf
                      key: DB_USER
                - name: DB01_PASS
                  valueFrom:
                    configMapKeyRef:
                      name: mastodon-env-tf
                      key: DB_PASS
          restartPolicy: OnFailure
```
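For faster iteration than waiting for the 01:30 schedule, a one-off Job can be spawned directly from the CronJob (a sketch; `manual-debug` is an arbitrary job name):

```sh
# Spawn a one-off Job from the CronJob's template and follow its logs:
kubectl create job manual-debug \
  --from=cronjob/postgres-storage-backup -n mastodon
kubectl logs -n mastodon job/manual-debug -f

# Clean up afterwards:
kubectl delete job manual-debug -n mastodon
```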
What is the expected correct behavior?
`pg_isready` sees that the server is up, and the backup runs.
Relevant logs and/or screenshots
I've attached the debug logs with everything sensitive scrubbed: `private-logs.txt`
Environment
- Image version / tag: `tiredofit/db-backup:4.1.3`
- Host OS: k8s 1.30.2-do.0
Possible fixes
I've spent a fair amount of time debugging this, and I reached the point where it seemed best to open a bug to track my progress.
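In the meantime, one possible workaround (an untested sketch, not a documented mechanism of the image) would be to wrap the entrypoint in a script that performs the readiness wait itself before handing off to `/init backup-now`:

```sh
#!/bin/sh
# Untested workaround sketch: do the readiness wait ourselves, then
# hand off to the image's entrypoint. Assumes the DB01_* variables
# from the manifest above; the 60-attempt cap is arbitrary.
tries=0
until pg_isready --host="$DB01_HOST" --port="$DB01_PORT" \
                 --dbname="$DB01_NAME" --username="$DB01_USER"; do
  tries=$((tries + 1))
  if [ "$tries" -ge 60 ]; then
    echo "postgres never became ready, giving up" >&2
    exit 1
  fi
  sleep 5
done
exec /init backup-now
```

This could be shipped through the same configMap mount as the existing `/script/postgres.sh` and set as the container `command`.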