Overview
I'm running a trivial CrunchyData PGO cluster with a single primary (no replicas).
It ran out of disk space, possibly due to #2531, but that is not the subject of this issue.
Because of this, the Postgres pod is stuck in a loop, repeatedly logging:
2023-11-27 16:43:12,175 INFO: Lock owner: ; I am postgres-instance1-mqhs-0
2023-11-27 16:43:12,175 INFO: not healthy enough for leader race
2023-11-27 16:43:12,176 INFO: doing crash recovery in a single user mode in progress
In other words, Postgres isn't running at all, and I can't connect to it.
The problems are:
- The pod still reports as healthy. Marking it unhealthy and restarting it wouldn't fix anything in this case, but the unhealthy status could be used to trigger monitors/alerts highlighting that something is wrong.
- The operator logs show no issues at all.
In short, Postgres is broken, but the control plane (or whatever you want to call it) is not aware of it.
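Until the pod status reflects this state, an external watcher could detect the loop from the logs alone. The following is a minimal sketch, assuming the watcher can read the Patroni log lines of the instance pod (for example via `kubectl logs`). The marker strings are copied from the log excerpt in this issue; the function name `stuck_in_crash_recovery` and the `min_cycles` threshold are hypothetical and not part of PGO or Patroni.

```python
from typing import Iterable

# Marker strings copied verbatim from the looping Patroni log in this issue.
EMPTY_LOCK = "INFO: Lock owner: ; I am "          # nobody holds the leader lock
NOT_HEALTHY = "not healthy enough for leader race"
CRASH_RECOVERY = "doing crash recovery in a single user mode in progress"


def stuck_in_crash_recovery(lines: Iterable[str], min_cycles: int = 3) -> bool:
    """Return True if the log contains at least `min_cycles` complete loop
    iterations of: empty lock owner -> not healthy -> crash recovery.

    Hypothetical helper: the matching rules and the threshold are assumptions.
    """
    cycles = 0
    no_leader = not_healthy = False
    for line in lines:
        if EMPTY_LOCK in line:
            # An empty lock owner starts a new iteration of the loop.
            no_leader = True
            not_healthy = False
        elif NOT_HEALTHY in line:
            not_healthy = True
        elif CRASH_RECOVERY in line and no_leader and not_healthy:
            # A full iteration was observed; count it and wait for the next one.
            cycles += 1
            no_leader = not_healthy = False
    return cycles >= min_cycles


if __name__ == "__main__":
    loop = [
        "2023-11-27 16:43:12,175 INFO: Lock owner: ; I am postgres-instance1-mqhs-0",
        "2023-11-27 16:43:12,175 INFO: not healthy enough for leader race",
        "2023-11-27 16:43:12,176 INFO: doing crash recovery in a single user mode in progress",
    ]
    print(stuck_in_crash_recovery(loop * 3))  # True: three full iterations
    print(stuck_in_crash_recovery(loop))      # False: only one iteration
```

Requiring several consecutive iterations avoids alerting on a single, legitimate crash-recovery pass after a restart. Log scraping is fragile, though; polling Patroni's REST API `/health` endpoint, which returns HTTP 200 only when PostgreSQL is up and running, would likely be a sturdier signal.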
Environment
Please provide the following details:
- Platform: k3s
- Platform Version: 1.28
- PGO Image Tag: ubi8-16.0-3.4-0
- Postgres Version: 16
- Storage: EBS