feat: startup and readiness probes for replicas #6623
gbartolini merged 15 commits into cloudnative-pg:main from
❗ By default, the pull request is configured to backport to all release branches.
IIUC, when replicas rejoin, they need to catch up on WAL, which has two effects:

+1, using a startup probe seems like a good idea if someone is choosing …

My initial thought is that this should default to a small number of bytes and might not need to be configurable at all, except as a setting for debugging or troubleshooting. This would result in a slightly longer delay before replicas become ready, even with dataDurability= …

Note: what I've written above is based on my understanding, but I have not had time to test or verify it, so there might be mistakes. I haven't looked yet, but it could also be interesting to check how Patroni approaches this.
That's correct.

Is this PR in Patroni solving a similar problem?

Interesting. From this recent PGCon talk, it sounds like Patroni might actually still have the pause: https://youtu.be/CWrFPPG5USA?feature=shared&t=1190
I think in the talk he's focused on a third node rejoining.

Yes. I think there is no solution to this problem, only compromises.
Allow the user to configure the behavior of the startup probe, to be chosen from the following list:
1. pg_isready
2. streaming, with optional lag limit

Closes: cloudnative-pg#6621

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@gmail.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Niccolò Fei <niccolo.fei@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
/test d=main tl=4
@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/13763633508 |
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
When upgrading to 1.26 from a previous version, PostgreSQL clusters are restarted (even with in-place updates enabled) due to a change in the startup probe definition. This issue appears to be a side effect of the improvements made to the startup probe: cloudnative-pg#6623

When upgrading to 1.26 from a previous version, PostgreSQL clusters are restarted (even with in-place updates enabled) due to a change in the startup probe definition. This issue appears to be a side effect of the improvements made to the startup probe: cloudnative-pg#6623

Signed-off-by: Julian Vanden Broeck <julian.vandenbroeck@dalibo.com>
…8018) When upgrading from a previous version to 1.26, PostgreSQL clusters will be restarted even with in-place updates enabled, due to changes in the startup probe definition (PR #6623).

Closes #7727

Signed-off-by: Julian Vanden Broeck <julian.vandenbroeck@dalibo.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Julian Vanden Broeck <julian.vandenbroeck@dalibo.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>

…8018) When upgrading from a previous version to 1.26, PostgreSQL clusters will be restarted even with in-place updates enabled, due to changes in the startup probe definition (PR #6623).

Closes #7727

Signed-off-by: Julian Vanden Broeck <julian.vandenbroeck@dalibo.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Julian Vanden Broeck <julian.vandenbroeck@dalibo.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
(cherry picked from commit 48ddea1)
Extend the startup and readiness probes configured through the `.spec.probes.startup` and `.spec.probes.readiness` sections by adding two parameters:

- `type`: defines the criteria for considering the probe successful. Accepted values are:
  - `pg_isready`: the probe succeeds when the `pg_isready` command exits with status `0`. This is the default for both primary instances and replicas.
  - `query`: the probe succeeds when a basic query runs successfully against the local `postgres` database.
  - `streaming`: the probe succeeds when the replica starts streaming from its source and meets the specified lag requirement (see `lag` below).
- `lag`: the maximum acceptable replication lag, measured in bytes and expressed as a Kubernetes quantity. It applies only when `type` is `streaming`. If `lag` is not specified, the replica is considered successfully started/ready as soon as it begins streaming.

Consequently, the liveness probe has been streamlined to verify only that the instance manager is operational, without monitoring the underlying PostgreSQL instance.
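To make the `streaming` rule with an optional `lag` limit concrete, here is a minimal Go sketch of how such a check could decide success. It is not the CloudNativePG implementation. The names `probeSpec`, `replicaStatus`, `parseLSN`, `parseBinaryQuantity` and `streamingProbeSucceeds` are hypothetical. The sketch assumes that lag is the byte distance between the source's current WAL position and the position received by the replica. PostgreSQL prints these positions (LSNs) as two 32-bit hexadecimal halves, for example `0/3000060`. The quantity parser is also a deliberate simplification: it accepts only plain integers and the binary suffixes `Ki`, `Mi` and `Gi`, a small subset of what Kubernetes quantities allow.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// probeSpec is a hypothetical stand-in for one probe section (startup or readiness).
type probeSpec struct {
	Type string // "pg_isready", "query" or "streaming"
	Lag  string // optional limit such as "16Mi"; empty means no limit
}

// replicaStatus is a hypothetical snapshot; both LSN fields are assumptions.
type replicaStatus struct {
	Streaming  bool   // the replica is streaming from its source
	SourceLSN  string // assumed: current WAL position of the source
	ReplicaLSN string // assumed: latest WAL position received by the replica
}

// parseLSN turns a PostgreSQL LSN such as "0/3000060" (two 32-bit hex halves)
// into an absolute byte position in the WAL stream.
func parseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, err
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, err
	}
	return h<<32 | l, nil
}

// parseBinaryQuantity accepts a simplified subset of Kubernetes quantities:
// a plain byte count, or an integer followed by Ki, Mi or Gi.
func parseBinaryQuantity(s string) (uint64, error) {
	for suffix, mult := range map[string]uint64{"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30} {
		if digits, found := strings.CutSuffix(s, suffix); found {
			v, err := strconv.ParseUint(digits, 10, 64)
			return v * mult, err
		}
	}
	return strconv.ParseUint(s, 10, 64)
}

// streamingProbeSucceeds applies the rule for type "streaming": the replica must
// be streaming and, when a lag limit is set, at most that many bytes behind.
func streamingProbeSucceeds(spec probeSpec, st replicaStatus) (bool, error) {
	if spec.Type != "streaming" {
		return false, errors.New("this sketch only evaluates the streaming type")
	}
	if !st.Streaming {
		return false, nil
	}
	if spec.Lag == "" {
		return true, nil // no limit: ready as soon as streaming starts
	}
	maxLag, err := parseBinaryQuantity(spec.Lag)
	if err != nil {
		return false, err
	}
	src, err := parseLSN(st.SourceLSN)
	if err != nil {
		return false, err
	}
	rep, err := parseLSN(st.ReplicaLSN)
	if err != nil {
		return false, err
	}
	var lag uint64 // stays 0 if the replica reports a position ahead of the source
	if src > rep {
		lag = src - rep
	}
	return lag <= maxLag, nil
}

func main() {
	spec := probeSpec{Type: "streaming", Lag: "16Mi"}
	st := replicaStatus{Streaming: true, SourceLSN: "0/5000000", ReplicaLSN: "0/4000000"}
	fmt.Println(streamingProbeSucceeds(spec, st))
}
```

Running it prints `true <nil>`. The replica is `0x1000000` bytes (exactly 16 MiB) behind, which does not exceed the `16Mi` limit. A production version would more likely reuse `resource.Quantity` from `k8s.io/apimachinery` than a hand-rolled parser.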
Closes: #6621
Release Notes
Improved Startup and Readiness Probes for Replicas: Enhanced support for Kubernetes startup and readiness probes in PostgreSQL instances, providing greater control over replicas based on the streaming lag.