Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing alternate full snapshots #272

Merged

Conversation

shreyas-s-rao
Copy link
Collaborator

Signed-off-by: Shreyas Rao shreyas.sriganesh.rao@sap.com

What this PR does / why we need it:
This PR fixes missing full snapshots on shoot unhibernation. Setting the previous full snapshot existence check to 23.5 hours provides a buffer of 30 minutes for the pod to become ready to take the first full snapshot everyday in the case where full snapshot schedule lies within the shoot's hibernation period.

Which issue(s) this PR fixes:
Fixes #258

Special notes for your reviewer:

Release note:

Fix missing alternate full snapshots for some unhibernating shoots.

Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
@shreyas-s-rao shreyas-s-rao added kind/bug Bug component/etcd-backup-restore ETCD Backup & Restore area/backup Backup related labels Oct 14, 2020
@gardener-robot gardener-robot added needs/review Needs review size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Oct 14, 2020
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/ok-to-test-tm Has approval for running integration tests on TestMachinery labels Oct 14, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test-tm Requires integration tests to be run on TestMachinery and removed reviewed/ok-to-test-tm Has approval for running integration tests on TestMachinery labels Oct 14, 2020
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Oct 14, 2020
@amshuman-kr amshuman-kr added this to the v0.11.0 milestone Oct 19, 2020
@amshuman-kr
Copy link
Collaborator

Thanks for the PR, @shreyas-s-rao! The change itself looks good. But as discussed elsewhere, how about addressing this by tweaking the gardener to consider hibernation schedule while computing the backup schedule string? It is already considering maintenance schedule.

@shreyas-s-rao
Copy link
Collaborator Author

@amshuman-kr as discussed, we can go ahead with this PR as it's a quick win, while we do acknowledge that it's only a temporary solution, and we need a more generic approach to determine whether a "recent enough" previous full snapshot was taken, according to the given cron schedule.

Copy link
Collaborator

@amshuman-kr amshuman-kr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Have you tested the case? I haven't :-)

@gardener-robot gardener-robot removed the needs/review Needs review label Oct 21, 2020
@shreyas-s-rao
Copy link
Collaborator Author

@amshuman-kr I have tested this locally with a helm chart, where I simulated a delayed startup (by 10 minutes) of etcd on day 1. Without this PR, the day 2 etcdbr startup missed the full snapshot. With the PR, it was able to account for that 10-minute delay the previous day, and took the full snapshot on etcdbr startup. Not sure if we can have an automated test case for this, since the config values are more Gardener-specific (like 24 hours). Until we automate that, I don't see a way to add a test case for this scenario. I'll open an issue to move away from the hard-coded 24-hour value here towards a more cron-schedule-aware method to calculate the period between two consecutive full snapshots. For the time-being, I think we can go ahead and merge this PR. WDYT?

@amshuman-kr amshuman-kr merged commit 7f75c53 into gardener:master Oct 31, 2020
@shreyas-s-rao shreyas-s-rao deleted the fix/missing-full-snapshot branch November 2, 2020 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backup Backup related component/etcd-backup-restore ETCD Backup & Restore kind/bug Bug needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test-tm Requires integration tests to be run on TestMachinery size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Retry full backups to avoid false positives in alerts
5 participants