
[Feature] Retry full backups to avoid false positives in alerts #258

Closed
@amshuman-kr

Description

Feature (What you would like to be added):
If a regular full snapshot backup fails, it is retried only at the next interval (typically 24h). We should retry the full snapshot backup within a shorter time frame (say, 10m, 15m or 20m).

Motivation (Why is this needed?):
Alerts are configured to fire if the latest full backup is more than 24h old. These alerts typically resolve themselves at the next interval, when the full backup goes through, so most of them are false positives. This makes it hard to automate the follow-up process using ticketing systems (mandated by audit).

Retrying within 10m-20m might resolve the failure earlier, so that alerts fire only if the retries also fail.

Approach/Hint to implement the solution (optional):

Labels

kind/enhancement (Enhancement, improvement, extension)
