[PLAT-12864][PLAT-12865] Add support for Scheduled backups and Incremental backups in Operator

Summary:
Added support for scheduled backups and incremental backups in operator.

**Backup Schedules:**

Sample BackupSchedule CR:

```yaml
apiVersion: operator.yugabyte.io/v1alpha1
kind: BackupSchedule
metadata:
  name: operator-scheduled-backup-1
spec:
  backupType: PGSQL_TABLE_TYPE
  storageConfig: s3-config-operator
  universe: operator-universe-test-2
  timeBeforeDelete: 1234567890
  keyspace: test
  schedulingFrequency: 3600000
  incrementalBackupFrequency: 900000
```
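
The two frequency fields are expressed in milliseconds. As a sanity check, the sample values above correspond to hourly full backups with 15-minute incrementals:

```java
import java.time.Duration;

public class FrequencyCheck {
  public static void main(String[] args) {
    // schedulingFrequency from the sample CR: one full backup per hour
    Duration full = Duration.ofMillis(3_600_000L);
    // incrementalBackupFrequency: one incremental every 15 minutes
    Duration incremental = Duration.ofMillis(900_000L);
    System.out.println(full.toMinutes());        // 60
    System.out.println(incremental.toMinutes()); // 15
  }
}
```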

Implementation details:

Backup schedules support taking full backups on a cron expression or at a fixed frequency,
and also support taking incremental backups in between the full backups.

When a triggered backup belongs to a schedule, we create a corresponding Backup CR
and name it appropriately. These CRs are marked with "ignore-reconciler-add" to
prevent the reconciler's add handler from trying to process them.
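
A minimal sketch of what such a guard could look like in the add handler — the marker string comes from the description above, but the method and class names here are illustrative, not the operator's actual API:

```java
import java.util.Map;

public class ReconcilerAddGuard {
  // Marker placed on Backup CRs that the operator itself created for a
  // scheduled backup; these must not be treated as user-initiated requests.
  static final String IGNORE_MARKER = "ignore-reconciler-add";

  // Hypothetical check run by the add handler before enqueueing work.
  static boolean shouldIgnoreAdd(Map<String, String> labels) {
    return labels != null && labels.containsKey(IGNORE_MARKER);
  }

  public static void main(String[] args) {
    System.out.println(shouldIgnoreAdd(Map.of(IGNORE_MARKER, "true"))); // true
    System.out.println(shouldIgnoreAdd(Map.of()));                      // false
  }
}
```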

Schedules have owner references to the universe. When the source universe is
removed, the schedule also receives a delete call.
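
The owner reference the schedule carries is shown verbatim in the test plan below; as a sketch, its fields follow the standard Kubernetes OwnerReference schema (the plain record type here is illustrative — the operator presumably uses its Kubernetes client's model classes):

```java
public class ScheduleOwnerRef {
  // Mirrors the Kubernetes OwnerReference fields relevant here.
  record OwnerReference(String apiVersion, String kind, String name,
                        String uid, boolean blockOwnerDeletion) {}

  // Builds the owner reference a BackupSchedule CR gets for its universe.
  static OwnerReference forUniverse(String universeName, String universeUid) {
    return new OwnerReference("operator.yugabyte.io/v1alpha1", "YBUniverse",
        universeName, universeUid, true);
  }

  public static void main(String[] args) {
    OwnerReference ref = forUniverse("operator-universe-test-2",
        "fee37102-f770-49fe-a740-c9562fa290d6");
    // With an owner reference in place, Kubernetes garbage collection
    // deletes the schedule CR when the owning YBUniverse is deleted.
    System.out.println(ref.kind() + "/" + ref.name());
  }
}
```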

Schedule actions are retryable. I made use of the OperatorWorkQueue and a custom
reconciler to achieve this, much like YBUniverse does.
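
The retry pattern can be sketched as a work queue whose items are re-enqueued on failure until the action succeeds; this is only an illustration of the pattern, not the actual OperatorWorkQueue API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class RetryQueueSketch {
  record Task(String resourceName, int attempts) {}

  // Stand-in for the schedule reconcile action; here it succeeds on the
  // third attempt to demonstrate the retry loop.
  static boolean reconcile(Task t) {
    return t.attempts() >= 2;
  }

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Task> queue = new ArrayBlockingQueue<>(16);
    queue.put(new Task("operator-scheduled-backup-1", 0));

    while (!queue.isEmpty()) {
      Task task = queue.take();
      if (!reconcile(task) && task.attempts() < 3) {
        // Failed: requeue with an incremented attempt counter.
        queue.put(new Task(task.resourceName(), task.attempts() + 1));
      }
    }
    System.out.println("reconciled after retries");
  }
}
```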

The BackupSchedule CR also supports the `enablePointInTimeRestore` feature.

**Incremental Backups**

Sample CR:

```yaml
apiVersion: operator.yugabyte.io/v1alpha1
kind: Backup
metadata:
  name: operator-backup-1
spec:
  backupType: PGSQL_TABLE_TYPE
  storageConfig: az-config-operator-1
  universe: operator-universe-test-1
  timeBeforeDelete: 1234567890
  keyspace: test
  incrementalBackupBase: <base full backup>
```

Implementation details:

Incremental backups have an owner reference to the previous backup in the
backup chain, whether that is a full or an incremental backup. Thus they form a
chain of references ending at the first full backup.

Whenever an incremental backup is added, it is taken on top of the last successful
backup (incremental or full) in that chain.

Deleting an incremental backup directly will fail: we don't allow deleting backups
in the middle of a chain unless they are in a failed state, and the same behavior is
maintained here. To delete the backups, delete the first full backup, which triggers
a cascading delete of the whole chain.
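
The chain behavior can be sketched as follows — the type and field names are illustrative, not the operator's actual classes, and the backup names echo the test plan below:

```java
import java.util.List;

public class BackupChainSketch {
  // Each backup records its predecessor in the chain; the chain ends at the
  // first full backup, and deleting that full backup cascades down the chain
  // via owner references.
  record BackupCr(String name, boolean successful, String previousInChain) {}

  // A new incremental is taken on top of the last successful backup.
  static String baseForNewIncremental(List<BackupCr> chain) {
    for (int i = chain.size() - 1; i >= 0; i--) {
      if (chain.get(i).successful()) {
        return chain.get(i).name();
      }
    }
    throw new IllegalStateException("no successful backup to base an incremental on");
  }

  public static void main(String[] args) {
    List<BackupCr> chain = List.of(
        new BackupCr("full-06-43-25", true, null),
        new BackupCr("incremental-06-59-26", true, "full-06-43-25"),
        new BackupCr("incremental-07-13-26", false, "incremental-06-59-26"));
    // The failed latest incremental is skipped; the next incremental bases
    // itself on the last successful backup in the chain.
    System.out.println(baseForNewIncremental(chain)); // incremental-06-59-26
  }
}
```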

**Misc**
- Fixed multiple issues with backup deletion.
- Added generic methods to handle multiple resource types.
- Did some logic changes wherever necessary to accommodate the current changes.
- Added handler class for Schedule.

Test Plan:
**Verified multiple scenarios:**

- Adding scheduled backups with retry
{F336728}

- Schedules have owner references

```yaml
apiVersion: operator.yugabyte.io/v1alpha1
kind: BackupSchedule
metadata:
  annotations:
    universeUUID: f2d646b4-6158-4d30-b088-c116a14142bb
  creationTimestamp: "2025-02-27T06:42:08Z"
  finalizers:
  - finalizer.k8soperator.yugabyte.com
  generation: 1
  name: operator-scheduled-backup-1
  namespace: schedule-cr
  ownerReferences:
  - apiVersion: operator.yugabyte.io/v1alpha1
    blockOwnerDeletion: true
    kind: YBUniverse
    name: operator-universe-test-2
    uid: fee37102-f770-49fe-a740-c9562fa290d6
  resourceVersion: "1047507613"
  uid: a6402732-b91a-4454-a873-89c2131064b3
```

- Schedules are removed when Universe is removed

```
[kv83821@dev-server-kv83821 operator-crs]$ kubectl get backupschedule -n schedule-cr
NAME                          AGE
operator-scheduled-backup-1   101m
[kv83821@dev-server-kv83821 operator-crs]$ kubectl get ybuniverse -n schedule-cr
NAME                       STATE   SOFTWARE VERSION
operator-universe-test-2   Ready   2.25.2.0-b40
[kv83821@dev-server-kv83821 operator-crs]$ kubectl delete ybuniverse operator-universe-test-2 -n schedule-cr
ybuniverse.operator.yugabyte.io "operator-universe-test-2" deleted
[kv83821@dev-server-kv83821 operator-crs]$ kubectl get backupschedule -n schedule-cr
No resources found in schedule-cr namespace.
```

- Creating full and incremental backups with a schedule (auto-created CRs)

```
[kv83821@dev-server-kv83821 operator-crs]$ kubectl get backups -n schedule-cr
NAME                                                                     AGE
operator-scheduled-backup-1-1069296176-full-2025-02-27-06-43-25          32m
operator-scheduled-backup-1-1069296176-incremental-2025-02-27-06-59-26   16m
operator-scheduled-backup-1-1069296176-incremental-2025-02-27-07-13-26   2m55s
```

- Deleting the full backup deletes it and all incremental backups

```
[kv83821@dev-server-kv83821 operator-crs]$ kubectl get backups -n schedule-cr
NAME                                                                     AGE
operator-scheduled-backup-1-1069296176-full-2025-02-27-06-43-25          32m
operator-scheduled-backup-1-1069296176-incremental-2025-02-27-06-59-26   16m
operator-scheduled-backup-1-1069296176-incremental-2025-02-27-07-13-26   2m55s
[kv83821@dev-server-kv83821 operator-crs]$ kubectl delete backup operator-scheduled-backup-1-1069296176-full-2025-02-27-06-43-25 -n schedule-cr
backup.operator.yugabyte.io "operator-scheduled-backup-1-1069296176-full-2025-02-27-06-43-25" deleted
[kv83821@dev-server-kv83821 operator-crs]$ kubectl get backups -n schedule-cr
No resources found in schedule-cr namespace.
```

- Tested that the edit-schedule workflow works as expected. Also verified that bad schedules keep retrying until correct schedule params are applied.
- Added unit tests

Reviewers: anijhawan, dshubin

Reviewed By: anijhawan

Differential Revision: https://phorge.dev.yugabyte.com/D42204
kv83821-yb committed Mar 5, 2025
1 parent cca7ab9 commit 4bb5f51
Showing 36 changed files with 2,153 additions and 406 deletions.
@@ -32,6 +32,7 @@
import com.yugabyte.yw.common.metrics.MetricLabelsBuilder;
import com.yugabyte.yw.common.operator.OperatorStatusUpdater;
import com.yugabyte.yw.common.operator.OperatorStatusUpdaterFactory;
import com.yugabyte.yw.common.operator.utils.OperatorUtils;
import com.yugabyte.yw.forms.BackupRequestParams;
import com.yugabyte.yw.models.Backup;
import com.yugabyte.yw.models.Backup.BackupCategory;
@@ -66,19 +67,22 @@ public class CreateBackup extends UniverseTaskBase {
private final YbcManager ybcManager;
private final StorageUtilFactory storageUtilFactory;
private final OperatorStatusUpdater kubernetesStatus;
private final OperatorUtils operatorUtils;

@Inject
protected CreateBackup(
BaseTaskDependencies baseTaskDependencies,
CustomerConfigService customerConfigService,
YbcManager ybcManager,
StorageUtilFactory storageUtilFactory,
OperatorStatusUpdaterFactory operatorStatusUpdaterFactory) {
OperatorStatusUpdaterFactory operatorStatusUpdaterFactory,
OperatorUtils operatorUtils) {
super(baseTaskDependencies);
this.customerConfigService = customerConfigService;
this.ybcManager = ybcManager;
this.storageUtilFactory = storageUtilFactory;
this.kubernetesStatus = operatorStatusUpdaterFactory.create();
this.operatorUtils = operatorUtils;
}

protected BackupRequestParams params() {
@@ -156,6 +160,13 @@ public void run() {
ybcBackup,
tablesToBackup);
log.info("Task id {} for the backup {}", backup.getTaskUUID(), backup.getBackupUUID());
if (params().scheduleUUID != null && params().getKubernetesResourceDetails() != null) {
try {
operatorUtils.createBackupCr(backup);
} catch (Exception e) {
throw new RuntimeException(e);
}
}

// Marks the update of this universe as a success only if all the tasks before it succeeded.
createMarkUniverseUpdateSuccessTasks()
@@ -305,5 +316,8 @@ public void runScheduledBackup(
SCHEDULED_BACKUP_SUCCESS_COUNTER.labels(metricLabelsBuilder.getPrometheusValues()).inc();
metricService.setOkStatusMetric(
buildMetricTemplate(PlatformMetrics.SCHEDULE_BACKUP_STATUS, universe));
// Update Kubernetes operator schedule status
kubernetesStatus.updateBackupScheduleStatus(
taskParams.getKubernetesResourceDetails(), schedule);
}
}
@@ -38,7 +38,7 @@ public void validateParams(boolean isFirstTry) {
super.validateParams(isFirstTry);
taskParams()
.scheduleParams
.validateExistingSchedule(isFirstTry, taskParams().getCustomerUUID());
.validateExistingSchedule(taskParams().getCustomerUUID(), isFirstTry);
if (isFirstTry) {
Universe universe = getUniverse();
taskParams().scheduleParams.validateScheduleParams(backupHelper, universe);
@@ -11,6 +11,7 @@
package com.yugabyte.yw.commissioner.tasks;

import com.yugabyte.yw.commissioner.BaseTaskDependencies;
import com.yugabyte.yw.commissioner.ITask.Retryable;
import com.yugabyte.yw.common.backuprestore.ybc.YbcManager;
import com.yugabyte.yw.common.customer.config.CustomerConfigService;
import com.yugabyte.yw.common.operator.OperatorStatusUpdaterFactory;
@@ -19,6 +20,7 @@
import lombok.extern.slf4j.Slf4j;

@Slf4j
@Retryable
public class CreateBackupScheduleKubernetes extends BackupScheduleBaseKubernetes {

private final CustomerConfigService customerConfigService;
@@ -40,7 +42,7 @@ public void validateParams(boolean isFirstTry) {
super.validateParams(isFirstTry);
taskParams()
.scheduleParams
.validateExistingSchedule(isFirstTry, taskParams().getCustomerUUID());
.validateExistingSchedule(taskParams().getCustomerUUID(), isFirstTry);
if (isFirstTry) {
Universe universe = getUniverse();
taskParams().scheduleParams.validateScheduleParams(backupHelper, universe);
@@ -4,13 +4,15 @@

import com.google.inject.Inject;
import com.yugabyte.yw.commissioner.BaseTaskDependencies;
import com.yugabyte.yw.commissioner.ITask.Retryable;
import com.yugabyte.yw.common.operator.OperatorStatusUpdaterFactory;
import com.yugabyte.yw.forms.BackupRequestParams;
import com.yugabyte.yw.models.Schedule;
import lombok.extern.slf4j.Slf4j;
import play.libs.Json;

@Slf4j
@Retryable
public class EditBackupScheduleKubernetes extends BackupScheduleBaseKubernetes {

@Inject
@@ -184,6 +184,7 @@
import com.yugabyte.yw.common.gflags.GFlagsUtil;
import com.yugabyte.yw.common.gflags.SpecificGFlags;
import com.yugabyte.yw.common.nodeui.DumpEntitiesResponse;
import com.yugabyte.yw.common.operator.KubernetesOperatorStatusUpdater;
import com.yugabyte.yw.forms.BackupRequestParams;
import com.yugabyte.yw.forms.BackupTableParams;
import com.yugabyte.yw.forms.BulkImportParams;
@@ -6544,21 +6545,41 @@ protected void addAllCreateBackupScheduleTasks(
BackupRequestParams scheduleParams,
UUID customerUUID,
String stableYbcVersion) {
addAllCreateBackupScheduleTasks(
backupScheduleSubTasks,
scheduleParams,
customerUUID,
stableYbcVersion,
null /* kubernetesStatus */);
}

protected void addAllCreateBackupScheduleTasks(
Runnable backupScheduleSubTasks,
BackupRequestParams scheduleParams,
UUID customerUUID,
String stableYbcVersion,
@Nullable KubernetesOperatorStatusUpdater kubernetesStatus) {
Universe universe = getUniverse();
Schedule schedule = null;

// Lock universe
lockAndFreezeUniverseForUpdate(
universe.getUniverseUUID(), universe.getVersion(), null /* firstRunTxnCallback */);
try {
// Get or create schedule
// Create schedule
schedule = Schedule.getOrCreateSchedule(customerUUID, scheduleParams);
UUID scheduleUUID = schedule.getScheduleUUID();
log.info(
"Creating backup schedule for customer {}, schedule uuid = {}.",
scheduleParams.customerUUID,
scheduleUUID);

// Update kubernetes status
if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}

boolean ybcBackup =
!BackupCategory.YB_BACKUP_SCRIPT.equals(scheduleParams.backupCategory)
&& universe.isYbcEnabled()
@@ -6599,13 +6620,23 @@ protected void addAllCreateBackupScheduleTasks(
getRunnableTask().runSubTasks();

// Mark schedule Active
Schedule.updateStatusAndSave(customerUUID, scheduleUUID, Schedule.State.Active);
schedule = Schedule.updateStatusAndSave(customerUUID, scheduleUUID, Schedule.State.Active);

if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}
} catch (Throwable t) {
log.error("Error executing task {} with error='{}'.", getName(), t.getMessage(), t);
// Update schedule state to Error
if (schedule != null) {
Schedule.updateStatusAndSave(
customerUUID, schedule.getScheduleUUID(), Schedule.State.Error);
schedule =
Schedule.updateStatusAndSave(
customerUUID, schedule.getScheduleUUID(), Schedule.State.Error);
if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}
}
throw t;
} finally {
@@ -6619,6 +6650,20 @@ protected void addAllEditBackupScheduleTasks(
BackupRequestParams scheduleParams,
UUID customerUUID,
UUID scheduleUUID) {
addAllEditBackupScheduleTasks(
backupScheduleSubTasks,
scheduleParams,
customerUUID,
scheduleUUID,
null /* kubernetesStatus */);
}

protected void addAllEditBackupScheduleTasks(
Runnable backupScheduleSubTasks,
BackupRequestParams scheduleParams,
UUID customerUUID,
UUID scheduleUUID,
@Nullable KubernetesOperatorStatusUpdater kubernetesStatus) {
Schedule schedule = Schedule.getOrBadRequest(customerUUID, scheduleUUID);
Universe universe = getUniverse();
// Lock schedule
@@ -6637,8 +6682,14 @@ protected void addAllEditBackupScheduleTasks(
customerUUID,
scheduleUUID);
// Modify params and set state to Editing
Schedule.updateNewBackupScheduleTimeAndStatusAndSave(
customerUUID, scheduleUUID, State.Editing, scheduleParams);
schedule =
Schedule.updateNewBackupScheduleTimeAndStatusAndSave(
customerUUID, scheduleUUID, State.Editing, scheduleParams);

if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}

if (scheduleParams.enablePointInTimeRestore) {
backupScheduleSubTasks.run();
@@ -6648,13 +6699,22 @@
getRunnableTask().runSubTasks();
}
// Mark schedule Active
Schedule.updateStatusAndSave(customerUUID, scheduleUUID, Schedule.State.Active);
schedule = Schedule.updateStatusAndSave(customerUUID, scheduleUUID, Schedule.State.Active);
if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}
} catch (Throwable t) {
log.error("Error executing task {} with error='{}'.", getName(), t.getMessage(), t);
// Update schedule state to Error
if (schedule != null) {
Schedule.updateStatusAndSave(
customerUUID, schedule.getScheduleUUID(), Schedule.State.Error);
schedule =
Schedule.updateStatusAndSave(
customerUUID, schedule.getScheduleUUID(), Schedule.State.Error);
if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}
}
throw t;
} finally {
@@ -6673,7 +6733,26 @@ protected void addAllDeleteBackupScheduleTasks(
BackupRequestParams scheduleParams,
UUID customerUUID,
UUID scheduleUUID) {
Schedule schedule = Schedule.getOrBadRequest(customerUUID, scheduleUUID);
addAllDeleteBackupScheduleTasks(
backupScheduleSubTasks,
scheduleParams,
customerUUID,
scheduleUUID,
null /* kubernetesStatus */);
}

protected void addAllDeleteBackupScheduleTasks(
Runnable backupScheduleSubTasks,
BackupRequestParams scheduleParams,
UUID customerUUID,
UUID scheduleUUID,
@Nullable KubernetesOperatorStatusUpdater kubernetesStatus) {
Optional<Schedule> optSchedule = Schedule.maybeGet(customerUUID, scheduleUUID);
if (!optSchedule.isPresent()) {
log.info("Schedule already deleted!");
return;
}
Schedule schedule = optSchedule.get();
Universe universe = getUniverse();
// Lock schedule
// Ok to fail, don't put this inside try block.
@@ -6690,7 +6769,12 @@ protected void addAllDeleteBackupScheduleTasks(
"Deleting backup schedule for customer {}, schedule uuid = {}.",
customerUUID,
scheduleUUID);
Schedule.updateStatusAndSave(customerUUID, scheduleUUID, State.Deleting);
schedule = Schedule.updateStatusAndSave(customerUUID, scheduleUUID, State.Deleting);

if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}

if (scheduleParams.enablePointInTimeRestore) {
backupScheduleSubTasks.run();
@@ -6708,8 +6792,13 @@
log.error("Error executing task {} with error='{}'.", getName(), t.getMessage(), t);
// Update schedule state to Error
if (schedule != null) {
Schedule.updateStatusAndSave(
customerUUID, schedule.getScheduleUUID(), Schedule.State.Error);
schedule =
Schedule.updateStatusAndSave(
customerUUID, schedule.getScheduleUUID(), Schedule.State.Error);
if (kubernetesStatus != null) {
kubernetesStatus.updateBackupScheduleStatus(
scheduleParams.getKubernetesResourceDetails(), schedule);
}
}
throw t;
} finally {
@@ -1,3 +1,5 @@
// Copyright (c) YugaByte, Inc.

package com.yugabyte.yw.common.backuprestore;

import static com.yugabyte.yw.common.Util.getUUIDRepresentation;
