feat: Implement backup / restore of Che server #844

Merged
merged 55 commits into from
Jun 22, 2021

Conversation

mmorhun
Contributor

@mmorhun mmorhun commented May 24, 2021

What does this PR do?

Short description

This PR implements the ability to back up an Eclipse Che installation, send the data to a backup server, and later restore Che from it. It does not handle the content of users' projects, but it does back up the configuration of user workspaces.
As of now, a backup is bound to a specific cluster and cannot be restored on another one.

Full description

This PR introduces three new CRDs:

  • CheBackupServerConfiguration
  • CheClusterBackup
  • CheClusterRestore

Backup and restore each have their own operator controller, so they do not affect the main operator functionality and can run in parallel.

When requested, the backup controller gathers the following information about the Eclipse Che installation:

  • The Che CR
  • Dumps of the Che and Keycloak databases
  • Custom CA certificates
  • Credential secrets (ChePostgresSecret, IdentityProviderSecret, IdentityProviderPostgresSecret)
  • Some cluster and Che info

It then encrypts the data and sends it to the configured backup server.

It is also possible to ask the Che operator (the backup controller) to deploy an internal backup server by setting useInternalBackupServer in the backup CR. The server is deployed in the same namespace as the Che server and is intended only for rolling a Che installation back to a previous state (this works even if Che itself gets deleted, as long as the backup server survives). The deployment is a single container running the restic REST backup server. Note that this server configuration has no protection settings besides data encryption and is designed to be used only inside the cluster.

The restore controller restores Che on user request, whether or not a previous installation exists. It works as follows:

  • Cleans up the previous installation, if any: deletes the Che CR, secrets, config maps, custom CAs, etc.
  • Restores Che resources: secrets, custom CAs, and finally the Che CR. This triggers the deploy process in the main operator controller, which picks up all the resources restored from the backup. The restore controller waits until the deployment of a clean Che (but with the configuration from the backup) is finished.
  • Restores the Che and Keycloak databases from the backed-up dumps and patches some values that are bound to a specific cluster.
  • Restarts Keycloak to refresh its caches after the database restore.

On Kubernetes, it is possible to create a Che backup on one cluster and restore it on another. However, this requires adding the new ingress domain to the restore configuration, as it cannot be autodetected on Kubernetes.
On OpenShift it is not possible to restore on a different cluster, but it is still possible to restore into a different namespace.

Edit: After some discussions and reviews I dropped support for restoring on a different cluster.
The cause is Keycloak, which binds to OAuth user IDs in the OpenShift cluster. It was decided not to spend time on patching a lot of related values in the Keycloak database, as Keycloak will be removed in the short to mid term.

To create backup snapshots, encrypt/decrypt them, and push/pull them to/from a backup server, a third-party tool called restic is used. It provides encryption, versioning of snapshots (so it is possible to restore from a backup other than the latest by setting snapshotId in the restore CR; by default the latest one is used), and support for different backends. In this PR, only three types of external backup servers are added:

  • REST
  • SFTP (via SSH)
  • AWS S3 (and compatible servers like minio, etc.)
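
For orientation, restic addresses each of these backends through a repository string. The sketch below builds such strings in the forms restic documents for REST, SFTP, and S3-compatible servers. The function name `restic_repo_url` and its argument order are illustrative only, and how the operator itself assembles the string is not shown in this description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a restic repository string for a backend type.
restic_repo_url() {
  local kind="$1"
  case "$kind" in
    rest)
      # args: rest <protocol> <hostname> <port> <repositoryPath>
      printf 'rest:%s://%s:%s/%s/\n' "$2" "$3" "$4" "$5"
      ;;
    sftp)
      # args: sftp <user> <hostname> <absolute repository path>
      printf 'sftp:%s@%s:%s\n' "$2" "$3" "$4"
      ;;
    s3)
      # args: s3 <endpoint> <bucket>; the endpoint may be s3.amazonaws.com
      # or the URL of a compatible server such as minio
      printf 's3:%s/%s\n' "$2" "$3"
      ;;
    *)
      echo "unsupported backup server type: $kind" >&2
      return 1
      ;;
  esac
}
```

For example, `restic_repo_url rest http mybackup-rest-server.host.net 8000 che` prints `rest:http://mybackup-rest-server.host.net:8000/che/`. Passing that string to `restic -r <repo> snapshots`, with the repository password supplied (for example through `RESTIC_PASSWORD`), lists the snapshot IDs that could be used as snapshotId.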

The backup server configuration is the same for backup and restore CRs and is stored in a separate CR of type CheBackupServerConfiguration.

To trigger a backup or restore, a new CR has to be created. In particular, every new backup requires a new backup CR.

Known bugs

  • Inability to restore on a different OpenShift cluster (won't fix)
  • Custom CA certificates are not used when connecting to a backup server (will be implemented in a separate PR)
  • The SFTP client doesn't check the fingerprint of the configured backup server
  • Sometimes, after a restore, the private browser window has to be reopened in order to log in to the restored Che.

Screenshot/screencast of this PR

N/A

What issues does this PR fix or reference?

eclipse-che/che#18703

How to test this PR?

To run the operator with the changes from this PR, follow these steps:

  1. Check out the PR branch.
  2. Apply the templates in one of two ways (use only one, a or b):
    2.a. Use the templates from this branch when deploying Che.
    2.b. Manually apply the resources:
oc apply -f deploy/role.yaml
oc apply -f deploy/crds/org.eclipse.che_checlusterbackups_crd.yaml
oc apply -f deploy/crds/org.eclipse.che_checlusterrestores_crd.yaml

and also manually apply deploy/operator.yaml from the PR branch to create the operator deployment.
  3. Deploy Che using the operator image built from the PR branch:

chectl server:deploy --platform=**** --installer=operator --che-operator-image=mm4eche/che-operator:18703

Testing with internal backup server

  1. Deploy Eclipse Che.
  2. Make some changes to its configuration: change the Che CR, add custom CA certs, create user workspaces, etc.
  3. Request a backup by creating a CR of type CheClusterBackup:
apiVersion: org.eclipse.che/v1
kind: CheClusterBackup
metadata:
  name: eclipse-che-backup
spec:
  useInternalBackupServer: true
  4. Wait until the backup CR status reports that the backup finished successfully.
  5. Change the installation: edit the Che CR, create/delete workspaces, etc.
  6. Request a restore by creating a CR of type CheClusterRestore:
apiVersion: org.eclipse.che/v1
kind: CheClusterRestore
metadata:
  name: eclipse-che-restore
spec:
  backupServerConfigRef: backup-rest-server-configuration
  7. Wait until the restore CR status shows that the restore finished successfully.
  8. Check that everything in the installation is in the state it was in at the moment of the backup.
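
The waiting steps above can be done by reading the status of the custom resource. A minimal sketch, assuming the resource plurals `checlusterbackups` and `checlusterrestores` implied by the CRD file names in this PR. The helper name `cr_status` is hypothetical, and because the exact status fields are not listed in this description, the whole status object is printed rather than a specific field. `oc` can be substituted for `kubectl`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .status of a backup or restore CR.
# usage: cr_status <backup|restore> <cr-name> <namespace>
cr_status() {
  local resource
  case "$1" in
    backup)  resource="checlusterbackups.org.eclipse.che" ;;
    restore) resource="checlusterrestores.org.eclipse.che" ;;
    *) echo "expected 'backup' or 'restore', got: $1" >&2; return 1 ;;
  esac
  kubectl get "$resource" "$2" -n "$3" -o jsonpath='{.status}'
}

# Example (needs cluster access); re-run until the status reports completion:
#   cr_status backup eclipse-che-backup eclipse-che
```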

Testing with an external backup server

This test scenario uses a REST backup server, but any other supported backup server can be used. Just set up the server and provide the correct configuration (if the configuration is wrong, the Che operator reports the error in the backup CR status).

  1. Set up a REST backup server that is reachable from within the cluster.
  2. Install and configure Eclipse Che, make some configuration changes, create user workspaces, etc.
  3. Request a backup by creating:
    3.1. a backup server configuration (adjust the values to your settings):
apiVersion: org.eclipse.che/v1
kind: CheBackupServerConfiguration
metadata:
  name: backup-rest-server-configuration
spec:
  rest:
      hostname: mybackup-rest-server.host.net
      protocol: http
      port: 8000
      repositoryPath: che
      repositoryPasswordSecretRef: secretName
    3.2. a backup CR:
apiVersion: org.eclipse.che/v1
kind: CheClusterBackup
metadata:
  name: eclipse-che-backup
spec:
  backupServerConfigRef: backup-rest-server-configuration
  4. Wait until the backup CR status reports that the backup finished successfully.
  5. Make changes to the installation.
  6. Request a restore by creating a restore CR:
apiVersion: org.eclipse.che/v1
kind: CheClusterRestore
metadata:
  name: eclipse-che-restore
spec:
  backupServerConfigRef: backup-rest-server-configuration
  7. Wait until the restore finishes.
  8. Check that Eclipse Che is in the state it was in at the moment the backup was created.

PR Checklist

As the author of this Pull Request I made sure that:

Reviewers

Reviewers, please comment how you tested the PR when approving it.

@mmorhun
Contributor Author

mmorhun commented May 31, 2021

Rebased

@tolusha changed the title from "Implement backup / restore of Che server" to "feat: Implement backup / restore of Che server" on Jun 1, 2021
Member

@ibuziuk ibuziuk left a comment

@mmorhun I have a design question: is it possible to configure periodic backups somehow?

@mmorhun
Contributor Author

mmorhun commented Jun 1, 2021

@ibuziuk directly in the backup CR or in Che, no. But one can set up a cron job in the Kubernetes cluster that triggers a backup through the CR.
Alternatively, we could think about implementing it on the operator side as a later improvement.
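
As a rough sketch of the cron-job idea, the script below is something a Kubernetes CronJob (or any external scheduler) could run on each tick. It assumes the scheduler creates a fresh CheClusterBackup CR per run, which is how the PR description says a new backup is requested, and that the job's service account is allowed to create such CRs. The function names and the `generateName` prefix are illustrative, not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a CheClusterBackup manifest with a generated name,
# so every scheduled run creates a new CR instead of colliding with the last one.
backup_manifest() {
  cat <<'EOF'
apiVersion: org.eclipse.che/v1
kind: CheClusterBackup
metadata:
  generateName: eclipse-che-backup-
spec:
  useInternalBackupServer: true
EOF
}

# Create the CR in the namespace where the Che operator runs.
# 'kubectl create' is used because 'kubectl apply' does not support generateName.
request_backup() {
  local namespace="$1"
  backup_manifest | kubectl create -n "$namespace" -f -
}

# Example (needs cluster access):
#   request_backup eclipse-che
```

To use an external server instead, the `useInternalBackupServer` line would be swapped for a `backupServerConfigRef` pointing at an existing CheBackupServerConfiguration.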

Member

@sleshchenko sleshchenko left a comment

I'm not able to do a detailed review, but I hope you'll find the minor comments I left useful.

.github/bin/minikube/test-backup-restore.sh (outdated review thread):
  name: eclipse-che-backup
  namespace: ${NAMESPACE}
spec:
  triggerNow: true
Member

I'm not entirely sure about this usage.
Does the controller trigger the backup and then reset this field to false?

Contributor Author

@mmorhun mmorhun Jun 3, 2021

Yes, the controller triggers the backup, and when it finishes successfully, the flag is set to false.
The flag is useful: it allows triggering a backup without creating the entire CR again. It also allows an automated tool to patch the CR to get scheduled backups.

@ibuziuk
Member

ibuziuk commented Jun 2, 2021

Manually apply resources:
oc apply -f deploy/role.yaml
oc apply -f deploy/crds/org.eclipse.che_checlusterbackups_crd.yaml
oc apply -f deploy/crds/org.eclipse.che_checlusterrestores_crd.yaml

Just for the record: this trick will not be possible on OSD with dedicated / enhanced dedicated admins (no permissions for manual CRD creation).

@ibuziuk
Member

ibuziuk commented Jun 2, 2021

Request backup by creating CR of CheClusterBackup type:

This CR should be created in the same namespace where eclipse-che is installed, right?

@mmorhun
Contributor Author

mmorhun commented Jun 2, 2021

@ibuziuk yes. More generally: it should be created where a Che operator deployment exists. So you can create a namespace, deploy the Che operator manually (using deploy/operator.yaml, and don't forget to create the CRDs), and create a restore CR. It will restore Che (even if it is not installed there yet).

Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
@mmorhun
Contributor Author

mmorhun commented Jun 14, 2021

Contributor

@tolusha tolusha left a comment

Well done.

Member

@ibuziuk ibuziuk left a comment

@mmorhun indeed great job.
Could you please clarify yet another design / flow question for the following use-case:

  • Eclipse Che is installed in namespace A
  • Need to migrate the workloads to namespace B on the very same cluster

What is the backup / restore procedure for this case going to look like?

@flacatus
Contributor

I've finished testing the changes locally and it looks really nice!
Awesome job @mmorhun!

@mmorhun
Contributor Author

mmorhun commented Jun 15, 2021

@ibuziuk you can do it, but the backup configuration cannot be copied automatically and has to be copied manually.
You need to:

  1. Create a backup CR:
apiVersion: org.eclipse.che/v1
kind: CheClusterBackup
metadata:
  name: eclipse-che-backup
spec:
  triggerNow: true
  useInternalBackupServer: true
  2. Wait until the backup is finished.
  3. Deploy the Che operator in the target namespace (the one for the new Che).
    The easiest way is to start a fresh Che installation using chectl, but interrupt it at the Che CR creation step (or later, to be safe, and then delete the Che CR manually).
  4. Copy the backup-rest-server-repo-password secret from the existing Che namespace to the target namespace.
  5. Create a restore CR in the target namespace, for example:
apiVersion: org.eclipse.che/v1
kind: CheClusterRestore
metadata:
  name: eclipse-che-restore
spec:
  backupServerConfig:
    rest:
      hostname: backup-rest-server-service.eclipse-che.svc.cluster.local
      port: 8000
      protocol: http
      repositoryPasswordSecretRef: backup-rest-server-repo-password
      repositoryPath: che

Replace eclipse-che in the hostname with the namespace of the existing Che installation.

  6. Wait until the restore is done.

Also, you may use an external backup server if you wish.
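
Copying the secret and pointing the restore CR at the source namespace can be scripted roughly as follows. This is a sketch under assumptions: it relies on `kubectl` and `jq` being installed, the helper names are hypothetical, and the set of metadata fields stripped before re-creating the secret is a common choice rather than something this PR prescribes. The service host name pattern is taken from the restore CR example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: host name of the internal backup server service
# for the namespace of the existing Che installation.
backup_server_host() {
  printf 'backup-rest-server-service.%s.svc.cluster.local\n' "$1"
}

# Hypothetical helper: copy a secret from one namespace to another.
# Namespace-bound and server-generated metadata is dropped so the object
# can be created cleanly in the target namespace.
# usage: copy_secret <name> <source-namespace> <target-namespace>
copy_secret() {
  kubectl get secret "$1" -n "$2" -o json \
    | jq 'del(.metadata.namespace, .metadata.uid, .metadata.resourceVersion,
              .metadata.creationTimestamp, .metadata.ownerReferences)' \
    | kubectl apply -n "$3" -f -
}

# Example (needs cluster access; "che-new" is a placeholder target namespace):
#   copy_secret backup-rest-server-repo-password eclipse-che che-new
#   backup_server_host eclipse-che
```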

Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
@openshift-ci

openshift-ci bot commented Jun 15, 2021

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm label Jun 15, 2021
@eclipse-che eclipse-che deleted a comment from openshift-ci bot Jun 15, 2021
…m backup and restore. Delete trigger now.

Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
@openshift-ci

openshift-ci bot commented Jun 18, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: flacatus, mmorhun, tolusha

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
@openshift-ci

openshift-ci bot commented Jun 22, 2021

@mmorhun: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/prow/v8-che-behind-proxy | 51f7f18 | link | /test v8-che-behind-proxy

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mmorhun mmorhun merged commit 67dd98d into main Jun 22, 2021
@mmorhun mmorhun deleted the che-18703 branch June 22, 2021 10:24
@che-bot che-bot added this to the 7.33 milestone Jun 22, 2021
@mmorhun
Contributor Author

mmorhun commented Jun 22, 2021

@ibuziuk for your use-case:

  1. No template manipulation is needed, everything is merged. Just make sure you have the latest version installed.
  2. Create a backup by creating a backup CR:
apiVersion: org.eclipse.che/v1
kind: CheClusterBackup
metadata:
  name: eclipse-che-backup
spec:
  useInternalBackupServer: true
  3. Manually copy backup-rest-server-configuration of type CheBackupServerConfiguration to the target namespace and patch the service URL as described above.
  4. Request a restore by creating a restore CR:
apiVersion: org.eclipse.che/v1
kind: CheClusterRestore
metadata:
  name: eclipse-che-restore
spec:
  backupServerConfigRef: backup-rest-server-configuration
