feat: Implement backup / restore of Che server #844
Rebased
@mmorhun I have a design question: is it possible to configure periodic backups somehow?
@ibuziuk Not directly in the backup CR or in Che. However, one can set up a cron job in the Kubernetes cluster that triggers a backup via the CR.
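A scheduled job could, for example, create a fresh backup CR on every run, since the PR description notes that each new backup requires a new backup CR. The sketch below is a minimal, hypothetical Python helper (its name and the timestamped naming scheme are assumptions) that builds such a CheClusterBackup manifest as JSON; the apiVersion, kind, and useInternalBackupServer field are taken from the examples in this PR. The scheduling side itself (for instance a Kubernetes CronJob running kubectl) is not shown.

```python
import json
from datetime import datetime, timezone


def new_backup_manifest(namespace: str, now: datetime) -> dict:
    """Build a CheClusterBackup CR for one scheduled run (hypothetical helper).

    The timestamp suffix gives every run a unique name, so each run creates
    a new CR instead of reusing one that was already processed.
    """
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return {
        "apiVersion": "org.eclipse.che/v1",
        "kind": "CheClusterBackup",
        "metadata": {
            # Lowercase letters, digits and dashes keep the name a valid
            # Kubernetes object name.
            "name": f"eclipse-che-backup-{stamp}",
            # Must be the namespace where the Che operator deployment exists.
            "namespace": namespace,
        },
        "spec": {"useInternalBackupServer": True},
    }


if __name__ == "__main__":
    manifest = new_backup_manifest("eclipse-che", datetime.now(timezone.utc))
    # kubectl accepts JSON manifests, so this output can be piped into
    # `kubectl create -f -`.
    print(json.dumps(manifest, indent=2))
```

Using a unique name per run (rather than re-applying one fixed name) keeps a history of backup CRs, one per scheduled run.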
I'm not able to do a detailed review, but I hope you'll find the minor comments I left useful.
  name: eclipse-che-backup
  namespace: ${NAMESPACE}
spec:
  triggerNow: true
I'm not entirely sure about this usage.
Does the controller trigger the backup and then reset this field to false?
Yes, the controller triggers the backup, and once the backup has finished successfully, the flag is set to false.
The flag is useful: it allows triggering a backup without recreating the entire CR, and it also lets an automated tool patch the CR to get scheduled backups.
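The flag handshake described above can be sketched as a single reconcile pass. This is a hypothetical Python model, not the operator's Go code: the BackupCR and BackupSpec classes, the state field, and the run_backup callback are stand-ins. Keeping the flag set after a failure (so a later pass tries again) is an assumption derived from the flag being cleared only on success.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BackupSpec:
    trigger_now: bool = False  # mirrors spec.triggerNow (hypothetical model)


@dataclass
class BackupCR:
    spec: BackupSpec = field(default_factory=BackupSpec)
    state: str = ""  # hypothetical status field


def reconcile(cr: BackupCR, run_backup: Callable[[], None]) -> None:
    """One reconcile pass over a backup CR (hypothetical sketch)."""
    if not cr.spec.trigger_now:
        return  # nothing requested; leave the CR untouched
    try:
        run_backup()
    except Exception as err:
        # The flag stays set, so a later pass would try again.
        cr.state = f"failed: {err}"
        return
    # Only a successful backup clears the flag.
    cr.spec.trigger_now = False
    cr.state = "succeeded"


# An automated tool could request another backup by applying a merge patch
# like this one to the CR.
TRIGGER_PATCH = {"spec": {"triggerNow": True}}
```

Clearing the flag only after success makes the request idempotent: repeated reconcile passes do not start extra backups once the requested one is done.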
Just for the record: this trick will not be possible on OSD with dedicated / enhanced dedicated admins (they have no permissions for manual CRD creation).
this CR should be created in the same namespace where the
@ibuziuk Yes. More generally, it should be created where a Che operator deployment exists. So, you can create a namespace, deploy the Che operator manually (using
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Well done.
@mmorhun Indeed, great job.
Could you please clarify yet another design / flow question for the following use case:
- Eclipse Che is installed in namespace A
- The workloads need to be migrated to namespace B on the very same cluster
What would the backup / restore procedure look like for this case?
I've finished testing the changes locally, and they look really nice!
@ibuziuk You can do it, but the backup configuration cannot be copied automatically, so it has to be copied manually.
apiVersion: org.eclipse.che/v1
kind: CheClusterBackup
metadata:
  name: eclipse-che-backup
spec:
  triggerNow: true
  useInternalBackupServer: true
---
apiVersion: org.eclipse.che/v1
kind: CheClusterRestore
metadata:
  name: eclipse-che-restore
spec:
  backupServerConfig:
    rest:
      hostname: backup-rest-server-service.eclipse-che.svc.cluster.local
      port: 8000
      protocol: http
      repositoryPasswordSecretRef: backup-rest-server-repo-password
      repositoryPath: che
Replace
Also, you may use an external backup server if you wish.
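Whether the REST server is the internal one or an external one, restic addresses a REST server repository with a URL of the form `rest:<protocol>://<host>:<port>/<path>`. A minimal sketch, assuming the `rest` fields of the restore CR above map one-to-one onto that URL (the helper names are hypothetical, and how the operator actually reads the password from the Secret named by repositoryPasswordSecretRef is not shown):

```python
from urllib.parse import quote


def restic_rest_repository(rest: dict) -> str:
    """Map a `rest` backup server config to a restic repository string.

    The dict keys mirror the fields of the CheClusterRestore example above.
    """
    path = quote(str(rest.get("repositoryPath", "")).strip("/"))
    return f"rest:{rest['protocol']}://{rest['hostname']}:{rest['port']}/{path}"


def restic_env(rest: dict, password: str) -> dict:
    """Environment for a restic process talking to that repository.

    restic reads the repository location and the repository password from
    these two variables. `password` is assumed to have been read from the
    Secret named by repositoryPasswordSecretRef.
    """
    return {
        "RESTIC_REPOSITORY": restic_rest_repository(rest),
        "RESTIC_PASSWORD": password,
    }
```

For the example above this yields `rest:http://backup-rest-server-service.eclipse-che.svc.cluster.local:8000/che`. Passing the password through the environment keeps it off the command line.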
…m backup and restore. Delete trigger now. Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: flacatus, mmorhun, tolusha
@mmorhun: The following test failed.
@ibuziuk for your use-case:
apiVersion: org.eclipse.che/v1
kind: CheClusterBackup
metadata:
  name: eclipse-che-backup
spec:
  useInternalBackupServer: true
---
apiVersion: org.eclipse.che/v1
kind: CheClusterRestore
metadata:
  name: eclipse-che-restore
spec:
  backupServerConfigRef: backup-rest-server-configuration
What does this PR do?
Short description
This PR implements the ability to back up an Eclipse Che installation, send the data to a backup server, and then restore Che from it. It doesn't handle the content of users' projects, but it does back up the configuration of user workspaces.
As of now, a backup is bound to a specific cluster and cannot be restored on another one.
Full description
This PR introduces three new CRDs:
CheBackupServerConfiguration
CheClusterBackup
CheClusterRestore
There are separate operator controllers for backup and for restore, so they don't affect the main operator functionality and can run in parallel.
The backup controller, when requested, gathers the following information about the Eclipse Che installation:
and then encrypts it and sends it to the configured backup server.
It is also possible to ask the Che operator (backup controller) to deploy an internal backup server by setting useInternalBackupServer in the backup CR. It will be deployed in the same namespace as the Che server and can only be used to roll back the Che installation to a previous state (this works even if Che gets deleted, as long as the backup server survives). The deployment is a single container running a restic REST backup server. Note that this server configuration has no protection settings (besides data encryption) and is designed to be used only inside the cluster.
The restore controller restores Che on user request. It can restore Che regardless of whether a previous installation exists. The restore controller works in the following way:
On Kubernetes, it is possible to create a Che backup on one cluster and restore it on another. However, this requires adding the new ingress domain to the restore configuration, as it cannot be autodetected on Kubernetes. On OpenShift, it is not possible to restore on a different cluster, but it is still possible to restore into a different namespace.
Edit: After some discussions and reviews, I dropped support for restoring on a different cluster.
This is caused by Keycloak, which binds to OAuth user IDs in the OpenShift cluster. It was decided not to spend time patching many related values in the Keycloak database, as Keycloak will be removed in the short to mid term.
To create backup snapshots, encrypt/decrypt them, and push/pull them to/from a backup server, a third-party tool called restic is used. It provides encryption, snapshot versioning (so it is possible to restore from a backup other than the latest one by setting snapshotId in the restore CR; by default, the latest one is used), and support for different backends. In this PR, only 3 types of external backup servers are added:
The backup server configuration is the same for backup and restore CRs and is stored in a separate CR of the CheBackupServerConfiguration type.
To trigger a backup or restore process, a new CR must be created. To perform a new backup, a new backup CR should be created.
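The default-to-latest behavior of snapshotId can be sketched over the JSON printed by `restic snapshots --json`, whose entries carry `id`, `short_id` and `time` fields. The helpers below are hypothetical and only illustrate the selection rule: pick the requested snapshot by full or short ID, or else the newest one by timestamp. restic's own `restore` command also accepts `latest` as the snapshot ID.

```python
import re
from datetime import datetime
from typing import List, Optional


def _parse_restic_time(value: str) -> datetime:
    """Parse an RFC 3339 timestamp as printed by restic."""
    # restic may print up to nine fractional digits, while fromisoformat on
    # older Python versions only accepts 3 or 6, so normalize to 6.
    value = re.sub(r"\.(\d+)", lambda m: "." + (m.group(1) + "000000")[:6], value)
    # fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def select_snapshot(snapshots: List[dict], snapshot_id: Optional[str] = None) -> dict:
    """Return the requested snapshot, or the newest one if no ID is given."""
    if not snapshots:
        raise ValueError("the repository has no snapshots")
    if snapshot_id:
        for snap in snapshots:
            if snapshot_id in (snap["id"], snap.get("short_id")):
                return snap
        raise ValueError(f"snapshot {snapshot_id!r} not found")
    # Compare parsed datetimes, not strings: entries may carry different UTC offsets.
    return max(snapshots, key=lambda s: _parse_restic_time(s["time"]))
```

Failing loudly on an unknown ID, instead of silently falling back to the latest snapshot, avoids restoring a state the user did not ask for.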
Known bugs
Screenshot/screencast of this PR
N/A
What issues does this PR fix or reference?
eclipse-che/che#18703
How to test this PR?
To run the operator with the changes from this PR, the following steps are required:
2.a. Use templates from this branch when deploying Che
2.b. Manually apply resources:
and also manually apply deploy/operator.yaml from the PR branch to create the operator deployment.
3. Deploy Che using the operator image built from the PR branch:
chectl server:deploy --platform=**** --installer=operator --che-operator-image=mm4eche/che-operator:18703
Testing with internal backup server
CheClusterBackup type:
CheClusterRestore type:
Testing with an external backup server
In this test scenario we'll use a REST backup server, but any other supported backup server may be used. Just set up the server and provide the correct configuration (otherwise, the Che operator will report the error in the backup CR status).
3.1. Backup server configuration (adjust values according to your settings):
PR Checklist
As the author of this Pull Request I made sure that:
What issues does this PR fix or reference and How to test this PR completed
Reviewers
Reviewers, please comment how you tested the PR when approving it.