DR-Syncer provides tools designed to automate and simplify disaster recovery synchronization between Kubernetes clusters.
Organizations running Kubernetes in production face several challenges when establishing and maintaining disaster recovery environments:
-
Manual Configuration Burden
- Time-consuming manual setup of DR environments
- Error-prone resource copying
- Inconsistent state between clusters
-
Resource Management Complexity
- Difficult tracking of which resources need replication
- Challenges with resource version management
- Selective resource synchronization complications
-
Operational Overhead
- Frequent manual intervention required for updates
- Lack of automation in DR processes
- Time-intensive DR maintenance and testing
DR-Syncer addresses these challenges by providing automated synchronization of resources from source namespaces to destination namespaces in remote clusters. It enables reliable disaster recovery setups with minimal operational overhead.
DR-Syncer offers two distinct tools to handle disaster recovery synchronization:
-
Controller: A Kubernetes operator that runs continuously inside your cluster, providing automated and scheduled synchronization
- Ideal for ongoing automation and "set it and forget it" scenarios
- Uses Custom Resource Definitions (CRDs) for configuration
- Supports continuous, scheduled, and manual synchronization modes
-
CLI: A standalone command-line tool for direct, on-demand synchronization operations
- Does not require deploying anything to your clusters
- Perfect for manual operations, testing, or one-off scenarios
- Supports Stage, Cutover, and Failback operations with a single command
- Ideal for organizations that prefer not to deploy additional controllers
Both tools maintain feature parity for the core synchronization capabilities, but are used in different contexts and deployment models.
DR-Syncer follows the Kubernetes operator pattern:
-
Custom Resources
RemoteCluster
: Defines remote cluster configuration and authenticationReplication
: Defines synchronization configuration and resource filtering
-
Reconciliation Flow
- Controller watches for custom resource changes
- Compares desired state with actual state in remote clusters
- Executes synchronization operations when discrepancies are found
- Updates status and metrics
-
Synchronization Process
- Resources are filtered based on type and exclusion rules
- Resources are transformed as needed (e.g., scaling deployments to zero)
- Resources are applied to remote clusters
- Status is updated with synchronization results
- Synchronizes multiple resource types (ConfigMaps, Secrets, Deployments, Services, Ingresses, PVCs)
- Maintains resource state and metadata across clusters
- Handles immutable fields and resource versions
- Zero replicas in DR cluster by default (preventing resource consumption)
- Scale override capability via labels (
dr-syncer.io/scale-override
) - Original replica count preservation (stored in annotations)
- Manual sync (on-demand)
- Scheduled sync (cron-based)
- Continuous sync (real-time monitoring)
- Type-based filtering (include/exclude specific resource types)
- Label-based exclusion (
dr-syncer.io/ignore
label) - Namespace mapping between source and destination clusters
- Service and Ingress handling with network configuration adaptation
- Cross-cluster PVC data synchronization using rsync
- Secure SSH-based transfer mechanism
- Storage class mapping for different cluster environments
- Access mode mapping for different storage requirements
- Volume size handling and attribute preservation
- Multi-cluster support with secure authentication
- Comprehensive status reporting and metrics
- Health monitoring and readiness checks
- Leader election for high availability
- Manager: Handles controller lifecycle and shared dependencies
- Reconcilers: Implement controller business logic for custom resources
- Clients: Interact with Kubernetes API in source and remote clusters
- Custom Resources: Define configuration and state for synchronization
- Agent DaemonSet: Deployed on remote clusters with SSH/rsync capability
- SSH Security Model: Secure key management with proper access restrictions
- Direct Access Pattern: Agent pod accesses PVCs directly without root access
- Data Flow: Secure rsync over SSH between controller and agent
# Add the DR-Syncer Helm repository
helm repo add dr-syncer https://supporttools.github.io/dr-syncer/charts
# Update repositories
helm repo update
# Install DR-Syncer
helm install dr-syncer dr-syncer/dr-syncer \
--namespace dr-syncer-system \
--create-namespace
# Define a remote cluster
apiVersion: dr-syncer.io/v1alpha1
kind: RemoteCluster
metadata:
name: dr-cluster
spec:
kubeconfigSecret: dr-cluster-kubeconfig
---
# Define a replication
apiVersion: dr-syncer.io/v1alpha1
kind: NamespaceMapping
metadata:
name: production-to-dr
spec:
sourceNamespace: production
destinationNamespace: production-dr
destinationCluster: dr-cluster
resourceTypes:
- ConfigMap
- Secret
- Deployment
- Service
schedule: "0 */6 * * *" # Every 6 hours
For more comprehensive documentation, visit our documentation site.
The DR-Syncer CLI provides a simple way to perform disaster recovery operations directly from the command line without needing to deploy the controller.
Build the CLI binary:
make build
This will create the dr-syncer-cli
binary in the bin/
directory.
The CLI supports three primary operation modes:
bin/dr-syncer-cli \
--source-kubeconfig=/path/to/source/kubeconfig \
--dest-kubeconfig=/path/to/destination/kubeconfig \
--source-namespace=my-namespace \
--dest-namespace=my-namespace-dr \
--mode=Stage
Stage mode synchronizes resources and scales down deployments to 0 replicas in the destination.
bin/dr-syncer-cli \
--source-kubeconfig=/path/to/source/kubeconfig \
--dest-kubeconfig=/path/to/destination/kubeconfig \
--source-namespace=my-namespace \
--dest-namespace=my-namespace-dr \
--mode=Cutover
Cutover mode synchronizes resources, scales down source deployments, and scales up destination deployments.
bin/dr-syncer-cli \
--source-kubeconfig=/path/to/source/kubeconfig \
--dest-kubeconfig=/path/to/destination/kubeconfig \
--source-namespace=my-namespace \
--dest-namespace=my-namespace-dr \
--mode=Failback
Failback mode scales down destination deployments and scales up source deployments.
The CLI supports many additional options for resource filtering, PVC data migration, and more:
bin/dr-syncer-cli \
--source-kubeconfig=/path/to/source/kubeconfig \
--dest-kubeconfig=/path/to/destination/kubeconfig \
--source-namespace=my-namespace \
--dest-namespace=my-namespace-dr \
--mode=Stage \
--include-custom-resources=true \
--migrate-pvc-data=true \
--resource-types=configmaps,secrets,deployments
For complete CLI documentation, refer to the CLI Usage Guide.
To run the controller locally for development:
- Use the provided
run-local.sh
script:
./run-local.sh /path/to/controller/kubeconfig
The script handles:
- Scaling down the in-cluster controller deployment to prevent conflicts
- Setting the KUBECONFIG environment variable
- Restoring the original deployment when you exit
Build the container:
make docker-build
Deploy the controller:
make deploy
- The controller implements exponential backoff with jitter to prevent API server flooding during failure conditions
- Concurrent operations are managed via a configurable worker pool
- Thread-safe mechanisms are used to handle status updates and prevent race conditions
The project uses GitHub Actions for CI/CD and follows semantic versioning for releases.
To create a new release:
-
Ensure all changes are committed and pushed to the main branch:
git checkout main git pull
-
Create an annotated tag following semantic versioning:
git tag -a v1.0.0 -m "Release v1.0.0"
-
Push the tag to GitHub:
git push origin v1.0.0
When a tag is pushed, the GitHub Actions pipeline automatically:
- Builds all Docker images (controller, agent, rsync) with the version tag
- Pushes images to DockerHub with both version tag and 'latest' tag
- Packages the Helm chart with the release version
- Updates the Helm repository with the new chart
- Creates a GitHub Release with generated release notes
DR-Syncer follows Semantic Versioning:
- Major version (
X
in vX.Y.Z): Incompatible API changes or major functionality changes - Minor version (
Y
in vX.Y.Z): New features that are backward-compatible - Patch version (
Z
in vX.Y.Z): Bug fixes and minor improvements that are backward-compatible
For pre-release versions, use formats like:
v1.0.0-alpha.1
v1.0.0-beta.2
v1.0.0-rc.1
If the remote cluster API calls seem excessive, check the logs for rapid cycling between states. The controller implements exponential backoff to reduce API load during failures.
When running locally, ensure you're using the proper kubeconfig with the necessary permissions:
./run-local.sh /path/to/controller/kubeconfig
Check the status of the Replication resource for error messages:
kubectl get replications -n <namespace> -o yaml
Verify the agent DaemonSet is running on remote clusters:
kubectl get daemonset -n dr-syncer-system --context=<remote-context>
Check agent logs for rsync or SSH errors:
kubectl logs -n dr-syncer-system -l app=dr-syncer-agent --context=<remote-context> --tail=100
If experiencing SSH authentication problems, check the secrets containing SSH keys:
kubectl get secret <remote-cluster>-ssh-key -n dr-syncer-system -o yaml
Always use --tail
flag to limit log output:
kubectl logs pod-name --tail=100 # Last 100 lines
Never use -f/--follow
in automated scripts or tests as these commands will never return and cause scripts to hang.
We welcome contributions from the community! Here's how you can get involved:
See CONTRIBUTING.md for details on contributing to DR-Syncer, including:
- Code contribution guidelines
- Pull request process
- Development environment setup
- Testing requirements
- GitHub Issues: Bug reports, feature requests, and discussions
- Slack Channel: #dr-syncer on Kubernetes community Slack
This project is licensed under the terms of the LICENSE file.