SyncF is a distributed, resilient file synchronization system built in Go that provides reliable file transfer and synchronization capabilities across different storage backends. It features a master-slave architecture with job queuing, progress tracking, and automatic retry mechanisms.
- Distributed Architecture: Master-slave pattern for scalable file synchronization
- Fault Tolerance: Resumable transfers with byte-level checkpointing
- Progress Tracking: Real-time monitoring of sync operations and individual file transfers
- Flexible Storage: Support for multiple storage backends (currently local filesystem)
- Job Management: Create, pause, resume, and kill synchronization jobs
- Message Queuing: RabbitMQ-based task distribution to slave nodes
- Database Persistence: SQLite/PostgreSQL support for job and task persistence
- RESTful API: HTTP API for job management and monitoring
- Checksum Verification: SHA-256 checksums ensure data integrity
- Go 1.24.3 or later
- RabbitMQ server
- SQLite or PostgreSQL database
- Clone the repository:

  ```bash
  git clone https://github.com/infrautils/syncf.git
  cd syncf
  ```

- Install dependencies:

  ```bash
  make setup
  ```

- Build the application:

  ```bash
  make build
  ```

- Start RabbitMQ (using Docker):

  ```bash
  make start-rabbitmq
  ```
The master node provides the HTTP API and manages job scheduling:

```bash
make run-master
```

The master will start on port 8080 and provide the endpoints listed below.
Slave nodes process the actual file synchronization tasks:
```bash
# Run with auto-generated ID
make run-slave

# Run with custom slave ID
./bin/syncf slave my-slave-001
```

Create a job:

POST /api/jobs
Content-Type: application/json

```json
{
  "source_path": "/path/to/source",
  "sink_path": "/path/to/destination",
  "source_type": "local",
  "sink_type": "local",
  "interval": "5m"
}
```

Other endpoints:

- GET /api/jobs: list all jobs
- GET /api/jobs/{jobId}/tracking: get job and task progress
- PUT /api/jobs/{jobId}/pause: pause a job
- PUT /api/jobs/{jobId}/resume: resume a paused job
- DELETE /api/jobs/{jobId}: kill a job
- GET /health: health check

Job creation response:
```json
{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "running",
  "message": "Job submitted successfully"
}
```

Job tracking response:
```json
{
  "job": {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "source_path": "/data/source",
    "sink_path": "/data/backup",
    "source_type": "local",
    "sink_type": "local",
    "status": "running",
    "interval": "5m0s",
    "created_at": "2025-09-11T10:00:00Z",
    "updated_at": "2025-09-11T10:05:00Z"
  },
  "tasks": [
    {
      "id": "task-123",
      "source_file": "/data/source/file1.txt",
      "sink_file": "/data/backup/file1.txt",
      "status": "completed",
      "progress": 1.0,
      "file_size": 1048576,
      "copied_bytes": 1048576,
      "checksum": "abc123...",
      "slave_id": "slave-001"
    }
  ]
}
```
```
┌─────────────────┐      ┌─────────────────┐      ┌──────────────────┐
│     Master      │      │  Message Queue  │      │      Slave       │
│                 │      │   (RabbitMQ)    │      │                  │
│ ┌─────────────┐ │      │                 │      │ ┌──────────────┐ │
│ │  REST API   │ │      │  ┌───────────┐  │      │ │Task Processor│ │
│ │             │ │      │  │   Tasks   │  │      │ │              │ │
│ │ Job Manager │ │─────▶│  │   Queue   │  │─────▶│ │File Transfer │ │
│ │             │ │      │  │           │  │      │ │              │ │
│ │  Scheduler  │ │      │  └───────────┘  │      │ │Checkpointing │ │
│ └─────────────┘ │      │                 │      │ └──────────────┘ │
└─────────────────┘      └─────────────────┘      └──────────────────┘
        │                                                  │
        └─────────────────────────┬────────────────────────┘
                     ┌────────────┴─────────────┐
                     ▼                          ▼
           ┌─────────────────┐       ┌─────────────────┐
           │    Database     │       │     Storage     │
           │    (SQLite)     │       │    Backends     │
           │                 │       │                 │
           │ • Jobs          │       │ • Local FS      │
           │ • Tasks         │       │ • AWS S3        │
           │ • Checkpoints   │       │ • Google Cloud  │
           │ • Progress      │       │ • Azure Blob    │
           └─────────────────┘       └─────────────────┘
```
- Master Node:
  - HTTP API server
  - Job scheduler and manager
  - Task creation and distribution
  - Database operations
- Slave Nodes:
  - Task consumers from RabbitMQ
  - File transfer execution
  - Progress reporting and checkpointing
  - Fault recovery
- Message Queue (RabbitMQ):
  - Task distribution between master and slaves
  - Ensures reliable task delivery
- Database (SQLite/PostgreSQL):
  - Job and task persistence
  - Progress and state tracking
  - Recovery information
- User submits sync job via REST API
- Master creates job record in database
- Master periodically scans source directory
- Master creates tasks for new/changed files
- Tasks are queued in RabbitMQ
- Slave nodes consume and process tasks
- Slaves report progress and completion
- Master tracks overall job status
🔒 Security Note: For security reasons, the source and sink paths provided via the API are treated as relative paths and are appended to a secure base path:

```
Actual Source Path = BASE_PATH + "/" + API_SOURCE_PATH
Actual Sink Path   = BASE_PATH + "/" + API_SINK_PATH
```

Example:

- Base Path: /home/praneethshetty/wlthy/data
- API Request: "source_path": "documents/project1"
- Actual Source: /home/praneethshetty/wlthy/data/documents/project1

This prevents path traversal attacks and ensures all operations are contained within the configured base directory.
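The containment rule above can be sketched in a few lines of Go. This is an illustrative sketch, not SyncF's actual code; the `resolve` function name is an assumption:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// basePath stands in for the configured secure base directory.
const basePath = "/home/praneethshetty/wlthy/data"

// resolve joins an API-supplied relative path onto the base path and
// rejects any result that escapes it (a path traversal attempt).
func resolve(rel string) (string, error) {
	p := filepath.Join(basePath, rel) // Join also runs filepath.Clean
	if p != basePath && !strings.HasPrefix(p, basePath+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes base directory: %q", rel)
	}
	return p, nil
}

func main() {
	p, _ := resolve("documents/project1")
	fmt.Println(p) // /home/praneethshetty/wlthy/data/documents/project1

	_, err := resolve("../../etc/passwd")
	fmt.Println(err != nil) // true: traversal attempt rejected
}
```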
- Resumable Transfers: Files are transferred in chunks with periodic checkpoints
- Byte-level Resume: Transfers can resume from exact byte offset after interruption
- Checksum Verification: SHA-256 checksums ensure data integrity
- Automatic Retries: Failed tasks are automatically retried
- Graceful Recovery: System state is preserved across restarts
The system uses a simple configuration structure in internal/config/config.go:
```go
type Config struct {
    DBConfig db.DBConfig
    MQConfig string
}
```

Default configuration:

- Database: SQLite (bin/syncf.db)
- Message Queue: amqp://guest:guest@localhost:5672/
- Storage: Local filesystem (/home/praneethshetty/wlthy/data)
Job statuses:

- running: Job is actively monitoring and syncing files
- paused: Job is temporarily stopped
- killed: Job has been terminated
Task statuses:

- pending: Task is queued for processing
- in_progress: Task is being processed by a slave
- completed: Task finished successfully
- failed: Task failed and may be retried
- retrying: Task is being retried after failure
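One plausible way to model the task lifecycle above is a small transition table. This is a reading of the states as described (failed tasks may be retried), not SyncF's actual rules, which may differ:

```go
package main

import "fmt"

// TaskStatus mirrors the task states listed above.
type TaskStatus string

const (
	Pending    TaskStatus = "pending"
	InProgress TaskStatus = "in_progress"
	Completed  TaskStatus = "completed"
	Failed     TaskStatus = "failed"
	Retrying   TaskStatus = "retrying"
)

// transitions encodes which moves between states are legal;
// Completed is terminal.
var transitions = map[TaskStatus][]TaskStatus{
	Pending:    {InProgress},
	InProgress: {Completed, Failed},
	Failed:     {Retrying},
	Retrying:   {InProgress},
}

func canTransition(from, to TaskStatus) bool {
	for _, t := range transitions[from] {
		if t == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Pending, InProgress)) // true
	fmt.Println(canTransition(Completed, Pending))  // false
}
```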
```
syncf/
├── cmd/
│   └── main.go        # Application entry point
├── internal/
│   └── config/        # Configuration management
├── pkg/
│   ├── api/           # API handlers (placeholder)
│   ├── db/            # Database interfaces and implementations
│   ├── job/           # Job and task models
│   ├── master/        # Master node implementation
│   ├── slave/         # Slave node implementation
│   ├── mq/            # Message queue integration
│   ├── storage/       # Storage backend abstractions
│   └── util/          # Utility functions
├── bin/               # Built binaries and database
├── go.mod             # Go module definition
└── Makefile           # Build and run commands
```
```bash
# Install dependencies
go mod download

# Run tests (if available)
go test ./...

# Build for current platform
go build -o bin/syncf cmd/main.go

# Cross-compile for Linux
GOOS=linux GOARCH=amd64 go build -o bin/syncf-linux cmd/main.go
```

To add new storage backends (S3, GCS, Azure, etc.), implement the Storage interface in pkg/storage/storage.go:
```go
type Storage interface {
    ListFiles(ctx context.Context, path string) ([]FileInfo, error)
    ReadFile(ctx context.Context, path string) (io.ReadCloser, error)
    WriteFile(ctx context.Context, path string, content io.Reader, size int64) error
    WriteFileWithOffset(ctx context.Context, path string, content io.Reader, size int64, offset int64) error
    FileExists(ctx context.Context, path string) (bool, error)
    GetFileInfo(ctx context.Context, path string) (*FileInfo, error)
    DeleteFile(ctx context.Context, path string) error
    GetStorageType() string
    Close() error
}
```
- RabbitMQ Connection Failed

  ```
  Failed to connect to RabbitMQ: dial tcp [::1]:5672: connect: connection refused
  ```

  - Ensure RabbitMQ is running: make start-rabbitmq
  - Check RabbitMQ status: docker ps | grep rabbitmq

- Database Connection Issues

  ```
  failed to connect to database
  ```

  - Verify database file permissions
  - Check if the bin/ directory exists and is writable

- File Permission Errors

  ```
  failed to open source file: permission denied
  ```

  - Ensure slave processes have read access to source paths
  - Verify write permissions for sink paths
- RabbitMQ Management UI: http://localhost:15672 (guest/guest)
- Application Logs: Check console output for detailed operation logs
- API Health: curl http://localhost:8080/health
This project is licensed under the MIT License. See LICENSE file for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
For questions, issues, or contributions, please open an issue on GitHub or contact the maintainers.
Note: This is a development version. For production use, consider implementing additional security measures, monitoring, and configuration management.