SyncF - Distributed File Synchronization System

SyncF is a distributed, resilient file synchronization system built in Go that provides reliable file transfer and synchronization capabilities across different storage backends. It features a master-slave architecture with job queuing, progress tracking, and automatic retry mechanisms.

πŸš€ Features

  • Distributed Architecture: Master-slave pattern for scalable file synchronization
  • Fault Tolerance: Resumable transfers with byte-level checkpointing
  • Progress Tracking: Real-time monitoring of sync operations and individual file transfers
  • Flexible Storage: Support for multiple storage backends (currently local filesystem)
  • Job Management: Create, pause, resume, and kill synchronization jobs
  • Message Queuing: RabbitMQ-based task distribution to slave nodes
  • Database Persistence: SQLite/PostgreSQL support for storing jobs and tasks
  • RESTful API: HTTP API for job management and monitoring
  • Checksum Verification: SHA-256 checksums ensure data integrity

πŸ“‹ Prerequisites

  • Go 1.24.3 or later
  • RabbitMQ server
  • SQLite or PostgreSQL database

πŸ› οΈ Installation

  1. Clone the repository:

    git clone https://github.com/infrautils/syncf.git
    cd syncf
  2. Install dependencies:

    make setup
  3. Build the application:

    make build
  4. Start RabbitMQ (using Docker):

    make start-rabbitmq

πŸƒ Quick Start

Running the Master Node

The master node provides the HTTP API and manages job scheduling:

make run-master

The master starts on port 8080 and exposes the HTTP endpoints described in the API Reference section below.

Running Slave Nodes

Slave nodes process the actual file synchronization tasks:

# Run with auto-generated ID
make run-slave

# Run with custom slave ID
./bin/syncf slave my-slave-001

πŸ“– API Reference

Job Management

Create a Synchronization Job

POST /api/jobs
Content-Type: application/json

{
  "source_path": "/path/to/source",
  "sink_path": "/path/to/destination",
  "source_type": "local",
  "sink_type": "local",
  "interval": "5m"
}
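
For example, a job can be submitted from Go with a plain net/http POST. This is a minimal sketch: it assumes a master running locally on port 8080, the payload mirrors the schema above, and the example paths reuse the relative-path convention described in the Security Model section ("backup/project1" is an illustrative sink path, not one from the repository):

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Request body matching the job-creation schema above.
    payload := []byte(`{
        "source_path": "documents/project1",
        "sink_path": "backup/project1",
        "source_type": "local",
        "sink_type": "local",
        "interval": "5m"
    }`)

    resp, err := http.Post("http://localhost:8080/api/jobs", "application/json", bytes.NewReader(payload))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Print the status and the job-creation response shown below.
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(body))
}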

List All Jobs

GET /api/jobs

Get Job Progress and Tracking

GET /api/jobs/{jobId}/tracking

Pause a Job

PUT /api/jobs/{jobId}/pause

Resume a Job

PUT /api/jobs/{jobId}/resume

Kill a Job

DELETE /api/jobs/{jobId}

Health Check

GET /health

Response Examples

Job Creation Response:

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "running",
  "message": "Job submitted successfully"
}

Job Tracking Response:

{
  "job": {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "source_path": "/data/source",
    "sink_path": "/data/backup",
    "source_type": "local",
    "sink_type": "local",
    "status": "running",
    "interval": "5m0s",
    "created_at": "2025-09-11T10:00:00Z",
    "updated_at": "2025-09-11T10:05:00Z"
  },
  "tasks": [
    {
      "id": "task-123",
      "source_file": "/data/source/file1.txt",
      "sink_file": "/data/backup/file1.txt",
      "status": "completed",
      "progress": 1.0,
      "file_size": 1048576,
      "copied_bytes": 1048576,
      "checksum": "abc123...",
      "slave_id": "slave-001"
    }
  ]
}

πŸ—οΈ Architecture

System Overview


β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     Master      β”‚    β”‚   Message Queue β”‚    β”‚     Slave        β”‚
β”‚                 β”‚    β”‚    (RabbitMQ)   β”‚    β”‚                  β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚    β”‚                 β”‚    β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚   REST API  β”‚ β”‚    β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚    β”‚ β”‚Task Processorβ”‚ β”‚
β”‚ β”‚             β”‚ β”‚    β”‚  β”‚   Tasks   β”‚  β”‚    β”‚ β”‚              β”‚ β”‚
β”‚ β”‚ Job Manager β”‚ │◄──►│  β”‚   Queue   β”‚  │◄──►│ β”‚File Transfer β”‚ β”‚
β”‚ β”‚             β”‚ β”‚    β”‚  β”‚           β”‚  β”‚    β”‚ β”‚              β”‚ β”‚
β”‚ β”‚ Scheduler   β”‚ β”‚    β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚    β”‚ β”‚Checkpointing β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚    β”‚                 β”‚    β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                                              β”‚
         β”‚                                              β”‚
         β”‚                                              β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚                       β”‚
                    β”‚                       β”‚
                    β–Ό                       β–Ό
                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                β”‚    Database     β”‚   β”‚    Storage      β”‚
                β”‚    (SQLite)     β”‚   β”‚   Backends      β”‚
                β”‚                 β”‚   β”‚                 β”‚
                β”‚ β€’ Jobs          β”‚   β”‚ β€’ Local FS      β”‚
                β”‚ β€’ Tasks         β”‚   β”‚ β€’ AWS S3        β”‚
                β”‚ β€’ Checkpoints   β”‚   β”‚ β€’ Google Cloud  β”‚
                β”‚ β€’ Progress      β”‚   β”‚ β€’ Azure Blob    β”‚
                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Components

  1. Master Node:

    • HTTP API server
    • Job scheduler and manager
    • Task creation and distribution
    • Database operations
  2. Slave Nodes:

    • Task consumers from RabbitMQ
    • File transfer execution
    • Progress reporting and checkpointing
    • Fault recovery
  3. Message Queue (RabbitMQ):

    • Task distribution between master and slaves
    • Ensures reliable task delivery
  4. Database (SQLite/PostgreSQL):

    • Job and task persistence
    • Progress and state tracking
    • Recovery information

Data Flow

  1. User submits sync job via REST API
  2. Master creates job record in database
  3. Master periodically scans source directory
  4. Master creates tasks for new/changed files
  5. Tasks are queued in RabbitMQ
  6. Slave nodes consume and process tasks
  7. Slaves report progress and completion
  8. Master tracks overall job status
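
The exact wire format of a task message is internal to the system. As a rough illustration, a task carrying the fields shown in the tracking response above might be modeled like this (the type and field names here are assumptions inferred from that response, not the actual pkg/job types):

// TaskMessage is a hypothetical shape for a task published to the
// RabbitMQ queue. Field names are inferred from the tracking response
// shown earlier, not copied from the actual codebase.
type TaskMessage struct {
    ID         string `json:"id"`
    JobID      string `json:"job_id"`
    SourceFile string `json:"source_file"`
    SinkFile   string `json:"sink_file"`
    FileSize   int64  `json:"file_size"`
    // CopiedBytes lets a slave resume a partially transferred file.
    CopiedBytes int64 `json:"copied_bytes"`
}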

Security Model

πŸ”’ Security Note: The source and sink paths provided via the API are treated as relative paths and appended to a configured base path:

Actual Source Path = BASE_PATH + "/" + API_SOURCE_PATH
Actual Sink Path   = BASE_PATH + "/" + API_SINK_PATH

Example:

  • Base Path: /home/praneethshetty/wlthy/data
  • API Request: "source_path": "documents/project1"
  • Actual Source: /home/praneethshetty/wlthy/data/documents/project1

This prevents path traversal attacks and ensures all operations are contained within the configured base directory.
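
A minimal sketch of such a containment check in Go (illustrative only; the repository's actual implementation may differ):

package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// resolvePath joins an API-supplied relative path onto the base path
// and rejects anything that would escape the base directory.
func resolvePath(basePath, apiPath string) (string, error) {
    full := filepath.Join(basePath, apiPath) // Join also cleans ".." segments
    if full != basePath && !strings.HasPrefix(full, basePath+string(filepath.Separator)) {
        return "", fmt.Errorf("path %q escapes the base directory", apiPath)
    }
    return full, nil
}

func main() {
    base := "/home/praneethshetty/wlthy/data"
    fmt.Println(resolvePath(base, "documents/project1")) // allowed
    fmt.Println(resolvePath(base, "../../etc/passwd"))   // rejected
}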

Fault Tolerance

  • Resumable Transfers: Files are transferred in chunks with periodic checkpoints
  • Byte-level Resume: Transfers can resume from exact byte offset after interruption
  • Checksum Verification: SHA-256 checksums ensure data integrity
  • Automatic Retries: Failed tasks are automatically retried
  • Graceful Recovery: System state is preserved across restarts
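
The following is a simplified sketch of how a resumable, checksummed copy can work on a local filesystem. It is illustrative only: the real transfer loop copies in chunks and writes a checkpoint after each one, which is omitted here, and the helper name is hypothetical:

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// copyWithResume copies src to dst starting at the checkpointed byte
// offset, then returns the SHA-256 of the finished destination file
// for integrity verification. A simplified stand-in for the slave's
// transfer loop.
func copyWithResume(srcPath, dstPath string, offset int64) (string, error) {
    src, err := os.Open(srcPath)
    if err != nil {
        return "", err
    }
    defer src.Close()

    dst, err := os.OpenFile(dstPath, os.O_CREATE|os.O_RDWR, 0o644)
    if err != nil {
        return "", err
    }
    defer dst.Close()

    // Resume both files at the same byte offset.
    if _, err := src.Seek(offset, io.SeekStart); err != nil {
        return "", err
    }
    if _, err := dst.Seek(offset, io.SeekStart); err != nil {
        return "", err
    }

    // In the real system each chunk is followed by a checkpoint write
    // recording the new offset; here the remainder is copied in one pass.
    if _, err := io.Copy(dst, src); err != nil {
        return "", err
    }

    // Verify integrity by hashing the complete destination file.
    if _, err := dst.Seek(0, io.SeekStart); err != nil {
        return "", err
    }
    h := sha256.New()
    if _, err := io.Copy(h, dst); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
    sum, err := copyWithResume("source.txt", "dest.txt", 0)
    if err != nil {
        panic(err)
    }
    fmt.Println("sha256:", sum)
}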

βš™οΈ Configuration

The system uses a simple configuration structure in internal/config/config.go:

type Config struct {
    DBConfig db.DBConfig
    MQConfig string
}

Default Configuration:

  • Database: SQLite (bin/syncf.db)
  • Message Queue: amqp://guest:guest@localhost:5672/
  • Storage: Local filesystem (/home/praneethshetty/wlthy/data)

πŸ“Š Job and Task States

Job States

  • running: Job is actively monitoring and syncing files
  • paused: Job is temporarily stopped
  • killed: Job has been terminated

Task States

  • pending: Task is queued for processing
  • in_progress: Task is being processed by a slave
  • completed: Task finished successfully
  • failed: Task failed and may be retried
  • retrying: Task is being retried after failure
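
In code, these states map naturally onto string constants (a sketch; the constant names are illustrative and the actual identifiers in pkg/job may differ, but the string values are the ones listed above):

// Job and task status values as documented above.
const (
    JobRunning = "running"
    JobPaused  = "paused"
    JobKilled  = "killed"

    TaskPending    = "pending"
    TaskInProgress = "in_progress"
    TaskCompleted  = "completed"
    TaskFailed     = "failed"
    TaskRetrying   = "retrying"
)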

πŸ”§ Development

Project Structure

syncf/
β”œβ”€β”€ cmd/
β”‚   └── main.go              # Application entry point
β”œβ”€β”€ internal/
β”‚   └── config/              # Configuration management
β”œβ”€β”€ pkg/
β”‚   β”œβ”€β”€ api/                 # API handlers (placeholder)
β”‚   β”œβ”€β”€ db/                  # Database interfaces and implementations
β”‚   β”œβ”€β”€ job/                 # Job and task models
β”‚   β”œβ”€β”€ master/              # Master node implementation
β”‚   β”œβ”€β”€ slave/               # Slave node implementation
β”‚   β”œβ”€β”€ mq/                  # Message queue integration
β”‚   β”œβ”€β”€ storage/             # Storage backend abstractions
β”‚   └── util/                # Utility functions
β”œβ”€β”€ bin/                     # Built binaries and database
β”œβ”€β”€ go.mod                   # Go module definition
└── Makefile                 # Build and run commands

Building from Source

# Install dependencies
go mod download

# Run tests (if available)
go test ./...

# Build for current platform
go build -o bin/syncf cmd/main.go

# Cross-compile for Linux
GOOS=linux GOARCH=amd64 go build -o bin/syncf-linux cmd/main.go

Adding Storage Backends

To add new storage backends (S3, GCS, Azure, etc.), implement the Storage interface in pkg/storage/storage.go:

type Storage interface {
    ListFiles(ctx context.Context, path string) ([]FileInfo, error)
    ReadFile(ctx context.Context, path string) (io.ReadCloser, error)
    WriteFile(ctx context.Context, path string, content io.Reader, size int64) error
    WriteFileWithOffset(ctx context.Context, path string, content io.Reader, size int64, offset int64) error
    FileExists(ctx context.Context, path string) (bool, error)
    GetFileInfo(ctx context.Context, path string) (*FileInfo, error)
    DeleteFile(ctx context.Context, path string) error
    GetStorageType() string
    Close() error
}
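
For instance, a hypothetical S3 backend could start from a skeleton like the one below. The import path is assumed from the repository's module layout, the FileInfo type is taken on faith from pkg/storage, and every method is a stub to be replaced with real SDK calls:

package s3storage

import (
    "context"
    "fmt"
    "io"

    "github.com/infrautils/syncf/pkg/storage" // import path assumed
)

// S3Storage is a hypothetical backend skeleton implementing the
// Storage interface; bucket handling and pagination are omitted.
type S3Storage struct {
    bucket string
}

func New(bucket string) *S3Storage { return &S3Storage{bucket: bucket} }

func (s *S3Storage) ListFiles(ctx context.Context, path string) ([]storage.FileInfo, error) {
    return nil, fmt.Errorf("not implemented")
}

func (s *S3Storage) ReadFile(ctx context.Context, path string) (io.ReadCloser, error) {
    return nil, fmt.Errorf("not implemented")
}

func (s *S3Storage) WriteFile(ctx context.Context, path string, content io.Reader, size int64) error {
    return fmt.Errorf("not implemented")
}

func (s *S3Storage) WriteFileWithOffset(ctx context.Context, path string, content io.Reader, size int64, offset int64) error {
    return fmt.Errorf("not implemented")
}

func (s *S3Storage) FileExists(ctx context.Context, path string) (bool, error) {
    return false, fmt.Errorf("not implemented")
}

func (s *S3Storage) GetFileInfo(ctx context.Context, path string) (*storage.FileInfo, error) {
    return nil, fmt.Errorf("not implemented")
}

func (s *S3Storage) DeleteFile(ctx context.Context, path string) error {
    return fmt.Errorf("not implemented")
}

func (s *S3Storage) GetStorageType() string { return "s3" }

func (s *S3Storage) Close() error { return nil }

Once the methods are implemented, the new backend would presumably be selected via a new source_type/sink_type value in the job-creation request.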

πŸ› Troubleshooting

Common Issues

  1. RabbitMQ Connection Failed

    Failed to connect to RabbitMQ: dial tcp [::1]:5672: connect: connection refused
    
    • Ensure RabbitMQ is running: make start-rabbitmq
    • Check RabbitMQ status: docker ps | grep rabbitmq
  2. Database Connection Issues

    failed to connect to database
    
    • Verify database file permissions
    • Check if bin/ directory exists and is writable
  3. File Permission Errors

    failed to open source file: permission denied
    
    • Ensure slave processes have read access to source paths
    • Verify write permissions for sink paths

Monitoring

  • RabbitMQ Management UI: http://localhost:15672 (guest/guest)
  • Application Logs: Check console output for detailed operation logs
  • API Health: curl http://localhost:8080/health

πŸ“„ License

This project is licensed under the MIT License. See the LICENSE file for details.

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ž Support

For questions, issues, or contributions, please open an issue on GitHub or contact the maintainers.


Note: This is a development version. For production use, consider implementing additional security measures, monitoring, and configuration management.
