The F stands for frog
A self-organizing peer-to-peer distributed file storage cluster with CRDT-based replication.
- Zero-Configuration P2P Architecture: Nodes automatically discover each other via UDP broadcast and form a cluster
- CRDT-Based Replication: Conflict-free replicated data types ensure eventual consistency without coordination
- Configurable Replication Factor: Set the replication factor at runtime, from 1 (a single copy) up to full mirroring on every node
- Partition-Based Storage: Files are distributed across partitions with automatic balancing
- HTTP/REST API: Complete programmatic access to cluster operations
- Web UI: Built-in monitoring dashboard, file browser, and cluster visualizer
- WebDAV Server: Mount cluster storage as a network drive
- Full-Text Search: Built-in indexer for finding files by name and metadata
- Media Transcoding: Automatic ffmpeg-based transcoding for streaming
- Local Import/Export: Synchronize between cluster storage and local filesystems
- Simulation Mode: Test cluster behavior with multiple nodes in one process
- Profiling Support: Built-in pprof and flamegraph generation
```
go install github.com/donomii/clusterF@latest
```
Or build from source:
```
git clone https://github.com/donomii/clusterF
cd clusterF
go build
```
Start a single node:
```
./clusterF
```
The node will:
- Automatically generate a node ID
- Create a data directory (`./data/<node-id>`)
- Start HTTP API on a random port (typically 30000-60000)
- Begin broadcasting for peer discovery on UDP port 9999
- Open a web dashboard
Access the dashboard at `http://localhost:<port>/monitor` (the port is shown in the startup output).
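Once the node is up, the `/status` endpoint (listed in the API section below) gives a quick health check; substitute the port from the startup output:
```
# PORT is a hypothetical shell variable holding the HTTP port from the startup output.
curl "http://localhost:${PORT}/status"
```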
Start a node with specific configuration:
```
./clusterF --node-id mynode --data-dir /var/clusterF --http-port 8080
```
Upload a file:
```
curl -X PUT --data-binary @photo.jpg http://localhost:8080/api/files/photos/photo.jpg
```
Download a file:
```
curl http://localhost:8080/api/files/photos/photo.jpg -o photo.jpg
```
List files:
```
curl http://localhost:8080/api/files/photos/
```
Search for files:
curl "http://localhost:8080/api/search?q=vacation"Serve cluster files over WebDAV:
Serve cluster files over WebDAV:
```
./clusterF --webdav /photos
```
Mount on macOS:
open "http://localhost:8080"Mirror cluster files to a local directory:
Mirror cluster files to a local directory:
```
./clusterF --export-dir /mnt/share --cluster-dir /photos
```
Import files from local directory to cluster:
```
./clusterF --import-dir /home/user/photos --cluster-dir /backup
```
Join cluster without storing data locally:
```
./clusterF --no-store
```
Test cluster with multiple nodes:
```
./clusterF --sim-nodes 10 --base-port 30000
```
Core components:
- CRDT Layer (frogpond): Manages distributed state with eventual consistency
- Discovery Manager: UDP broadcast-based peer discovery
- Partition Manager: Distributes files across partitions with configurable replication
- File System: Unified interface for file operations across the cluster
- Indexer: Full-text search and metadata indexing
- File Sync: Bidirectional synchronization with local filesystems
- Thread Manager: Lifecycle management for background subsystems
- Metrics Collector: Performance monitoring and statistics
clusterF currently supports file-based disk storage; files are visible and accessible from the command line. Specialised data stores are possible but not yet integrated.
Select backend with --storage-major:
```
./clusterF --storage-major bolt
```
Files are distributed across partitions based on path hash. Each partition is replicated to RF nodes (default RF=3). The system automatically:
- Detects under-replicated partitions
- Selects replication targets
- Synchronizes partition data between nodes
- Handles node failures gracefully
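Replication activity can be observed through the management endpoints documented below (assuming the node's API is on port 8080):
```
curl http://localhost:8080/api/partition-stats     # partition distribution
curl http://localhost:8080/api/under-replicated    # partitions below the RF
curl http://localhost:8080/api/replication-factor  # current RF
```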
Adjust replication factor via API:
```
curl -X PUT -H "Content-Type: application/json" \
  -d '{"replication_factor": 5}' \
  http://localhost:8080/api/replication-factor
```
File operations:
- `GET /api/files/<path>` - Download file
- `PUT /api/files/<path>` - Upload file
- `DELETE /api/files/<path>` - Delete file
- `POST /api/files/<path>` - Create directory (with `X-Create-Directory: true` header)
- `GET /api/metadata/<path>` - Get file metadata
- `GET /api/search?q=<query>` - Search files by name/metadata
Cluster management:
- `GET /status` - Node status and statistics
- `GET /api/cluster-stats` - Cluster-wide statistics
- `GET /api/partition-stats` - Partition distribution
- `GET /api/replication-factor` - Get RF
- `PUT /api/replication-factor` - Set RF
- `GET /api/under-replicated` - List under-replicated partitions
- `POST /api/integrity-check` - Verify stored file integrity
Monitoring:
- `GET /monitor` - Web-based monitoring dashboard
- `GET /api/metrics` - Prometheus-compatible metrics
- `GET /cluster-visualizer.html` - Network topology visualization
Profiling:
- `GET /profiling` - Profiling control panel
- `GET /flamegraph` - CPU flame graph
- `GET /memorygraph` - Memory flame graph
- `GET /debug/pprof/*` - Go pprof endpoints
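As a worked example of the file endpoints above, the following creates a directory with the `X-Create-Directory` header and then reads its metadata (port and paths are illustrative):
```
curl -X POST -H "X-Create-Directory: true" \
  http://localhost:8080/api/files/photos/2024/
curl http://localhost:8080/api/metadata/photos/2024/
```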
```
--node-id           Node identifier (auto-generated if not specified)
--data-dir          Base data directory (default: ./data)
--http-port         HTTP API port (0 = auto)
--discovery-port    UDP discovery port (default: 9999)
--webdav            Serve cluster path over WebDAV
--export-dir        Mirror cluster files to local directory
--import-dir        Import files from local directory
--cluster-dir       Cluster path prefix for import/export
--exclude-dirs      Comma-separated directories to exclude from import
--no-store          Client mode: don't store partitions locally
--storage-major     Storage format (extent|bolt|sqlite|rawfile)
--storage-minor     Storage format minor version
--encryption-key    Encryption key for at-rest encryption
--no-desktop        Don't open desktop UI
--debug             Enable verbose debug logging
--profiling         Enable profiling at startup
--version           Print version and exit
--sim-nodes         Number of nodes to simulate
--base-port         Base HTTP port for simulation nodes
```
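For example, several of the flags above can be combined to run an encrypted, headless storage node (values are illustrative):
```
# CLUSTERF_KEY is a hypothetical environment variable holding the key.
./clusterF --node-id vault --data-dir /var/clusterF \
  --storage-major bolt --encryption-key "$CLUSTERF_KEY" --no-desktop
```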
The web interface provides:
- Dashboard (`/monitor`): Real-time cluster metrics, peer status, partition distribution
- File Browser (`/files/`): Navigate and manage cluster files
- Visualizer (`/cluster-visualizer.html`): Interactive network topology
- CRDT Inspector (`/crdt`): Examine distributed state
- Metrics (`/metrics`): Performance graphs and statistics
- Profiling (`/profiling`): CPU and memory profiling tools
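Any of these pages can be opened directly from a shell (assuming the API is on port 8080):
```
open http://localhost:8080/monitor        # macOS
xdg-open http://localhost:8080/monitor    # Linux
```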
```
go build
go test ./...
```
Run large-scale cluster tests:
```
go test -run TestLargeCluster -v
```
Project layout:
```
clusterF/
├── main.go            # Entry point and cluster lifecycle
├── cluster.go         # Core cluster implementation
├── discovery/         # Peer discovery
├── partitionmanager/  # Partition distribution and replication
├── filesystem/        # File system abstraction
├── filesync/          # Import/export synchronization
├── indexer/           # Search indexing
├── metrics/           # Performance monitoring
├── frontend/          # Web UI
├── webdav/            # WebDAV server
└── types/             # Shared types and interfaces
```
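Individual subsystems from the layout above can also be tested on their own with standard `go test` invocations, for example:
```
go test ./partitionmanager/ -v
go test ./discovery/ ./filesystem/
```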
Performance notes:
- Nodes handle thousands of concurrent connections
- Partitions sync in parallel across multiple nodes
If nodes don't discover each other:
- Verify UDP port 9999 is not blocked by a firewall
- Check that nodes are on the same subnet for broadcast discovery
- Try an explicit discovery port: `--discovery-port 9999`
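Two quick checks for the discovery port, using standard tools rather than clusterF itself (netcat flags vary between variants):
```
# Is anything bound to the UDP discovery port?
sudo lsof -iUDP:9999
# Send a test datagram to a specific peer.
echo ping | nc -u -w1 <peer-ip> 9999
```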
If partitions are under-replicated:
- Check `/api/under-replicated` for a report
- Verify sufficient nodes are online
If partition sync is consuming too many resources:
- Increase the partition sync interval:
  ```
  curl -X PUT -d '{"partition_sync_interval_seconds": 30}' http://localhost:8080/api/partition-sync-interval
  ```
- Reduce partition sync parallelism (currently hardcoded)
If memory usage is high:
- Enable profiling with `--profiling` and check `/memorygraph`
- Consider client mode for some nodes: `--no-store`
If storage errors occur:
- Ensure write permissions on the data directory
- The storage format is locked after the first start (`--storage-major` cannot be changed)
- Verify the encryption key matches if the store was created with encryption
GNU Affero General Public License v3.0 (AGPL-3.0)
See LICENSE file for full text.
This project follows strict coding conventions.
- Repository: https://github.com/donomii/clusterF
- Issues: https://github.com/donomii/clusterF/issues