🐸 ClusterF 🐸

The F stands for frog

A self-organizing peer-to-peer distributed file storage cluster with CRDT-based replication.

Features

  • Zero-Configuration P2P Architecture: Nodes automatically discover each other via UDP broadcast and form a cluster
  • CRDT-Based Replication: Conflict-free replicated data types ensure eventual consistency without coordination
  • Configurable Replication Factor: Adjust the replication factor at any time during operation, from 1 (a single copy) up to full mirroring on every node
  • Partition-Based Storage: Files are distributed across partitions with automatic balancing
  • HTTP/REST API: Complete programmatic access to cluster operations
  • Web UI: Built-in monitoring dashboard, file browser, and cluster visualizer
  • WebDAV Server: Mount cluster storage as a network drive
  • Full-Text Search: Built-in indexer for finding files by name and metadata
  • Media Transcoding: Automatic ffmpeg-based transcoding for streaming
  • Local Import/Export: Synchronize between cluster storage and local filesystems
  • Simulation Mode: Test cluster behavior with multiple nodes in one process
  • Profiling Support: Built-in pprof and flamegraph generation

Installation

go install github.com/donomii/clusterF@latest

Or build from source:

git clone https://github.com/donomii/clusterF
cd clusterF
go build

Quick Start

Start a single node:

./clusterF

The node will:

  • Automatically generate a node ID
  • Create a data directory (./data/<node-id>)
  • Start HTTP API on a random port (typically 30000-60000)
  • Begin broadcasting for peer discovery on UDP port 9999
  • Open a web dashboard

Access the dashboard at http://localhost:<port>/monitor (port shown in startup output).

Usage Examples

Basic Operations

Start a node with specific configuration:

./clusterF --node-id mynode --data-dir /var/clusterF --http-port 8080

Upload a file:

curl -X PUT --data-binary @photo.jpg http://localhost:8080/api/files/photos/photo.jpg

Download a file:

curl http://localhost:8080/api/files/photos/photo.jpg -o photo.jpg

List files:

curl http://localhost:8080/api/files/photos/

Search for files:

curl "http://localhost:8080/api/search?q=vacation"

Advanced Features

WebDAV Server

Serve cluster files over WebDAV:

./clusterF --webdav /photos

Mount on macOS:

open "http://localhost:8080"

Import/Export

Mirror cluster files to a local directory:

./clusterF --export-dir /mnt/share --cluster-dir /photos

Import files from local directory to cluster:

./clusterF --import-dir /home/user/photos --cluster-dir /backup

Client Mode

Join cluster without storing data locally:

./clusterF --no-store

Simulation Mode

Test cluster with multiple nodes:

./clusterF --sim-nodes 10 --base-port 30000

Architecture

Components

  • CRDT Layer (frogpond): Manages distributed state with eventual consistency
  • Discovery Manager: UDP broadcast-based peer discovery
  • Partition Manager: Distributes files across partitions with configurable replication
  • File System: Unified interface for file operations across the cluster
  • Indexer: Full-text search and metadata indexing
  • File Sync: Bidirectional synchronization with local filesystems
  • Thread Manager: Lifecycle management for background subsystems
  • Metrics Collector: Performance monitoring and statistics

Storage Options

clusterF currently supports file-based disk storage; files are visible and accessible from the command line. Specialised data stores are possible but not yet integrated.

Select backend with --storage-major:

./clusterF --storage-major bolt

Replication

Files are distributed across partitions based on path hash (an illustrative sketch of this idea follows the list below). Each partition is replicated to RF nodes (default RF=3). The system automatically:

  • Detects under-replicated partitions
  • Selects replication targets
  • Synchronizes partition data between nodes
  • Handles node failures gracefully
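
To illustrate the idea of path-hash partition assignment, here is a purely hypothetical sketch. The hash function and partition count below are assumptions for illustration only, not clusterF's actual internals.

// Illustrative only: how hash-based partition assignment works in principle.
package main

import (
    "fmt"
    "hash/fnv"
)

const numPartitions = 256 // assumed value, for illustration only

// partitionFor maps a file path to a partition by hashing the path.
func partitionFor(path string) uint32 {
    h := fnv.New32a()
    h.Write([]byte(path))
    return h.Sum32() % numPartitions
}

func main() {
    for _, p := range []string{"/photos/photo.jpg", "/backup/notes.txt"} {
        fmt.Printf("%s -> partition %d\n", p, partitionFor(p))
    }
}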

Adjust replication factor via API:

curl -X PUT -H "Content-Type: application/json" \
  -d '{"replication_factor": 5}' \
  http://localhost:8080/api/replication-factor
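
The same can be done from Go. A minimal sketch, assuming port 8080 as above; the GET response is printed raw since its exact schema is not documented here.

// Read the current replication factor, then set a new one, using the
// /api/replication-factor endpoint shown in the curl example above.
package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
)

func main() {
    base := "http://localhost:8080"

    // Read the current replication factor.
    resp, err := http.Get(base + "/api/replication-factor")
    if err != nil {
        panic(err)
    }
    current, _ := io.ReadAll(resp.Body)
    resp.Body.Close()
    fmt.Println("current:", string(current))

    // Set a new replication factor of 5.
    req, err := http.NewRequest(http.MethodPut, base+"/api/replication-factor",
        bytes.NewBufferString(`{"replication_factor": 5}`))
    if err != nil {
        panic(err)
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err = http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    resp.Body.Close()
    fmt.Println("set replication factor:", resp.Status)
}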

API Reference

File Operations

  • GET /api/files/<path> - Download file
  • PUT /api/files/<path> - Upload file
  • DELETE /api/files/<path> - Delete file
  • POST /api/files/<path> - Create directory (with X-Create-Directory: true header; see the sketch after this list)
  • GET /api/metadata/<path> - Get file metadata
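
A minimal Go sketch of the directory-creation call above (the path photos/2024/ is just an example):

// Create a directory via POST with the X-Create-Directory header.
package main

import (
    "fmt"
    "net/http"
)

func main() {
    req, err := http.NewRequest(http.MethodPost,
        "http://localhost:8080/api/files/photos/2024/", nil)
    if err != nil {
        panic(err)
    }
    req.Header.Set("X-Create-Directory", "true")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("create directory:", resp.Status)
}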

Search

  • GET /api/search?q=<query> - Search files by name/metadata

Cluster Management

  • GET /status - Node status and statistics
  • GET /api/cluster-stats - Cluster-wide statistics
  • GET /api/partition-stats - Partition distribution
  • GET /api/replication-factor - Get RF
  • PUT /api/replication-factor - Set RF
  • GET /api/under-replicated - List under-replicated partitions (see the sketch after this list)
  • POST /api/integrity-check - Verify stored file integrity
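
A minimal sketch of fetching the under-replicated report from Go; the response body is printed raw, since its schema is not documented here.

// Poll the under-replicated report and print the raw JSON.
package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("http://localhost:8080/api/under-replicated")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Printf("under-replicated report (%s):\n%s\n", resp.Status, body)
}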

Monitoring

  • GET /monitor - Web-based monitoring dashboard
  • GET /api/metrics - Prometheus-compatible metrics
  • GET /cluster-visualizer.html - Network topology visualization

Profiling

  • GET /profiling - Profiling control panel
  • GET /flamegraph - CPU flame graph
  • GET /memorygraph - Memory flame graph
  • GET /debug/pprof/* - Go pprof endpoints

Configuration

Command-Line Options

--node-id           Node identifier (auto-generated if not specified)
--data-dir          Base data directory (default: ./data)
--http-port         HTTP API port (0 = auto)
--discovery-port    UDP discovery port (default: 9999)
--webdav            Serve cluster path over WebDAV
--export-dir        Mirror cluster files to local directory
--import-dir        Import files from local directory
--cluster-dir       Cluster path prefix for import/export
--exclude-dirs      Comma-separated directories to exclude from import
--no-store          Client mode: don't store partitions locally
--storage-major     Storage format (extent|bolt|sqlite|rawfile)
--storage-minor     Storage format minor version
--encryption-key    Encryption key for at-rest encryption
--no-desktop        Don't open desktop UI
--debug             Enable verbose debug logging
--profiling         Enable profiling at startup
--version           Print version and exit

Simulation Mode

--sim-nodes         Number of nodes to simulate
--base-port         Base HTTP port for simulation nodes

Web UI

The web interface provides:

  • Dashboard (/monitor): Real-time cluster metrics, peer status, partition distribution
  • File Browser (/files/): Navigate and manage cluster files
  • Visualizer (/cluster-visualizer.html): Interactive network topology
  • CRDT Inspector (/crdt): Examine distributed state
  • Metrics (/metrics): Performance graphs and statistics
  • Profiling (/profiling): CPU and memory profiling tools

Development

Building

go build

Testing

go test ./...

Run large-scale cluster tests:

go test -run TestLargeCluster -v

Project Structure

clusterF/
├── main.go                 # Entry point and cluster lifecycle
├── cluster.go              # Core cluster implementation
├── discovery/              # Peer discovery
├── partitionmanager/       # Partition distribution and replication
├── filesystem/             # File system abstraction
├── filesync/               # Import/export synchronization
├── indexer/                # Search indexing
├── metrics/                # Performance monitoring
├── frontend/               # Web UI
├── webdav/                 # WebDAV server
└── types/                  # Shared types and interfaces

Performance

  • Nodes handle thousands of concurrent connections
  • Partitions sync in parallel across multiple nodes

Troubleshooting

Nodes not discovering each other

  • Verify UDP port 9999 is not blocked by firewall
  • Check nodes are on same subnet for broadcast discovery
  • Try explicit discovery port: --discovery-port 9999

Under-replicated partitions

  • Check /api/under-replicated for report
  • Verify sufficient nodes are online
  • Increase partition sync interval: curl -X PUT -d '{"partition_sync_interval_seconds": 30}' http://localhost:8080/api/partition-sync-interval

High memory usage

  • Reduce partition sync parallelism (currently hardcoded)
  • Enable profiling: --profiling and check /memorygraph
  • Consider client mode for some nodes: --no-store

Data directory errors

  • Ensure write permissions on data directory
  • Storage format is locked after first start (cannot change --storage-major)
  • Verify encryption key matches if repository was created with encryption

License

GNU Affero General Public License v3.0 (AGPL-3.0)

See LICENSE file for full text.

Contributing

This project follows strict coding conventions.
