English | 中文
An intelligent cluster load balancing system for Dify Plugin Daemon that implements high-availability clustering entirely through built-in functionality, with no external dependencies such as Kubernetes.
This project implements a complete cluster load balancing solution within Dify Plugin Daemon, featuring:
- Zero-Dependency Cluster: No need for K8s, Docker Swarm, or other external tools
- Intelligent Load Balancing: Dynamic load distribution based on request response time
- Automatic Failover: Automatic node failure detection and traffic redistribution
- Request Type Recognition: Distinguishes between long and short requests for optimized resource allocation
- Redis Coordination: Uses Redis as the cluster state coordination center
- State Persistence: Persistent storage of request statistics and node states
Dify Plugin Distributed Cluster Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│                             Dify Main Server                             │
│                               192.168.1.10                               │
├──────────────────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐       ┌─────────────────────────────────────────┐    │
│ │   Dify Server   │       │            Main Plugin Node             │    │
│ │                 │       │          (dify-plugin-daemon)           │    │
│ │ PLUGIN_DAEMON_  │──────▶│                                         │    │
│ │ URL=localhost   │       │ ┌─────────────────────────────────────┐ │    │
│ │ :5002           │       │ │            Load Balancer            │ │    │
│ │                 │       │ │       (Smart Request Routing)       │ │    │
│ └─────────────────┘       │ └─────────────────────────────────────┘ │    │
│                           │ ┌─────────────────────────────────────┐ │    │
│                           │ │           Plugin Executor           │ │    │
│                           │ │      (Local Plugin Processing)      │ │    │
│                           │ └─────────────────────────────────────┘ │    │
│                           └─────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     │ Request Forwarding/Load Balancing
                                     ▼
         ┌────────────────────────────┼────────────────────────────┐
         │                            │                            │
         ▼                            ▼                            ▼
┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│  Plugin Node 2  │         │  Plugin Node 3  │         │  Plugin Node 4  │
│  192.168.1.11   │         │  192.168.1.12   │         │  192.168.1.13   │
├─────────────────┤         ├─────────────────┤         ├─────────────────┤
│ ┌─────────────┐ │         │ ┌─────────────┐ │         │ ┌─────────────┐ │
│ │   Plugins   │ │         │ │   Plugins   │ │         │ │   Plugins   │ │
│ │  Executor   │ │         │ │  Executor   │ │         │ │  Executor   │ │
│ └─────────────┘ │         │ └─────────────┘ │         │ └─────────────┘ │
│                 │         │                 │         │                 │
│ ┌─────────────┐ │         │ ┌─────────────┐ │         │ ┌─────────────┐ │
│ │ Node Status │ │         │ │ Node Status │ │         │ │ Node Status │ │
│ │  Reporter   │ │         │ │  Reporter   │ │         │ │  Reporter   │ │
│ └─────────────┘ │         │ └─────────────┘ │         │ └─────────────┘ │
└─────────────────┘         └─────────────────┘         └─────────────────┘
         │                            │                            │
         └────────────────────────────┼────────────────────────────┘
                                      │
                                      ▼
                       ┌───────────────────────────┐
                       │    Redis Coordination     │
                       │    192.168.1.100:6379     │
                       ├───────────────────────────┤
                       │ • Node Status Mgmt        │
                       │ • Request Statistics      │
                       │ • Master Election Vote    │
                       │ • Long Request Cache      │
                       │ • Health Check Heartbeat  │
                       │ • Cluster Config Sync     │
                       └───────────────────────────┘
Request Flow:
Dify Server → Main Plugin Node → Load Balancing Decision → Forward to Optimal Node or Local Processing
This system implements an intelligent load balancing strategy based on request response time:
graph TD
A[Client Request] --> B{Load Balancer}
B --> C[Request Statistics Analysis]
C --> D{Request Type?}
D -->|Short Request < 5000ms| E[Route to Node 1]
D -->|Long Request ≥ 5000ms| F{Node Count?}
F -->|2 Nodes| G[Two-Node Strategy]
F -->|3+ Nodes| H[Multi-Node Strategy]
G --> I{Node 2 Status?}
I -->|Idle| J[Route to Node 2]
I -->|Busy with Long Req| K[Continue Node 2]
I -->|Busy & Node 1 Idle| L[Route to Node 1]
H --> M[Round-Robin on Nodes 2-N]
E --> N[Execute on Node 1]
J --> O[Execute on Node 2]
K --> O
L --> N
M --> P[Execute on Selected Node]
N --> Q[Update Request Stats]
O --> Q
P --> Q
Q --> R[Update Node Status]
R --> S[Store in Redis]
S --> T[Response to Client]
subgraph "Redis Coordination Center"
S1[Request Statistics]
S2[Node Status]
S3[Long Request Cache]
S4[Master Election]
end
S --> S1
S --> S2
S --> S3
subgraph "Cluster Management"
CM1[Health Check]
CM2[Master Election]
CM3[Node Discovery]
CM4[Garbage Collection]
end
CM2 --> S4
┌─────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ New Request │ ───▶ │   Statistics    │ ───▶ │  Request Type   │
│   Arrives   │      │ Analysis Module │      │ Classification  │
└─────────────┘      └─────────────────┘      └─────────────────┘
                              │                        │
                              ▼                        ▼
                     ┌─────────────────┐      ┌─────────────────┐
                     │  Recent 5 Avg   │      │ Short < 5000ms  │
                     │ Time > 5000ms?  │      │ Long ≥ 5000ms   │
                     └─────────────────┘      └─────────────────┘
Single-Node Scenario:
All Requests ──▶ Main Plugin Node (Local processing)
                   │
                   ├─ Short Requests: Direct local execution
                   └─ Long Requests: Direct local execution

Two-Node Scenario:
Short Requests ──▶ Main Plugin Node (Priority local processing)
                     │
                     ▼
Long Requests ──▶ Smart Load Balancing Decision
                     ├─ Remote Node Idle ──▶ Forward to Remote Node
                     ├─ Remote Node Busy with Long Req ──▶ Continue to Remote Node
                     └─ Remote Node Busy & Main Node Idle ──▶ Main Node local processing

Multi-Node Scenario:
Short Requests ──▶ Main Plugin Node (Dedicated local processing for short requests)
Long Requests ──▶ Round-robin distribution to other nodes
                     ├─ Plugin Node 2
                     ├─ Plugin Node 3
                     └─ Plugin Node N
Request Processing Flow:
1. Dify Server sends request to Main Plugin Node
2. RedirectPluginInvoke middleware intercepts request
3. Get available nodes for plugin: FetchPluginAvailableNodesById()
4. Set urlPath to context: ctx.Set("urlPath", ctx.Request.URL.Path)
5. Load balancer selects node: LoadBalancer.SelectNode(ctx, nodes)
6. Determine request type: IsLongRequest(ctx, urlPath)
7. Execute request:
   ├─ Local execution: handleLocalRequest()
   └─ Remote forwarding: handleRemoteRequestWithForwardHeader()
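The flow above can be condensed into a single Gin middleware. The sketch below is illustrative only: the helper functions are the ones named in the steps, but their bodies here are empty stubs, and the real daemon's wiring may differ.

package cluster

import (
	"github.com/gin-gonic/gin"
)

type LoadBalancer struct{ LocalNodeID string }

// Stubs standing in for the daemon's real helpers named in the flow above.
func FetchPluginAvailableNodesById(pluginID string) ([]string, error)       { return nil, nil }
func (lb *LoadBalancer) SelectNode(ctx *gin.Context, nodes []string) string { return lb.LocalNodeID }
func handleLocalRequest(ctx *gin.Context)                                   {}
func handleRemoteRequestWithForwardHeader(ctx *gin.Context, node string)    {}

// RedirectPluginInvoke intercepts plugin invocations and routes each one
// either to the local executor or to another node in the cluster.
func RedirectPluginInvoke(lb *LoadBalancer) gin.HandlerFunc {
	return func(ctx *gin.Context) {
		// Already-forwarded requests are served locally to avoid loops.
		if ctx.GetHeader("X-Plugin-Forwarded") == "true" {
			ctx.Next()
			return
		}
		// Step 3: find which nodes can serve this plugin.
		nodes, err := FetchPluginAvailableNodesById(ctx.Param("plugin_id"))
		if err != nil || len(nodes) == 0 {
			ctx.Next() // degrade to local processing
			return
		}
		// Step 4: stash the URL path for the statistics module.
		ctx.Set("urlPath", ctx.Request.URL.Path)
		// Steps 5-7: select a node, then execute locally or forward.
		if node := lb.SelectNode(ctx, nodes); node == lb.LocalNodeID {
			handleLocalRequest(ctx)
		} else {
			handleRemoteRequestWithForwardHeader(ctx, node)
		}
	}
}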
Node Status Management:
Request start: UpdateNodeStatus(nodeId, true, isLong) // Mark as busy
Request end: UpdateNodeStatus(nodeId, false, isLong) // Mark as idle
Status information stored in Redis:
- is_working: Whether node is processing requests
- is_long_request: Whether current processing is long request
- last_update: Status last update time
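A minimal sketch of what these status updates could look like, assuming a go-redis client and a single node:status hash keyed by node ID; the daemon itself routes Redis access through its own cache package (see UpdateCustomStats further below), so this is an approximation of the layout, not the actual code.

package cluster

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

const NodeStatusKey = "node:status"

// UpdateNodeStatus writes a node's current status into the shared Redis
// hash. Called with isWorking=true at request start and false at the end.
func UpdateNodeStatus(rdb *redis.Client, nodeID string, isWorking, isLong bool) error {
	payload, err := json.Marshal(map[string]any{
		"is_working":      isWorking,
		"is_long_request": isLong,
		"last_update":     time.Now().UnixMilli(),
	})
	if err != nil {
		return err
	}
	return rdb.HSet(context.Background(), NodeStatusKey, nodeID, payload).Err()
}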
Request Statistics Update:
After each request completion:
1. Calculate execution time: duration = time.Since(startTime)
2. Update request stats: UpdateRequestStats(ctx, urlPath, duration)
3. Maintain average time of recent 5 requests
4. If average time > 5000ms, mark as long request
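An in-memory sketch of the sliding-window calculation; the window size and threshold match the configuration constants listed later, but the real system persists these statistics to Redis rather than holding them per process.

package cluster

import (
	"sync"
	"time"
)

const (
	LongRequestThreshold = 5000 * time.Millisecond
	StatisticsWindowSize = 5
)

// endpointStats tracks the most recent request durations for one endpoint.
type endpointStats struct {
	mu     sync.Mutex
	recent []time.Duration
}

// UpdateRequestStats records one request's duration and reports whether
// the endpoint's recent average now crosses the long-request threshold.
func (s *endpointStats) UpdateRequestStats(d time.Duration) (isLong bool) {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.recent = append(s.recent, d)
	if len(s.recent) > StatisticsWindowSize {
		s.recent = s.recent[1:] // keep only the newest N samples
	}

	var sum time.Duration
	for _, v := range s.recent {
		sum += v
	}
	return sum/time.Duration(len(s.recent)) > LongRequestThreshold
}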
Anti-Loop Forwarding Mechanism:
Set Header when forwarding: X-Plugin-Forwarded: true
Receiver checks this Header to prevent infinite forwarding loops
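The forwarding side could look roughly like the following, using a standard reverse proxy; the function name comes from the processing flow above, but this body is a sketch, not the daemon's actual implementation.

package cluster

import (
	"net/http"
	"net/http/httputil"
	"net/url"

	"github.com/gin-gonic/gin"
)

// handleRemoteRequestWithForwardHeader proxies the request to the selected
// node and tags it so the receiver will process it locally instead of
// forwarding it again.
func handleRemoteRequestWithForwardHeader(ctx *gin.Context, nodeAddr string) {
	target, err := url.Parse("http://" + nodeAddr)
	if err != nil {
		ctx.AbortWithStatus(http.StatusBadGateway)
		return
	}
	ctx.Request.Header.Set("X-Plugin-Forwarded", "true")
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(ctx.Writer, ctx.Request)
}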
Overall Design Philosophy: The core idea of this load balancing system is intelligent distribution based on request response time: by distinguishing between "long requests" and "short requests", it optimizes overall cluster performance. The design principle: short requests are frequent and latency-sensitive, so they should be processed locally; long requests are time-consuming but less sensitive to network latency, so they can be distributed to other nodes.
Request Type Recognition Mechanism: The system continuously monitors the response time of each API endpoint, maintaining a sliding window of the most recent 5 requests. When an endpoint's average response time exceeds 5000 milliseconds, it gets marked as a "long request endpoint" and cached in Redis. This dynamic learning mechanism allows the system to self-adapt to different plugin performance characteristics without manual configuration.
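The lookup side of this mechanism might be as simple as a set-membership check. The sketch below assumes the request:long_requests Redis set from the configuration section and a package-level go-redis client standing in for the daemon's cache layer.

package cluster

import (
	"context"

	"github.com/gin-gonic/gin"
	"github.com/redis/go-redis/v9"
)

const LongRequestsKey = "request:long_requests"

// rdb stands in for the daemon's cache layer in this sketch.
var rdb *redis.Client

// IsLongRequest reports whether urlPath has been learned to be a
// long-running endpoint (recent average above the 5000ms threshold).
func IsLongRequest(ctx *gin.Context, urlPath string) bool {
	isLong, err := rdb.SIsMember(context.Background(), LongRequestsKey, urlPath).Result()
	return err == nil && isLong
}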
Layered Load Balancing Strategy: The system adopts different load balancing strategies based on cluster scale. With a single node, all requests are processed locally and no load balancing takes place. With two nodes, it uses a "master-slave collaboration" mode: the Main Node specializes in short requests, while long requests are routed based on real-time node status. With three or more nodes, it uses a "short requests local, long requests distributed" strategy: the Main Node handles all short requests, and long requests are distributed round-robin among the other nodes.
Intelligent Node Selection Logic: In the two-node scenario, the system monitors both nodes' working status in real time. For a long request, if the Remote node is idle, the request is forwarded to it directly; if the Remote node is busy with a long request and the Main node is idle, the request is scheduled to the Main node instead; if both nodes are busy, the dedicated Remote node continues handling long requests so that the Main node's short-request capacity is not affected.
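Putting the three strategies together, the node selection can be condensed as below. This is a standalone sketch: nodeIdle and excludeLocal are hypothetical helpers (reading node:status and filtering out the local node), and IsLongRequest is the lookup sketched earlier.

package cluster

import "github.com/gin-gonic/gin"

type LoadBalancer struct {
	LocalNodeID string
	rr          int // round-robin cursor for the multi-node strategy
}

// Hypothetical helper: reads is_working from the node:status hash.
func nodeIdle(nodeID string) bool { return true }

// Hypothetical helper: drops the local node from the candidate list.
func excludeLocal(nodes []string, local string) []string {
	out := make([]string, 0, len(nodes))
	for _, n := range nodes {
		if n != local {
			out = append(out, n)
		}
	}
	return out
}

// SelectNode applies the layered strategy: local-only for one node,
// master-slave collaboration for two, round-robin for three or more.
func (lb *LoadBalancer) SelectNode(ctx *gin.Context, nodes []string) string {
	// Short requests (and single-node clusters) always stay local.
	if len(nodes) <= 1 || !IsLongRequest(ctx, ctx.GetString("urlPath")) {
		return lb.LocalNodeID
	}
	remotes := excludeLocal(nodes, lb.LocalNodeID)
	if len(remotes) == 1 {
		// Two-node strategy: prefer the remote node; fall back to the
		// main node only when the remote is busy and the main is idle.
		if nodeIdle(remotes[0]) || !nodeIdle(lb.LocalNodeID) {
			return remotes[0]
		}
		return lb.LocalNodeID
	}
	// Multi-node strategy: round-robin long requests over nodes 2..N.
	lb.rr = (lb.rr + 1) % len(remotes)
	return remotes[lb.rr]
}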
Status Awareness and Real-time Scheduling: When each request starts, the system marks the corresponding node as "working" status and records whether it's a long request. After request completion, it immediately updates to "idle" status. This real-time status management ensures the load balancer always makes decisions based on the latest node status, avoiding assigning requests to already overloaded nodes.
Performance Statistics and Self-Learning: After each request completion, the system records execution time and updates corresponding endpoint statistics. By maintaining sliding window averages, the system can dynamically adjust "long/short request" judgments for different endpoints, forming a self-learning load balancing system. Meanwhile, these statistical data also provide rich performance metrics for operational monitoring.
Anti-Loop and Fault Tolerance Mechanisms: To avoid infinite request forwarding between nodes, the system adds special header identifiers when forwarding. Receiving nodes check this header and process directly without secondary forwarding. When exceptions occur during load balancing, the system gracefully degrades to round-robin strategy, ensuring service availability.
Architecture Advantages Summary: The biggest advantage of this design is local processing of short requests combined with distributed processing of long requests. Short requests processed locally on the Main Node avoid network latency, ensuring response speed for high-frequency operations; long requests distributed to other nodes avoid blocking the Main Node, preserving overall system throughput. Meanwhile, intelligent scheduling based on real-time status and the dynamic learning mechanism let the system adapt to different load patterns, achieving truly intelligent load balancing.
- Request Time Statistics: Automatically collect response time for each API endpoint
- Dynamic Threshold Adjustment: Dynamically identify long/short requests based on historical data
- Node State Awareness: Real-time awareness of node working status to avoid overload
- Node Auto-Discovery: New nodes are automatically discovered and registered
- Health Checks: Regular health checks, automatically remove failed nodes
- Master Election: Automatic master node election for cluster coordination and garbage collection (a Redis-lock sketch follows this list)
- Automatic Failure Detection: Detect node failures through heartbeat mechanism
- Automatic Traffic Transfer: Traffic from failed nodes automatically transferred to healthy nodes
- Graceful Degradation: Automatically degrade to round-robin strategy in extreme cases
- Redis Caching: Use Redis to cache request statistics and node states
- Atomic Operations: Use atomic operations to ensure concurrency safety
- Batch Updates: Batch update statistical data to reduce Redis access frequency
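As referenced in the cluster management list above, master election can be built on a Redis lock with a TTL. A minimal sketch, assuming go-redis, a hypothetical cluster:master:lock key, and the 500ms election interval from the configuration section; the daemon's actual election protocol may differ.

package cluster

import (
	"context"
	"sync/atomic"
	"time"

	"github.com/redis/go-redis/v9"
)

const (
	masterLockKey          = "cluster:master:lock" // hypothetical key name
	MasterElectionInterval = 500 * time.Millisecond
)

// runMasterElection keeps trying to acquire (or renew) the master lock.
// Whichever node holds the lock runs cluster-wide duties such as garbage
// collection; if the master dies, its lock expires and another node wins.
func runMasterElection(rdb *redis.Client, nodeID string, isMaster *atomic.Bool) {
	for range time.Tick(MasterElectionInterval) {
		ctx := context.Background()
		// SET NX with a TTL: only one node can hold the lock at a time.
		won, err := rdb.SetNX(ctx, masterLockKey, nodeID, 3*MasterElectionInterval).Result()
		if err != nil {
			continue // Redis hiccup: keep the previous role until next tick
		}
		if !won {
			// Lock already held: if it is ours, renew the TTL.
			if cur, _ := rdb.Get(ctx, masterLockKey).Result(); cur == nodeID {
				rdb.Expire(ctx, masterLockKey, 3*MasterElectionInterval)
				won = true
			}
		}
		isMaster.Store(won)
	}
}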
- Go 1.19+
- Redis 6.0+
- Linux/macOS systems
- Prepare Servers:
- Dify Main Server: 192.168.1.10 (Running Dify + Main Plugin Node)
- Plugin Node 2: 192.168.1.11
- Plugin Node 3: 192.168.1.12
- Plugin Node 4: 192.168.1.13
- Configure Dify Main Server:
# Configure on Dify main server
export PLUGIN_DAEMON_URL=http://localhost:5002 # Point to local Main Plugin Node
export REDIS_HOST=192.168.1.100
export REDIS_PORT=6379
export REDIS_PASSWORD=your-password
# Start Main Plugin Node (Load balancing entry point)
./dify-plugin-daemon --cluster-mode=true
- Configure Other Plugin Nodes (All nodes use the same Redis instance):
# Configure the same Redis connection on each Plugin server
export REDIS_HOST=192.168.1.100 # Redis server address
export REDIS_PORT=6379
export REDIS_PASSWORD=your-password
- Start Plugin Node 2:
./dify-plugin-daemon --cluster-mode=true
- Start Plugin Node 3:
./dify-plugin-daemon --cluster-mode=true
- Start Plugin Node 4:
./dify-plugin-daemon --cluster-mode=true
# View cluster status through Main Plugin Node
# View cluster nodes
curl http://192.168.1.10:5002/cluster/nodes
# View load balancing statistics
curl http://192.168.1.10:5002/cluster/stats
# View current master node (Master can be any node, elected by voting)
curl http://192.168.1.10:5002/cluster/master
# Test load balancing - All requests go through Main Plugin Node
curl -X POST http://192.168.1.10:5002/plugins/invoke \
-H "Content-Type: application/json" \
-d '{"plugin_id": "test", "method": "run"}'
# You can also directly access other nodes for status (debugging only)
curl http://192.168.1.11:5002/cluster/nodes
curl http://192.168.1.12:5002/cluster/nodes
curl http://192.168.1.13:5002/cluster/nodes
- Total Requests: Total number of requests for each endpoint
- Average Response Time: Average time of recent 5 requests
- Maximum Response Time: Historical maximum response time
- Long Request Identification: Whether identified as long request
- Node Online Status: Whether node is online
- Working Status: Whether node is processing requests
- Request Type: Current request type being processed (long/short)
- Last Update Time: Status last update time
const (
// Long request threshold (milliseconds)
LongRequestThreshold = 5000
// Statistics window size (recent N requests)
StatisticsWindowSize = 5
// Node health check interval
NodeHealthCheckInterval = 5 * time.Second
// Master election interval
MasterElectionInterval = 500 * time.Millisecond
)
const (
RequestStatsKey = "request:stats" // Request statistics
LongRequestsKey = "request:long_requests" // Long request set
NodeStatusKey = "node:status" // Node status
ClusterStatusKey = "cluster:status" // Cluster status
)
Performance test results in two-node configuration:
| Scenario | Traditional Round-Robin | Intelligent Load Balancing | Performance Improvement |
| --- | --- | --- | --- |
| Mixed Load | 3.2s | 1.8s | 43.75% |
| Short Request Dominant | 0.5s | 0.3s | 40% |
| Long Request Dominant | 8.1s | 5.2s | 35.8% |
Scalability test results across cluster sizes:

| Node Count | Concurrent Requests | Avg Response Time | Success Rate |
| --- | --- | --- | --- |
| 2 | 1000 | 1.2s | 99.8% |
| 3 | 2000 | 1.1s | 99.9% |
| 5 | 5000 | 1.0s | 99.9% |
// Custom load balancing strategy
type CustomLoadBalancer struct {
*LoadBalancer
}
func (clb *CustomLoadBalancer) SelectNode(ctx *gin.Context, nodes []string) string {
// Implement custom node selection logic
return clb.LoadBalancer.SelectNode(ctx, nodes)
}
// Add custom statistics metrics
func (lb *LoadBalancer) UpdateCustomStats(metric string, value interface{}) error {
return cache.SetMapOneField("custom:stats", metric, value)
}
- Node Cannot Join Cluster
- Check Redis connection configuration
- Confirm network connectivity
- Review node logs
- Load Balancing Not Working
- Confirm request statistics are being collected normally
- Check long request threshold configuration
- Verify node status updates
- Master Election Failed
- Check Redis locking mechanism
- Confirm node clock synchronization
- Review election logs
# View cluster data in Redis
redis-cli HGETALL "cluster:status"
redis-cli HGETALL "request:stats"
redis-cli HGETALL "node:status"
# Enable debug logging
export CLUSTER_DEBUG=true
./dify-plugin-daemon --cluster-mode=true
Issues and Pull Requests are welcome!
# Clone project
git clone https://github.com/xiaomeixw/dify-plugin-cluster
# Install dependencies
go mod tidy
# Run tests
go test ./internal/cluster/...
# Start development environment
make dev-cluster
- Feature development: feat: add new load balancing strategy
- Bug fixes: fix: resolve node election race condition
- Documentation updates: docs: update cluster configuration guide
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
⭐ If this project helps you, please give it a Star!