Description
The existing docs provide information about how to deploy clustering in non-Kubernetes environments, but it's somewhat buried across multiple places/topics. We could improve the existing clustering topics with the additional information discussed in the thread (captured here):
Alloy fully supports high availability (HA) clustering in non-Kubernetes environments. While much of the documentation focuses on Kubernetes deployments with Helm charts, you can achieve the same clustering capabilities on bare metal servers, VMs, or any other infrastructure by using command-line flags.
Documentation Links:
- Clustering Overview - Main clustering concepts and how it works
- CLI Reference - run command - Complete list of all --cluster.* command-line flags
- Deployment Topologies - Different deployment patterns including centralized services
- Known Issue #1441 - Clustering startup issues on non-Kubernetes platforms (v1.3.0+)
Setting Up Clustering:
Clustering is configured entirely through command-line arguments when starting Alloy. The core flag --cluster.enabled=true activates clustering mode, while additional --cluster.* flags handle node discovery and communication.
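For example, a minimal single-node invocation might look like the following sketch (the config path and listen address are illustrative; note that the server must listen on a non-loopback address so peers can reach it):

```shell
alloy run /etc/alloy/config.alloy \
  --server.http.listen-addr=0.0.0.0:12345 \
  --cluster.enabled=true
```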
Essential Command-Line Flags:
Here are the key flags you'll need for a non-Kubernetes HA setup (a combined multi-node example follows the list):
- --cluster.enabled=true - Activates clustering mode
- --cluster.join-addresses - Comma-separated list of peer node addresses to join (IP:port format)
- --cluster.discover-peers - Alternative peer discovery using provider-based discovery (e.g., cloud provider APIs)
- --cluster.advertise-address - The address this node broadcasts to other cluster members
- --cluster.name - Optional cluster identifier to prevent accidental merging with other clusters
- --cluster.wait-for-size - Minimum number of nodes required before processing begins
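Putting these together, a two-node setup might look like the following sketch (the hostnames, config path, and cluster name are assumptions for illustration):

```shell
# Node 1 (advertises itself, joins node 2)
alloy run /etc/alloy/config.alloy \
  --server.http.listen-addr=0.0.0.0:12345 \
  --cluster.enabled=true \
  --cluster.advertise-address=alloy-1.example.com:12345 \
  --cluster.join-addresses=alloy-2.example.com:12345 \
  --cluster.name=production

# Node 2 (advertises itself, joins node 1)
alloy run /etc/alloy/config.alloy \
  --server.http.listen-addr=0.0.0.0:12345 \
  --cluster.enabled=true \
  --cluster.advertise-address=alloy-2.example.com:12345 \
  --cluster.join-addresses=alloy-1.example.com:12345 \
  --cluster.name=production
```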
How Clustering Works:
Alloy's clustering provides both high availability and horizontal scalability through an eventually consistent model. The system assumes all participating nodes are interchangeable and will converge on the same configuration.
Target Auto-Distribution is the primary benefit - scraping components automatically distribute workload across all cluster peers. When targets need to be scraped, the cluster uses consistent hashing to determine which node handles each target. This means (see the configuration sketch after this list):
- Workload is automatically balanced across nodes
- If a node fails, its targets are redistributed to remaining peers
- Adding new nodes automatically rebalances the load
- Only about 1/N of targets are redistributed when cluster membership changes (for example, adding a fifth node to a four-node cluster moves roughly 1/5 of the targets)
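Note that auto-distribution is opt-in per component: components that support clustering expose a clustering block in the configuration file. A minimal sketch, assuming a discovery.file component and a prometheus.remote_write component both named "default":

```alloy
prometheus.scrape "default" {
  targets    = discovery.file.default.targets
  forward_to = [prometheus.remote_write.default.receiver]

  // Distribute these targets across all cluster peers.
  clustering {
    enabled = true
  }
}
```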
Peer Discovery Options:
Static Discovery: Use --cluster.join-addresses with a comma-separated list of peer node addresses (IP:port format). This is the most straightforward approach for fixed infrastructure.
Dynamic Discovery: Use --cluster.discover-peers with provider-based discovery. This option expects tuples in the form provider=XXX key=val key=val ... and uses the go-discover package to automatically find peers based on cloud provider APIs or other discovery mechanisms. For example:
- AWS: provider=aws region=us-west-2 tag_key=alloy-cluster tag_value=production
- Other supported providers include Azure, GCP, and many more
This dynamic approach is particularly useful for cloud deployments where IP addresses may change or for auto-scaling scenarios.
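For instance, reusing the AWS tuple above, each node could be launched like this (same illustrative tag key/value and config path as in the earlier sketches):

```shell
alloy run /etc/alloy/config.alloy \
  --server.http.listen-addr=0.0.0.0:12345 \
  --cluster.enabled=true \
  --cluster.discover-peers="provider=aws region=us-west-2 tag_key=alloy-cluster tag_value=production"
```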
Deployment Considerations:
Unlike Kubernetes where service discovery handles peer communication automatically, you'll need to configure how nodes find each other using one of the peer discovery methods above. Additionally:
- Ensure all nodes can reach each other on the clustering port (clustering traffic is carried over Alloy's HTTP server port, 12345 by default)
- Use consistent configuration files across all cluster members
The clustering functionality itself works identically whether deployed on Kubernetes, bare metal, or cloud VMs - only the peer discovery and initial network configuration differs.
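On bare metal or VMs, a process supervisor takes the place of what the Helm chart handles on Kubernetes. A minimal systemd sketch, assuming illustrative binary/config paths, user, and peer hostnames:

```ini
# /etc/systemd/system/alloy.service (illustrative paths and addresses)
[Unit]
Description=Grafana Alloy (clustered)
After=network-online.target
Wants=network-online.target

[Service]
User=alloy
ExecStart=/usr/bin/alloy run /etc/alloy/config.alloy \
  --server.http.listen-addr=0.0.0.0:12345 \
  --cluster.enabled=true \
  --cluster.join-addresses=alloy-1.example.com:12345,alloy-2.example.com:12345 \
  --cluster.name=production
Restart=on-failure

[Install]
WantedBy=multi-user.target
```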