
Commit 2e1c523

docs: documentation for placement groups
1 parent c2a8829 commit 2e1c523

5 files changed, +658 -1 lines changed

Lines changed: 201 additions & 0 deletions

+++
title = "AWS Placement Group Node Feature Discovery"
+++

The AWS placement group NFD (Node Feature Discovery) customization automatically discovers and labels nodes with their placement group information, enabling workload scheduling based on placement group characteristics.

This customization will be available when the
[provider-specific cluster configuration patch]({{< ref "..">}}) is included in the `ClusterClass`.

## What is Placement Group NFD?

Placement Group NFD automatically discovers the placement group information for each node and creates node labels that can be used for workload scheduling. This enables:

- **Workload Affinity**: Schedule pods on nodes within the same placement group for low latency
- **Fault Isolation**: Schedule critical workloads on nodes in different placement groups
- **Resource Optimization**: Use placement group labels for advanced scheduling strategies

## How it Works

The NFD customization:

1. **Deploys a Discovery Script**: Automatically installs a script on each node that queries AWS metadata
2. **Queries AWS Metadata**: Uses EC2 instance metadata to discover placement group information
3. **Creates Node Labels**: Generates Kubernetes node labels with placement group details
4. **Updates Continuously**: Refreshes labels as nodes are added or moved
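
The exact script installed by the customization is not reproduced here, but a minimal sketch of such a discovery script might look like the following. It assumes IMDSv2 is reachable from the node, writes to the feature file path referenced in the Troubleshooting section below, and picks key names that would produce the labels listed in the next section via NFD's local feature source (which prefixes keys with `feature.node.kubernetes.io/`).

```bash
#!/usr/bin/env bash
# Illustrative sketch only -- the script installed by the customization may differ.
set -euo pipefail

IMDS="http://169.254.169.254/latest"

# IMDSv2 requires a session token for metadata requests.
TOKEN="$(curl -sf -X PUT "${IMDS}/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")"

# These metadata paths return nothing when the instance is not in a
# placement group or not in a partition placement group.
group="$(curl -sf -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  "${IMDS}/meta-data/placement/group-name" || true)"
partition="$(curl -sf -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  "${IMDS}/meta-data/placement/partition-number" || true)"

# NFD's local feature source turns key=value lines in features.d files into
# feature.node.kubernetes.io/<key> node labels.
FEATURE_FILE=/etc/kubernetes/node-feature-discovery/features.d/placementgroup
: > "${FEATURE_FILE}"
if [ -n "${group}" ]; then
  echo "aws-placement-group=${group}" >> "${FEATURE_FILE}"
fi
if [ -n "${partition}" ]; then
  echo "partition=${partition}" >> "${FEATURE_FILE}"
fi
```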

## Generated Node Labels

The NFD customization creates the following node labels:

| Label | Description | Example |
|-------|-------------|---------|
| `feature.node.kubernetes.io/aws-placement-group` | The name of the placement group | `my-cluster-pg` |
| `feature.node.kubernetes.io/partition` | The partition number (for partition placement groups) | `0`, `1`, `2` |
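
To view these labels as columns in `kubectl` output, the standard `--label-columns` (`-L`) flag works well:

```bash
kubectl get nodes \
  -L feature.node.kubernetes.io/aws-placement-group \
  -L feature.node.kubernetes.io/partition
```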

## Configuration

The placement group NFD customization is automatically enabled when a placement group is configured. No additional configuration is required.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: <NAME>
spec:
  topology:
    variables:
      - name: clusterConfig
        value:
          controlPlane:
            aws:
              placementGroup:
                name: "control-plane-pg"
      - name: workerConfig
        value:
          aws:
            placementGroup:
              name: "worker-pg"
```

## Usage Examples

### Workload Affinity

Schedule pods on nodes within the same placement group for low latency:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-performance-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: high-performance-app
  template:
    metadata:
      labels:
        app: high-performance-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: feature.node.kubernetes.io/aws-placement-group
                    operator: In
                    values: ["worker-pg"]
      containers:
        - name: app
          image: my-app:latest
```

### Fault Isolation

Distribute critical workloads across different placement groups:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values: ["critical-app"]
              topologyKey: feature.node.kubernetes.io/aws-placement-group
      containers:
        - name: app
          image: critical-app:latest
```

### Partition-Aware Scheduling

For partition placement groups, schedule workloads on specific partitions:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: distributed-database
spec:
  serviceName: distributed-database # required by StatefulSets; assumes a matching headless Service
  replicas: 3
  selector:
    matchLabels:
      app: distributed-database
  template:
    metadata:
      labels:
        app: distributed-database
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: feature.node.kubernetes.io/partition
                    operator: In
                    values: ["0", "1", "2"]
      containers:
        - name: database
          image: my-database:latest
```

## Verification

You can verify that the NFD labels are working by checking the node labels:

```bash
# Check all nodes and their placement group labels
kubectl get nodes --show-labels | grep placement-group

# Check specific node labels
kubectl describe node <node-name> | grep placement-group

# Check partition labels
kubectl get nodes --show-labels | grep partition
```

## Troubleshooting

### Check NFD Script Status

Verify that the discovery script is running:

```bash
# Check if the script exists on nodes
kubectl debug node/<node-name> -it --image=busybox -- chroot /host ls -la /etc/kubernetes/node-feature-discovery/source.d/

# Check script execution
kubectl debug node/<node-name> -it --image=busybox -- chroot /host cat /etc/kubernetes/node-feature-discovery/features.d/placementgroup
```
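
If the feature file exists but the labels do not appear on the node, the NFD worker logs are the next place to look. The namespace and pod name depend on how NFD is deployed in your cluster, so treat the names below as placeholders:

```bash
# Inspect the NFD worker running on the affected node
kubectl logs -n <nfd-namespace> <nfd-worker-pod-on-that-node>
```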

## Integration with Other Features

Placement Group NFD works seamlessly with:

- **Pod Affinity/Anti-Affinity**: Use placement group labels for advanced scheduling
- **Topology Spread Constraints**: Distribute workloads across placement groups
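
As an illustration of the second point, a topology spread constraint added to a pod template's `spec` can use the placement group label as its `topologyKey` to spread replicas evenly across placement groups. This is a generic Kubernetes snippet (the `app: critical-app` selector reuses the example above), not output generated by the customization:

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: feature.node.kubernetes.io/aws-placement-group
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: critical-app
```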

## Security Considerations

- The discovery script queries AWS instance metadata (IMDSv2)
- No additional IAM permissions are required beyond standard node permissions
- Labels are automatically managed and do not require manual intervention
- The script runs with appropriate permissions and security context

Lines changed: 138 additions & 0 deletions

+++
title = "AWS Placement Group"
+++

The AWS placement group customization allows the user to specify placement groups for control-plane
and worker machines to control their placement strategy within AWS.

This customization will be available when the
[provider-specific cluster configuration patch]({{< ref "..">}}) is included in the `ClusterClass`.

## What are Placement Groups?

AWS placement groups are logical groupings of instances that influence how instances are placed on underlying hardware. They are useful for:

- **Cluster Placement Groups**: For applications that benefit from low network latency, high network throughput, or both
- **Partition Placement Groups**: For large distributed and replicated workloads, such as HDFS, HBase, and Cassandra
- **Spread Placement Groups**: For applications that have a small number of critical instances that should be kept separate
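
Placement groups themselves are created in AWS, not by this customization (see the Important Notes below). If you need to create one up front, the AWS CLI supports all three strategies; the group names here are only examples:

```bash
# Cluster placement group: low latency / high throughput
aws ec2 create-placement-group --group-name control-plane-pg --strategy cluster

# Partition placement group with 3 partitions: fault-isolated distributed workloads
aws ec2 create-placement-group --group-name worker-pg --strategy partition --partition-count 3

# Spread placement group: a small number of critical instances on distinct hardware
aws ec2 create-placement-group --group-name critical-pg --strategy spread
```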

## Configuration

The placement group configuration supports the following field:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | The name of the placement group (1-255 characters) |

## Examples

### Control Plane and Worker Placement Groups

To specify placement groups for both control plane and worker machines:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: <NAME>
spec:
  topology:
    variables:
      - name: clusterConfig
        value:
          controlPlane:
            aws:
              placementGroup:
                name: "control-plane-pg"
      - name: workerConfig
        value:
          aws:
            placementGroup:
              name: "worker-pg"
```

### Control Plane Only

To specify a placement group only for control plane machines:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: <NAME>
spec:
  topology:
    variables:
      - name: clusterConfig
        value:
          controlPlane:
            aws:
              placementGroup:
                name: "control-plane-pg"
```

### MachineDeployment Overrides

You can customize individual MachineDeployments by using the overrides field:

```yaml
spec:
  topology:
    # ...
    workers:
      machineDeployments:
        - class: default-worker
          name: md-0
          variables:
            overrides:
              - name: workerConfig
                value:
                  aws:
                    placementGroup:
                      name: "special-worker-pg"
```

## Resulting CAPA Configuration

Applying the placement group configuration will result in the following values being set:

- control-plane `AWSMachineTemplate`:

  - ```yaml
    spec:
      template:
        spec:
          placementGroupName: control-plane-pg
    ```

- worker `AWSMachineTemplate`:

  - ```yaml
    spec:
      template:
        spec:
          placementGroupName: worker-pg
    ```
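
To confirm the patch was applied, you can inspect the generated `AWSMachineTemplate` objects on the management cluster; the namespace depends on where your cluster objects live:

```bash
kubectl get awsmachinetemplates -n <cluster-namespace> -o yaml | grep placementGroupName
```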

## Best Practices

1. **Placement Group Types**: Choose the appropriate placement group type based on your workload:
   - **Cluster**: For applications requiring low latency and high throughput
   - **Partition**: For large distributed workloads that need fault isolation
   - **Spread**: For critical instances that need maximum availability

2. **Naming Convention**: Use descriptive names that indicate the purpose and type of the placement group

3. **Availability Zone**: Cluster placement groups are constrained to a single Availability Zone (spread and partition placement groups can span Availability Zones within a Region), so plan your cluster topology accordingly

4. **Instance Types**: Some instance types have restrictions on placement groups (e.g., some bare metal instances)

5. **Capacity Planning**: Consider the placement group capacity limits when designing your cluster

## Important Notes

- Placement groups must be created in AWS before they can be referenced
- Cluster placement groups are constrained to a single Availability Zone; spread and partition placement groups can span multiple Availability Zones in the same Region
- A running instance cannot be moved into a placement group; it must be stopped first
- Some instance types cannot be launched in placement groups
- Placement groups have capacity limits that vary by type and instance family
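
To check whether the placement groups you reference already exist, you can query AWS directly (the names match the examples above):

```bash
aws ec2 describe-placement-groups --group-names control-plane-pg worker-pg
```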
