Commit 1794054

Add KMS foundations in encryption controllers in library-go
1 parent 21f43f6 commit 1794054

1 file changed

+361
-0
lines changed

@@ -0,0 +1,361 @@
1+
---
2+
title: kms-encryption-foundations
3+
authors:
4+
- "@ardaguclu"
5+
- "@flavianmissi"
6+
reviewers:
7+
- "@ibihim"
8+
- "@sjenning"
9+
- "@tkashem"
10+
approvers:
11+
- "@benluddy"
12+
api-approvers:
13+
- "@JoelSpeed"
14+
creation-date: 2025-12-03
15+
last-updated: 2025-12-04
16+
tracking-link:
17+
- "https://issues.redhat.com/browse/OCPSTRAT-108"
18+
see-also:
19+
- "enhancements/kube-apiserver/encrypting-data-at-datastore-layer.md"
20+
- "enhancements/etcd/storage-migration-for-etcd-encryption.md"
21+
- "[encrypt data at rest with KMS](https://github.com/openshift/enhancements/pull/1872)"
22+
replaces:
23+
- "[KMS Encryption Provider for Etcd Secrets](https://github.com/openshift/enhancements/pull/1682/)"
24+
---
25+
26+
# KMS Encryption Foundations
27+
28+
## Summary
29+
30+
Extend the OpenShift encryption controllers to support external Key Management Services (KMS) alongside the existing local encryption modes (aescbc, aesgcm). With KMS, encryption keys are stored and managed outside the cluster instead of in cluster-local secrets.
31+
32+
This enhancement:
33+
- Extends the `config.openshift.io/v1/APIServer` resource for KMS configuration
34+
- Extends encryption controllers in `openshift/library-go` to support KMS as a new encryption mode
35+
- Maintains feature parity with existing encryption modes (migration, monitoring, key rotation)
36+
- Supports AWS KMS and Vault in Tech Preview (Thales in future iterations)
37+
38+
## Motivation
39+
40+
OpenShift currently manages AES keys locally for encrypting data at rest in etcd. KMS support enables integration with external key management systems where encryption keys are stored outside the cluster, protecting against attacks where control plane nodes are compromised.
41+
42+
### Goals
43+
44+
- Support KMS as a new encryption mode in existing encryption controllers
45+
- Seamless migration between encryption modes (aescbc ↔ KMS)
46+
- Provider-agnostic controller implementation with minimal provider-specific code
47+
- Feature parity with existing modes (monitoring, migration, key rotation)
48+
49+
### Non-Goals
50+
51+
- Implementing KMS plugins (provided by upstream Kubernetes/vendors)
52+
- KMS plugin deployment/lifecycle management (separate EP for Tech Preview)
53+
- KMS plugin health checks (Tech Preview v2)
54+
- Migration between different KMS providers (separate EP for GA)
55+
- Recovery from KMS key loss (separate EP for GA)
56+
- Automatic `key_id` rotation detection (Tech Preview v2)
57+
58+
## Proposal
59+
60+
Extend the existing encryption controller framework in `openshift/library-go` to support KMS encryption through hash-based change detection. The controllers calculate a hash of the KMS configuration to detect changes and trigger re-encryption, so in Tech Preview they can decide when to act from the APIServer resource alone, without calling an external service.
61+
62+
**Key changes:**
63+
1. Add KMS mode constant to encryption state types
64+
2. Implement hash-based detection for KMS configuration changes
65+
3. Manage empty encryption key secrets (actual keys in external KMS)
66+
4. Reuse existing migration controller (no changes needed)
67+
68+
**Tech Preview v2 additions:**
69+
- Poll KMS plugin Status endpoint for `key_id` changes in apiserver operators
70+
- Store hash of `key_id` in data field of encryption key secrets
71+
- Hash-based detection for external key rotation
72+
73+
### Actors in the Workflow
74+
75+
**cluster admin** is a human user responsible for configuring and maintaining the cluster.
76+
77+
**KMS** is the external Key Management Service (AWS KMS, HashiCorp Vault, etc.) that stores and manages the Key Encryption Key (KEK).
78+
79+
**KMS plugin** is a gRPC service implementing Kubernetes KMS v2 API, running as a sidecar to API server pods. It communicates with the external KMS to encrypt/decrypt data encryption keys (DEKs).
80+
81+
**API server operator** is the OpenShift operator (kube-apiserver-operator, openshift-apiserver-operator, or authentication-operator) managing API server deployments.
82+
83+
#### Encryption Controllers
84+
85+
**keyController** manages the encryption key lifecycle. It creates encryption key secrets in the `openshift-config-managed` namespace. For KMS mode, it creates empty secrets annotated with the KMS configuration hash.
86+
87+
**stateController** generates the EncryptionConfiguration consumed by the API servers. It implements a distributed state machine that ensures all API servers converge to the same revision. For KMS mode, it generates configuration with deterministic Unix socket paths.
88+
89+
**migrationController** orchestrates resource re-encryption. Marks resources as migrated after rewriting in etcd. Works with all encryption modes including KMS.
90+
91+
**pruneController** prunes inactive encryption key secrets. Maintains N keys (currently 10) for rollback scenarios.
92+
93+
**conditionController** determines when controllers should act. Provides status conditions (`EncryptionInProgress`, `EncryptionCompleted`, `EncryptionDegraded`).
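
The pruneController retention rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library-go implementation: the `keySecret` type, its `InUse` flag, and `pruneCandidates` are hypothetical names, and the sketch assumes each key secret carries a monotonically increasing numeric ID.

```go
package main

import (
	"fmt"
	"sort"
)

// keySecret is a hypothetical, simplified view of an encryption key secret.
type keySecret struct {
	ID    uint64 // numeric suffix of the secret name (assumption)
	InUse bool   // still referenced by an API server configuration (assumption)
}

// pruneCandidates returns the secrets that fall outside the newest `keep`
// keys and are not in use. The input slice is not modified.
func pruneCandidates(secrets []keySecret, keep int) []keySecret {
	sorted := append([]keySecret(nil), secrets...)
	// Newest key (highest ID) first.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID > sorted[j].ID })

	var candidates []keySecret
	for i, s := range sorted {
		if i >= keep && !s.InUse {
			candidates = append(candidates, s)
		}
	}
	return candidates
}

func main() {
	var secrets []keySecret
	for id := uint64(1); id <= 12; id++ {
		// Key 1 is old but still in use, so it must survive pruning.
		secrets = append(secrets, keySecret{ID: id, InUse: id == 1})
	}
	for _, s := range pruneCandidates(secrets, 10) {
		fmt.Println("prune candidate:", s.ID)
	}
}
```

Keeping the newest keys regardless of mode means the same rule applies to KMS secrets, even though they hold no key material.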
94+
95+
### Steps for Enabling KMS Encryption
96+
97+
1. Cluster admin updates the APIServer resource:
98+
```yaml
apiVersion: config.openshift.io/v1
kind: APIServer
spec:
  encryption:
    type: kms
    kms:
      aws:
        region: us-east-1
        keyArn: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012
```
109+
110+
2. keyController detects the new encryption mode and calculates hash of the KMS configuration.
111+
112+
3. keyController creates encryption key secret:
113+
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openshift-kube-apiserver-encryption-1
  namespace: openshift-config-managed
  annotations:
    encryption.apiserver.operator.openshift.io/mode: "kms"
    encryption.apiserver.operator.openshift.io/kms-config-hash: "a1b2c3d4e5f67890"
data:
  keys: ""  # Empty in Tech Preview - KEK stored in external KMS
            # In Tech Preview v2, will contain base64-encoded key_id hash
```
126+
127+
4. stateController generates EncryptionConfiguration with hash embedded in socket path:
128+
```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: [secrets]
    providers:
      - kms:
          name: kms-a1b2c3d4e5f67890-1-configmap
          endpoint: unix:///var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket
          apiVersion: v2
```
139+
The deterministic socket path allows KMS plugin lifecycle management to use the same path.
140+
141+
5. migrationController detects the new secret and initiates re-encryption (no code changes - works with any mode).
142+
143+
6. Resources are re-encrypted using KEK in external KMS via the KMS plugin.
144+
145+
7. conditionController updates status conditions: `EncryptionInProgress`, then `EncryptionCompleted`.
146+
147+
### Variation: Configuration Changes (Key Rotation)
148+
149+
When cluster admin updates KMS configuration (e.g., new key ARN, different region):
150+
151+
1. keyController recalculates hash from updated APIServer resource.
152+
2. Compares new hash with hash in most recent encryption key secret annotation.
153+
3. If hashes differ:
154+
- Creates new encryption key secret with new hash
155+
- migrationController automatically triggers re-encryption
156+
4. If hashes match: No action.
157+
158+
**Note:** Automatic weekly key rotation (used for aescbc/aesgcm) is disabled for KMS since rotation is triggered externally.
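
The comparison in the steps above can be sketched as a small decision function. This is an illustrative sketch only: `keyState` here is a simplified stand-in for the library-go type, `needsNewKey` is a hypothetical name, and the time-based rotation used by the local modes is deliberately left out.

```go
package main

import "fmt"

// keyState is a simplified stand-in for the state recorded on the most
// recent encryption key secret.
type keyState struct {
	Mode          string // "aescbc", "aesgcm", "kms", ...
	KMSConfigHash string // value of the kms-config-hash annotation
}

// needsNewKey reports whether a new encryption key secret should be created.
// For KMS mode the only trigger in this sketch is a changed configuration
// hash; time-based rotation for local modes is out of scope here.
func needsNewKey(latest *keyState, desiredMode, desiredHash string) bool {
	if latest == nil || latest.Mode != desiredMode {
		return true // first key, or the encryption mode changed
	}
	if desiredMode == "kms" {
		return latest.KMSConfigHash != desiredHash
	}
	return false
}

func main() {
	latest := &keyState{Mode: "kms", KMSConfigHash: "a1b2c3d4e5f67890"}
	fmt.Println(needsNewKey(latest, "kms", "a1b2c3d4e5f67890")) // unchanged config
	fmt.Println(needsNewKey(latest, "kms", "0f9e8d7c6b5a4321")) // new key ARN or region
}
```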
159+
160+
### Variation: External KMS Key Rotation (Tech Preview v2)
161+
162+
When external KMS rotates the key internally (e.g., AWS KMS automatic rotation):
163+
164+
1. keyController polls KMS plugin Status endpoint for `key_id`.
165+
2. Calculates hash of `key_id` and compares with hash in secret `Data` field.
166+
3. If `key_id` hash differs:
167+
- Creates new encryption key secret with new `key_id` hash
168+
- migrationController automatically triggers re-encryption
169+
4. If `key_id` hash matches: No action.
170+
171+
**Two hashes tracked:**
172+
- `kmsConfigHash` (annotation) - Detects admin configuration changes
173+
- `kmsKeyIDHash` (data field) - Detects external key rotation
174+
175+
Separate hashes cover both cases: the configuration changes without a key rotation (for example, updating the Vault address), or the key rotates without a configuration change (for example, AWS automatic rotation).
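
A minimal sketch of how the two hashes could be evaluated independently follows. The names `trackedHashes`, `hashKeyID`, and `rotationReason` are hypothetical, and hashing `key_id` with an untruncated SHA-256 is an assumption; the proposal does not fix the algorithm for the `key_id` hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// trackedHashes holds the two values compared by the keyController.
type trackedHashes struct {
	ConfigHash string // from the kms-config-hash annotation
	KeyIDHash  string // from the secret data field (Tech Preview v2)
}

// hashKeyID hashes the key_id reported by the KMS plugin Status endpoint.
// Using full-length SHA-256 here is an assumption of this sketch.
func hashKeyID(keyID string) string {
	sum := sha256.Sum256([]byte(keyID))
	return hex.EncodeToString(sum[:])
}

// rotationReason returns why a new key secret is needed, or "" if none is.
func rotationReason(stored, observed trackedHashes) string {
	switch {
	case stored.ConfigHash != observed.ConfigHash:
		return "kms-config-changed"
	case stored.KeyIDHash != observed.KeyIDHash:
		return "kms-key-rotated"
	default:
		return ""
	}
}

func main() {
	stored := trackedHashes{ConfigHash: "a1b2c3d4e5f67890", KeyIDHash: hashKeyID("key-v1")}
	observed := trackedHashes{ConfigHash: "a1b2c3d4e5f67890", KeyIDHash: hashKeyID("key-v2")}
	fmt.Println(rotationReason(stored, observed))
}
```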
176+
177+
### Variation: Migration Between Encryption Modes
178+
179+
**From aescbc to KMS:**
180+
1. Admin updates APIServer: `type: kms` with KMS configuration.
181+
2. keyController creates KMS secret (empty data, with hash).
182+
3. migrationController re-encrypts resources using external KMS.
183+
184+
**From KMS to aescbc:**
185+
1. Admin updates APIServer: `type: aescbc`.
186+
2. keyController creates aescbc secret (with actual key material).
187+
3. migrationController re-encrypts resources using local AES key.
188+
189+
Migration controller reuses existing logic - no changes required.
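
The difference between the two directions lies only in what the key secret carries. The sketch below illustrates that, assuming a hypothetical `newKeyMaterial` helper and a 32-byte key for the local AES modes; neither the helper name nor the key length is taken from this proposal.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newKeyMaterial returns the payload stored in a new encryption key secret.
// KMS secrets are empty because the KEK lives in the external KMS; local
// modes carry freshly generated key material (32 bytes is an assumption).
func newKeyMaterial(mode string) ([]byte, error) {
	switch mode {
	case "kms":
		return []byte{}, nil
	case "aescbc", "aesgcm":
		key := make([]byte, 32)
		if _, err := rand.Read(key); err != nil {
			return nil, err
		}
		return key, nil
	default:
		return nil, fmt.Errorf("unsupported encryption mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"aescbc", "kms"} {
		key, err := newKeyMaterial(mode)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d bytes of local key material\n", mode, len(key))
	}
}
```

Because the migration controller only reacts to the appearance of a new key secret, it does not need to know which of these payloads it received.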
190+
191+
### User Stories
192+
193+
- As a cluster admin, I want to enable KMS encryption by updating the APIServer resource, so I can declaratively configure encryption without manually managing keys.
194+
- As a cluster admin, I want the same migration and monitoring experience for KMS as local encryption, so I don't need to learn new procedures.
195+
- As a security admin, I want encryption keys stored outside the cluster, so compromised control plane nodes cannot access keys.
196+
197+
### API Extensions
198+
199+
**APIServer Resource** (`config.openshift.io/v1`):
200+
- Extended with KMS configuration fields ([PR #2035](https://github.com/openshift/api/pull/2035) for AWS KMS)
201+
- Vault KMS fields will be added after finalization
202+
203+
**Encryption Secret Annotations** (library-go):
204+
```go
EncryptionSecretKMSConfigHash = "encryption.apiserver.operator.openshift.io/kms-config-hash"
```
207+
Stores a truncated SHA-256 hash (16 hex characters, 8 bytes) of the KMS configuration for change detection.
208+
209+
**Encryption State Types** (library-go):
210+
- `KeyState` struct: Add `KMSConfigHash` field
211+
- Add `KMS` mode constant alongside `aescbc`, `aesgcm`, `identity`
212+
213+
### Implementation Details/Notes/Constraints
214+
215+
**Hash Calculation** (`pkg/operator/encryption/controllers/key_controller.go`):
216+
```go
// Concatenate provider-specific fields
combined := aws.KeyARN + ":" + aws.Region
hash := sha256.Sum256([]byte(combined))
kmsConfigHash := hex.EncodeToString(hash[:])[:16]
```
222+
223+
> **Note:** The hash is truncated to 16 hex characters (8 bytes) to keep the socket path well within the Unix domain socket path length limit (108 bytes on Linux) while maintaining sufficient uniqueness for distinguishing different KMS configurations. This allows deterministic socket paths like `/var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket`.
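
For reference, a self-contained version of the snippet above, extended with the socket path derivation, might look like this. The `awsKMSConfig` type and the function names are hypothetical stand-ins for the API types; only the hashing steps and the path layout come from this document.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// awsKMSConfig is a hypothetical stand-in for the AWS fields of the
// APIServer KMS configuration.
type awsKMSConfig struct {
	KeyARN string
	Region string
}

// maxSocketPathLen is the size of sun_path on Linux, including the
// terminating NUL byte.
const maxSocketPathLen = 108

// kmsConfigHash returns the first 16 hex characters (8 bytes) of the
// SHA-256 digest of "KeyARN:Region".
func kmsConfigHash(c awsKMSConfig) string {
	sum := sha256.Sum256([]byte(c.KeyARN + ":" + c.Region))
	return hex.EncodeToString(sum[:])[:16]
}

// socketPath derives the deterministic KMS plugin socket path from the hash.
func socketPath(hash string) string {
	return "/var/run/kmsplugin/kms-" + hash + ".socket"
}

func main() {
	cfg := awsKMSConfig{
		KeyARN: "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012",
		Region: "us-east-1",
	}
	hash := kmsConfigHash(cfg)
	path := socketPath(hash)
	fmt.Println("hash:", hash)
	fmt.Println("path:", path)
	fmt.Println("fits in sun_path:", len(path) < maxSocketPathLen)
}
```

With this layout the path is 46 bytes long, which leaves ample room below the limit.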
224+
225+
**Reverse Conversion** (stateController reads EncryptionConfiguration from API server pods):
226+
1. Extract hash from socket path: `kms-a1b2c3d4e5f67890.socket` → `a1b2c3d4e5f67890`
227+
2. Look up secret with matching `kms-config-hash` annotation
228+
3. Reconstruct KeyState with original KMS configuration
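
The reverse conversion steps above can be sketched as follows. This is an illustration, not the stateController code: `keySecret` is a simplified stand-in for a Kubernetes Secret, and `hashFromEndpoint` and `findSecretByHash` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

const kmsConfigHashAnnotation = "encryption.apiserver.operator.openshift.io/kms-config-hash"

// keySecret is a simplified stand-in for an encryption key secret.
type keySecret struct {
	Name        string
	Annotations map[string]string
}

// hashFromEndpoint extracts the 16-character config hash from an endpoint
// such as unix:///var/run/kmsplugin/kms-<hash>.socket.
func hashFromEndpoint(endpoint string) (string, bool) {
	base := endpoint[strings.LastIndex(endpoint, "/")+1:]
	if !strings.HasPrefix(base, "kms-") || !strings.HasSuffix(base, ".socket") {
		return "", false
	}
	hash := strings.TrimSuffix(strings.TrimPrefix(base, "kms-"), ".socket")
	if len(hash) != 16 {
		return "", false
	}
	return hash, true
}

// findSecretByHash returns the secret whose annotation matches the hash.
func findSecretByHash(secrets []keySecret, hash string) (keySecret, bool) {
	for _, s := range secrets {
		if s.Annotations[kmsConfigHashAnnotation] == hash {
			return s, true
		}
	}
	return keySecret{}, false
}

func main() {
	secrets := []keySecret{{
		Name:        "openshift-kube-apiserver-encryption-1",
		Annotations: map[string]string{kmsConfigHashAnnotation: "a1b2c3d4e5f67890"},
	}}
	hash, ok := hashFromEndpoint("unix:///var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket")
	if !ok {
		panic("endpoint does not look like a KMS plugin socket")
	}
	if s, found := findSecretByHash(secrets, hash); found {
		fmt.Println(hash, "->", s.Name)
	}
}
```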
229+
230+
### Risks and Mitigations
231+
232+
**Risk: KMS Plugin Unavailable During Controller Sync**
233+
- **Impact:** Controllers cannot detect key rotation
234+
- **Mitigation:** No mitigation in Tech Preview. Tech Preview v2 will add plugin health checks and expose plugin health to the cluster admin via operator conditions.
235+
236+
**Risk: etcd Backup Restoration Without KMS Key Access**
237+
- **Impact:** Cannot decrypt data if KMS key deleted/unavailable/expired
238+
- **Mitigation:** No mitigation in Tech Preview. Document KMS key retention requirements.
239+
240+
### Drawbacks
241+
242+
- Adds complexity to encryption controllers for KMS-specific logic
243+
- In Tech Preview, external AWS KMS key rotation is not detected automatically; re-encryption is triggered only by a configuration change
244+
- Dependency on KMS plugin health for controller operations (health checks in Tech Preview v2)
245+
246+
## Design Details
247+
248+
### Test Plan
249+
250+
**Unit Tests:**
251+
- `key_controller_test.go`: KMS key creation, rotation detection, hash changes
252+
- `migration_controller_test.go`: KMS migration scenarios
253+
- `state_controller_test.go`: KMS state changes
254+
255+
**E2E Tests** (Future work):
256+
- Full cluster with KMS encryption enabled
257+
- Trigger external KMS key rotation
258+
- Key rotation with real KMS plugin
259+
- Migration between encryption modes (aescbc → KMS, KMS → identity)
260+
- Verify data re-encryption completes
261+
- Performance testing (time to migrate N secrets)
262+
263+
## Graduation Criteria
264+
265+
### Dev Preview -> Tech Preview
266+
267+
None
268+
269+
### Tech Preview -> GA
270+
271+
- Dynamic `key_id` fetching via KMS plugin Status endpoint
272+
- Full key rotation support for external KMS changes
273+
- Health check preconditions (block operations when plugin unhealthy)
274+
- Support for Thales KMS
275+
- Comprehensive integration and E2E test coverage
276+
- Production validation in multiple environments
277+
278+
### Removing a deprecated feature
279+
280+
N/A
281+
282+
## Upgrade / Downgrade Strategy
283+
284+
**Upgrade:** Gated by feature gate (TechPreviewNoUpgrade only).
285+
286+
**Downgrade:** Not supported.
287+
288+
## Version Skew Strategy
289+
290+
Encryption controllers run in operator pods (not nodes). Version skew concerns:
291+
- **kube-apiserver:** Must support KMS v2 API (Kubernetes 1.27+)
292+
- **library-go:** Operators must use same library-go version
293+
- **KMS plugin:** Controllers don't interact directly (operators do)
294+
295+
No special handling required.
296+
297+
### Operational Aspects of API Extensions
298+
299+
**Monitoring:**
300+
- Operator conditions: `EncryptionControllerDegraded`, `EncryptionMigrationControllerProgressing`, `KMSPluginDegraded`
301+
- Metrics: `apiserver_storage_transformation_operations_total`, `apiserver_storage_transformation_duration_seconds`
302+
303+
**Impact:**
304+
- API latency: +10-50ms per operation (KMS call required, mitigated by DEK caching)
305+
- API throughput: <5% reduction under normal conditions
306+
307+
### Failure Modes
308+
309+
**KMS Plugin Unavailable:**
310+
- New resource creation fails
311+
- Existing resources readable (DEKs cached)
312+
- Detection: `KMSPluginDegraded=True`
313+
- Recovery: Plugin restart (automatic or manual)
314+
315+
**Invalid KMS Configuration:**
316+
- Plugin fails to start
317+
- Detection: Plugin container crash loops
318+
- Recovery: Fix APIServer configuration
319+
320+
**Key Rotation Stuck:**
321+
- Migration unable to complete
322+
- Detection: `EncryptionMigrationControllerProgressing=True` for extended period
323+
- Recovery: Check migration controller logs, verify KMS health
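
The "extended period" check for a stuck rotation can be sketched as below. The `condition` type is a simplified stand-in for an operator condition, and `migrationStuck` and the two-hour `defaultStuckThreshold` are hypothetical; this proposal does not define a threshold.

```go
package main

import (
	"fmt"
	"time"
)

const progressingCondition = "EncryptionMigrationControllerProgressing"

// defaultStuckThreshold is an arbitrary example value, not a documented limit.
const defaultStuckThreshold = 2 * time.Hour

// condition is a simplified stand-in for an operator status condition.
type condition struct {
	Type               string
	Status             string
	LastTransitionTime time.Time
}

// migrationStuck reports whether the migration condition has been
// Progressing=True for longer than the threshold.
func migrationStuck(conds []condition, now time.Time, threshold time.Duration) bool {
	for _, c := range conds {
		if c.Type == progressingCondition && c.Status == "True" {
			return now.Sub(c.LastTransitionTime) > threshold
		}
	}
	return false
}

func main() {
	now := time.Now()
	conds := []condition{{
		Type:               progressingCondition,
		Status:             "True",
		LastTransitionTime: now.Add(-3 * time.Hour),
	}}
	fmt.Println("migration stuck:", migrationStuck(conds, now, defaultStuckThreshold))
}
```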
324+
325+
## Support Procedures
326+
327+
### Detecting KMS Rotation Issues
328+
```bash
# Check encryption key secrets
oc get secrets -n openshift-config-managed -l encryption.apiserver.operator.openshift.io/component=encryption-key

# Check controller logs
oc logs -n openshift-kube-apiserver-operator deployment/kube-apiserver-operator | grep -i kms
```
335+
336+
### Disabling KMS Encryption
337+
338+
1. Update APIServer: `spec.encryption.type: "aescbc"`
339+
2. Wait for migration to complete
340+
3. KMS plugin pods removed by operators
341+
342+
### etcd Backup/Restore
343+
- Before backup: Document KMS configuration, verify key availability
344+
- Before restore: Verify KMS key accessible, credentials valid
345+
- Critical: Deleting KMS key makes backups unrestorable
346+
347+
## Alternatives (Not Implemented)
348+
349+
### Alternative: Separate KMS-Specific Controllers
350+
351+
Instead of extending existing controllers, create new KMS-only controllers.
352+
353+
**Why not chosen:**
354+
- Code duplication (migration logic, state management)
355+
- User confusion (different controllers for different encryption types)
356+
- More operational burden (additional monitoring, alerts)
357+
358+
359+
## Infrastructure Needed
360+
361+
None - extends existing library-go code.
