Commit 8f79ed6

Add KMS foundations in encryption controllers in library-go
1 parent 21f43f6 commit 8f79ed6

Lines changed: 392 additions & 0 deletions
@@ -0,0 +1,392 @@
---
title: kms-encryption-foundations
authors:
  - "@ardaguclu"
  - "@flavianmissi"
reviewers:
  - "@ibihim"
  - "@sjenning"
  - "@tkashem"
approvers:
  - "@benluddy"
api-approvers:
  - "@JoelSpeed"
creation-date: 2025-12-03
last-updated: 2025-12-03
tracking-link:
  - "https://issues.redhat.com/browse/OCPSTRAT-108"
see-also:
  - "enhancements/kube-apiserver/encrypting-data-at-datastore-layer.md"
  - "enhancements/etcd/storage-migration-for-etcd-encryption.md"
  - "[encrypt data at rest with KMS](https://github.com/openshift/enhancements/pull/1872)"
replaces:
  - "[KMS Encryption Provider for Etcd Secrets](https://github.com/openshift/enhancements/pull/1682/)"
---

# KMS Encryption Foundations

## Summary

Extend OpenShift encryption controllers to support external Key Management Services (KMS) alongside existing local encryption modes (aescbc, aesgcm). This allows encryption keys to be stored and managed outside the cluster for enhanced security.

This enhancement:
- Extends the `config.openshift.io/v1/APIServer` resource for KMS configuration
- Extends encryption controllers in `openshift/library-go` to support KMS as a new encryption mode
- Maintains feature parity with existing encryption modes (migration, monitoring, key rotation)
- Supports AWS KMS and Vault in Tech Preview (Thales in future iterations)

## Motivation

OpenShift currently manages AES keys locally for encrypting data at rest in etcd. KMS support enables integration with external key management systems where encryption keys are stored outside the cluster, so that a compromised control plane node does not also expose the keys.

### Goals

- Support KMS as a new encryption mode in existing encryption controllers
- Seamless migration between encryption modes (aescbc ↔ KMS)
- Provider-agnostic controller implementation with minimal provider-specific code
- Feature parity with existing modes (monitoring, migration, key rotation)

### Non-Goals

- Implementing KMS plugins (provided by upstream Kubernetes/vendors)
- KMS plugin deployment/lifecycle management (separate EP for Tech Preview)
- KMS plugin health checks (Tech Preview v2)
- Migration between different KMS providers (separate EP for GA)
- Recovery from KMS key loss (separate EP for GA)
- Automatic `key_id` rotation detection (Tech Preview v2)

## Proposal

Extend the existing encryption controller framework in `openshift/library-go` to support KMS encryption through hash-based change detection. The controllers calculate a hash of the KMS configuration to detect changes and trigger re-encryption, so in Tech Preview they do not need to call any external service.

**Key changes:**
1. Add KMS mode constant to encryption state types
2. Implement hash-based detection for KMS configuration changes
3. Manage empty encryption key secrets (actual keys in external KMS)
4. Reuse existing migration controller (no changes needed)

**Tech Preview v2 additions:**
- Poll KMS plugin Status endpoint for `key_id` changes in apiserver operators
- Store hash of `key_id` in data field of encryption key secrets
- Hash-based detection for external key rotation

### Workflow Description

#### Actors in the Workflow

**cluster admin** is a human user responsible for configuring and maintaining the cluster.

**KMS** is the external Key Management Service (AWS KMS, HashiCorp Vault, etc.) that stores and manages the Key Encryption Key (KEK).

**KMS plugin** is a gRPC service implementing the Kubernetes KMS v2 API, running as a sidecar to API server pods. It communicates with the external KMS to encrypt/decrypt data encryption keys (DEKs).

**API server operator** is the OpenShift operator (kube-apiserver-operator, openshift-apiserver-operator, or authentication-operator) managing API server deployments.

#### Encryption Controllers

**keyController** manages the encryption key lifecycle. It creates encryption key secrets in the `openshift-config-managed` namespace. For KMS mode, it creates empty secrets annotated with the KMS configuration hash.

**stateController** generates the EncryptionConfiguration for API server consumption. It implements a distributed state machine ensuring all API servers converge to the same revision. For KMS mode, it generates configuration with deterministic Unix socket paths.

**migrationController** orchestrates resource re-encryption. It marks resources as migrated after rewriting them in etcd. It works with all encryption modes, including KMS.

**pruneController** prunes inactive encryption key secrets. It maintains N keys (currently 10) for rollback scenarios.

**conditionController** determines when controllers should act. It provides status conditions (`EncryptionInProgress`, `EncryptionCompleted`, `EncryptionDegraded`).
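
The retention rule described for pruneController can be sketched as follows. This is a minimal illustration under stated assumptions, not the library-go implementation: the constant `maxKeptKeys` and the function `keysToPrune` are hypothetical names, key secrets are reduced to their numeric IDs, and the sketch ignores whether a key is still in use, which a real controller has to check before deleting anything.

```go
package main

import (
	"fmt"
	"sort"
)

// maxKeptKeys mirrors "N keys (currently 10)" from the text.
// The constant name is hypothetical.
const maxKeptKeys = 10

// keysToPrune returns the key IDs that fall outside the newest `keep` keys.
// Higher IDs are assumed to be newer. The input slice is not modified.
func keysToPrune(keyIDs []uint64, keep int) []uint64 {
	sorted := append([]uint64(nil), keyIDs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	if len(sorted) <= keep {
		return nil
	}
	return sorted[keep:]
}

func main() {
	ids := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
	fmt.Println("prune candidates:", keysToPrune(ids, maxKeptKeys))
}
```

Because KMS-mode secrets carry no key material, pruning them only removes the record of an older configuration hash; the KEK itself stays in the external KMS.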

#### Steps for Enabling KMS Encryption

1. Cluster admin updates the APIServer resource:
   ```yaml
   apiVersion: config.openshift.io/v1
   kind: APIServer
   spec:
     encryption:
       type: kms
       kms:
         aws:
           region: us-east-1
           keyArn: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012
   ```

2. keyController detects the new encryption mode and calculates hash of the KMS configuration.

3. keyController creates encryption key secret:
   ```yaml
   apiVersion: v1
   kind: Secret
   metadata:
     name: openshift-kube-apiserver-encryption-1
     namespace: openshift-config-managed
     annotations:
       encryption.apiserver.operator.openshift.io/mode: "kms"
       encryption.apiserver.operator.openshift.io/kms-config-hash: "a1b2c3d4e5f67890"
   data:
     keys: ""  # Empty in Tech Preview - KEK stored in external KMS
               # In Tech Preview v2, will contain base64-encoded key_id hash
   ```

4. stateController generates EncryptionConfiguration with hash embedded in socket path:
   ```yaml
   apiVersion: apiserver.config.k8s.io/v1
   kind: EncryptionConfiguration
   resources:
     - resources: [configmaps]
       providers:
         - kms:
             name: kms-a1b2c3d4e5f67890-configmap-1
             endpoint: unix:///var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket
             apiVersion: v2
   ```
   Because the socket path is derived only from the configuration hash, KMS plugin lifecycle management can compute the same path independently.

5. migrationController detects the new secret and initiates re-encryption (no code changes - works with any mode).

6. Resources are re-encrypted using KEK in external KMS via the KMS plugin.

7. conditionController updates status conditions: `EncryptionInProgress`, then `EncryptionCompleted`.
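
The endpoint in step 4 can be derived from the configuration hash alone. The sketch below illustrates that derivation under stated assumptions: `kmsEndpoint`, `socketDir`, and `maxSunPath` are hypothetical names, the directory is taken from the example above, and the 108-byte limit is the Linux `sun_path` size (including the trailing NUL).

```go
package main

import "fmt"

const (
	// socketDir is the directory used in the example EncryptionConfiguration.
	socketDir = "/var/run/kmsplugin"
	// maxSunPath is the size of sun_path on Linux, including the trailing NUL.
	maxSunPath = 108
)

// kmsEndpoint builds the unix:// endpoint for a given config hash and rejects
// socket paths that would not fit into sun_path.
func kmsEndpoint(configHash string) (string, error) {
	socketPath := fmt.Sprintf("%s/kms-%s.socket", socketDir, configHash)
	if len(socketPath) >= maxSunPath {
		return "", fmt.Errorf("socket path %q is %d bytes, limit is %d",
			socketPath, len(socketPath), maxSunPath-1)
	}
	return "unix://" + socketPath, nil
}

func main() {
	endpoint, err := kmsEndpoint("a1b2c3d4e5f67890")
	if err != nil {
		panic(err)
	}
	fmt.Println(endpoint)
}
```

Any component that knows the hash, such as whatever deploys the plugin sidecar, could call the same function and arrive at the same socket without coordinating with stateController.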

#### Variation: Configuration Changes (Key Rotation)

When the cluster admin updates the KMS configuration (e.g., new key ARN, different region):

1. keyController recalculates the hash from the updated APIServer resource.
2. It compares the new hash with the hash in the annotation of the most recent encryption key secret.
3. If the hashes differ:
   - keyController creates a new encryption key secret with the new hash
   - migrationController automatically triggers re-encryption
4. If the hashes match: No action.

**Note:** Automatic weekly key rotation (used for aescbc/aesgcm) is disabled for KMS since rotation is triggered externally.
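
The comparison in steps 2 to 4 amounts to a single check against the latest secret's annotation. A minimal sketch, assuming a trimmed-down `keySecret` type and a hypothetical helper `needsNewKey` (neither is a library-go name; only the annotation key comes from this proposal):

```go
package main

import "fmt"

// Annotation key as defined in this proposal.
const kmsConfigHashAnnotation = "encryption.apiserver.operator.openshift.io/kms-config-hash"

// keySecret is a hypothetical, trimmed-down view of an encryption key secret.
type keySecret struct {
	Name        string
	Annotations map[string]string
}

// needsNewKey reports whether a new encryption key secret should be created:
// either no KMS key secret exists yet, or the stored hash differs from the
// hash of the current APIServer configuration.
func needsNewKey(latest *keySecret, currentHash string) bool {
	if latest == nil {
		return true
	}
	return latest.Annotations[kmsConfigHashAnnotation] != currentHash
}

func main() {
	latest := &keySecret{
		Name:        "openshift-kube-apiserver-encryption-1",
		Annotations: map[string]string{kmsConfigHashAnnotation: "a1b2c3d4e5f67890"},
	}
	fmt.Println("unchanged config:", needsNewKey(latest, "a1b2c3d4e5f67890"))
	fmt.Println("changed config:  ", needsNewKey(latest, "0f1e2d3c4b5a6978"))
}
```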

#### Variation: External KMS Key Rotation (Tech Preview v2)

When the external KMS rotates the key internally (e.g., AWS KMS automatic rotation):

1. keyController polls the KMS plugin Status endpoint for `key_id`.
2. It calculates the hash of `key_id` and compares it with the hash in the secret's `Data` field.
3. If the `key_id` hash differs:
   - keyController creates a new encryption key secret with the new `key_id` hash
   - migrationController automatically triggers re-encryption
4. If the `key_id` hash matches: No action.

**Two hashes tracked:**
- `kmsConfigHash` (annotation) - Detects admin configuration changes
- `kmsKeyIDHash` (data field) - Detects external key rotation

Separate hashes are needed because the two events are independent: the configuration can change without a key rotation (updating the Vault address), and the key can rotate without a configuration change (AWS automatic rotation).
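
One way to picture how the two hashes interact is a function that names which of them changed. This is a sketch only: the `kmsHashes` type, the `rotationReason` function, and the reason strings are hypothetical and serve to show that either hash changing on its own is enough to require a new key secret.

```go
package main

import "fmt"

// kmsHashes pairs the two values tracked per encryption key secret.
type kmsHashes struct {
	ConfigHash string // from the kms-config-hash annotation
	KeyIDHash  string // from the secret data field (Tech Preview v2)
}

// rotationReason returns a non-empty reason when a new key secret is needed.
// The reason strings are illustrative, not defined by the proposal.
func rotationReason(stored, observed kmsHashes) string {
	switch {
	case stored.ConfigHash != observed.ConfigHash:
		return "kms-config-changed"
	case stored.KeyIDHash != observed.KeyIDHash:
		return "kms-key-id-changed"
	default:
		return ""
	}
}

func main() {
	stored := kmsHashes{ConfigHash: "a1b2c3d4e5f67890", KeyIDHash: "1111222233334444"}

	// External rotation: same configuration, new key_id reported by the plugin.
	rotated := kmsHashes{ConfigHash: stored.ConfigHash, KeyIDHash: "5555666677778888"}
	fmt.Printf("external rotation: %q\n", rotationReason(stored, rotated))

	// Nothing changed: no new secret, no migration.
	fmt.Printf("no change:         %q\n", rotationReason(stored, stored))
}
```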

#### Variation: Migration Between Encryption Modes

**From aescbc to KMS:**
1. Admin updates APIServer: `type: kms` with KMS configuration.
2. keyController creates KMS secret (empty data, with hash).
3. migrationController re-encrypts resources using external KMS.

**From KMS to aescbc:**
1. Admin updates APIServer: `type: aescbc`.
2. keyController creates aescbc secret (with actual key material).
3. migrationController re-encrypts resources using local AES key.

Migration controller reuses existing logic - no changes required.

### User Stories

- As a cluster admin, I want to enable KMS encryption by updating the APIServer resource, so I can declaratively configure encryption without manually managing keys.
- As a cluster admin, I want the same migration and monitoring experience for KMS as local encryption, so I don't need to learn new procedures.
- As a security admin, I want encryption keys stored outside the cluster, so compromised control plane nodes cannot access keys.

### API Extensions

**APIServer Resource** (`config.openshift.io/v1`):
- Extended with KMS configuration fields ([PR #2035](https://github.com/openshift/api/pull/2035) for AWS KMS)
- Vault KMS fields will be added after finalization

**Encryption Secret Annotations** (library-go):
```go
EncryptionSecretKMSConfigHash = "encryption.apiserver.operator.openshift.io/kms-config-hash"
```
Stores truncated hash (16 hex characters, 8 bytes) of KMS configuration for change detection.

**Encryption State Types** (library-go):
- `KeyState` struct: Add `KMSConfigHash` field
- Add `KMS` mode constant alongside `aescbc`, `aesgcm`, `identity`
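
The state-type changes could look roughly like the following. These are simplified stand-ins written for illustration, not the library-go definitions: the real `KeyState` carries more fields, and both the `"kms"` string value (taken from the mode annotation in the example secret) and the `hasLocalKeyMaterial` helper are assumptions.

```go
package main

import "fmt"

// Mode names an encryption mode. Simplified stand-in for the library-go type.
type Mode string

const (
	AESCBC   Mode = "aescbc"
	AESGCM   Mode = "aesgcm"
	Identity Mode = "identity"
	// KMS is the new mode; the value is assumed to match the
	// mode annotation shown in the example secret.
	KMS Mode = "kms"
)

// KeyState is a reduced, hypothetical version of the state type,
// keeping only what is needed to show the new field.
type KeyState struct {
	Name          string
	Mode          Mode
	KMSConfigHash string // set only when Mode == KMS
}

// hasLocalKeyMaterial reports whether the key bytes live in the cluster.
// For KMS the KEK stays in the external service.
func (k KeyState) hasLocalKeyMaterial() bool {
	return k.Mode == AESCBC || k.Mode == AESGCM
}

func main() {
	ks := KeyState{Name: "1", Mode: KMS, KMSConfigHash: "a1b2c3d4e5f67890"}
	fmt.Printf("mode=%s hash=%s localKey=%v\n", ks.Mode, ks.KMSConfigHash, ks.hasLocalKeyMaterial())
}
```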

### Topology Considerations

#### Hypershift / Hosted Control Planes

The library-go encryption controllers run in the management cluster as part of the hosted control plane operators.
KMS plugin health checks must account for the split architecture where plugins may run in different contexts than the controllers.

#### Standalone Clusters

This enhancement applies to standalone clusters.
The controllers run in the cluster-kube-apiserver-operator, cluster-openshift-apiserver-operator, and cluster-authentication-operator.

#### Single-node Deployments or MicroShift

Resource consumption impact is minimal - the controllers already exist and are extended with KMS-specific logic.
Single-node deployments will see slightly increased CPU usage during key rotation detection (gRPC Status calls), but this is negligible.

MicroShift may adopt this enhancement if KMS encryption is desired, but the configuration mechanism may differ (file-based vs API resource).

### Implementation Details/Notes/Constraints

**Hash Calculation** (`pkg/operator/encryption/controllers/key_controller.go`):
```go
// Concatenate provider-specific fields
combined := aws.KeyARN + ":" + aws.Region
hash := sha256.Sum256([]byte(combined))
// Keep the first 16 hex characters (8 bytes) of the digest
kmsConfigHash := hex.EncodeToString(hash[:])[:16]
```
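
For readers who want to run the calculation, here is a self-contained version of the fragment above. It is a sketch under stated assumptions: `awsKMSConfig` is a hypothetical stand-in for the AWS fields of the APIServer resource (not the openshift/api type), and `kmsConfigHash` is an illustrative function name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// awsKMSConfig is a hypothetical stand-in for the AWS fields of the
// APIServer KMS configuration.
type awsKMSConfig struct {
	KeyARN string
	Region string
}

// kmsConfigHash joins the provider-specific fields with ":" and returns the
// first 16 hex characters (8 bytes) of their SHA-256 digest.
func kmsConfigHash(cfg awsKMSConfig) string {
	sum := sha256.Sum256([]byte(cfg.KeyARN + ":" + cfg.Region))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	cfg := awsKMSConfig{
		KeyARN: "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012",
		Region: "us-east-1",
	}
	fmt.Println("config hash:        ", kmsConfigHash(cfg))

	// Changing any hashed field is expected to produce a different hash,
	// which is what leads to a new encryption key secret.
	cfg.Region = "us-west-2"
	fmt.Println("after region change:", kmsConfigHash(cfg))
}
```

The hash is a pure function of the configuration, so every operator replica computes the same value without shared state.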

> **Note:** The hash is truncated to 16 hex characters (8 bytes) to stay within Unix socket path length limits (typically 108 bytes on Linux, including the terminating NUL) while maintaining sufficient uniqueness for distinguishing different KMS configurations. This allows deterministic socket paths like `/var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket`.

**Reverse Conversion** (stateController reads EncryptionConfiguration from API server pods):
1. Extract hash from socket path: `kms-a1b2c3d4e5f67890.socket` → `a1b2c3d4e5f67890`
2. Look up secret with matching `kms-config-hash` annotation
3. Reconstruct KeyState with original KMS configuration
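
Steps 1 and 2 of the reverse conversion can be sketched as below. The helper names `hashFromEndpoint` and `findSecretByHash` and the `secretMeta` type are hypothetical; the sketch assumes the endpoint format shown earlier (`unix://…/kms-<16 hex chars>.socket`) and leaves step 3 out because the shape of the reconstructed KeyState is not specified here.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"path"
	"strings"
)

// Annotation key as defined in this proposal.
const kmsConfigHashAnnotation = "encryption.apiserver.operator.openshift.io/kms-config-hash"

// secretMeta is a hypothetical, trimmed-down view of an encryption key secret.
type secretMeta struct {
	Name        string
	Annotations map[string]string
}

// hashFromEndpoint extracts the 16-character hex hash from an endpoint of the
// form unix://<dir>/kms-<hash>.socket. The bool is false for any other shape.
func hashFromEndpoint(endpoint string) (string, bool) {
	if !strings.HasPrefix(endpoint, "unix://") {
		return "", false
	}
	base := path.Base(strings.TrimPrefix(endpoint, "unix://"))
	if !strings.HasPrefix(base, "kms-") || !strings.HasSuffix(base, ".socket") {
		return "", false
	}
	hash := strings.TrimSuffix(strings.TrimPrefix(base, "kms-"), ".socket")
	if len(hash) != 16 {
		return "", false
	}
	if _, err := hex.DecodeString(hash); err != nil {
		return "", false
	}
	return hash, true
}

// findSecretByHash returns the first secret whose annotation matches hash.
func findSecretByHash(secrets []secretMeta, hash string) (secretMeta, bool) {
	for _, s := range secrets {
		if s.Annotations[kmsConfigHashAnnotation] == hash {
			return s, true
		}
	}
	return secretMeta{}, false
}

func main() {
	secrets := []secretMeta{{
		Name:        "openshift-kube-apiserver-encryption-1",
		Annotations: map[string]string{kmsConfigHashAnnotation: "a1b2c3d4e5f67890"},
	}}
	hash, ok := hashFromEndpoint("unix:///var/run/kmsplugin/kms-a1b2c3d4e5f67890.socket")
	if !ok {
		panic("endpoint did not match the expected format")
	}
	s, found := findSecretByHash(secrets, hash)
	fmt.Println(hash, s.Name, found)
}
```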

### Risks and Mitigations

**Risk: KMS Plugin Unavailable During Controller Sync**
- **Impact:** Controllers cannot detect key rotation
- **Mitigation:** No mitigation in Tech Preview. Tech Preview v2 will add health checks and surface plugin health to the cluster admin via operator conditions.

**Risk: etcd Backup Restoration Without KMS Key Access**
- **Impact:** Cannot decrypt data if the KMS key is deleted, unavailable, or expired
- **Mitigation:** No mitigation in Tech Preview. Document KMS key retention requirements.

### Drawbacks

- Adds complexity to encryption controllers for KMS-specific logic
- AWS KMS requires config changes for rotation (not automatic)
- Dependency on KMS plugin health for controller operations (health checks in Tech Preview v2)

## Test Plan

**Unit Tests:**
- `key_controller_test.go`: KMS key creation, rotation detection, hash changes
- `migration_controller_test.go`: KMS migration scenarios
- `state_controller_test.go`: KMS state changes

**E2E Tests** (Future work):
- Full cluster with KMS encryption enabled
- Trigger external KMS key rotation
- Key rotation with real KMS plugin
- Migration between encryption modes (aescbc → KMS, KMS → identity)
- Verify data re-encryption completes
- Performance testing (time to migrate N secrets)

## Graduation Criteria

### Dev Preview -> Tech Preview

None

### Tech Preview -> GA

- Dynamic `key_id` fetching via KMS plugin Status endpoint
- Full support for key rotation, with automated data re-encryption
- Migration support between different KMS providers, with automated data re-encryption
- Health check preconditions (block operations when plugin unhealthy)
- Support for Thales KMS
- Comprehensive integration and E2E test coverage
- Production validation in multiple environments

### Removing a deprecated feature

N/A

## Upgrade / Downgrade Strategy

**Upgrade:**

This feature is gated behind the `TechPreviewNoUpgrade` feature set. Upgrades are not permitted in Tech Preview.

In GA, encryption controllers will handle upgrades seamlessly without requiring manual intervention.

**Downgrade:**

When KMS encryption is enabled and actively used, downgrade is not supported if the previous version lacks KMS support. The API server requires access to encryption keys to decrypt resources stored in etcd.

To downgrade:
1. Migrate from KMS to a supported encryption mode (aescbc, aesgcm, or identity)
2. Wait for migration to complete
3. Proceed with cluster downgrade

## Version Skew Strategy

Encryption controllers run in operator pods (not nodes). Version skew concerns:
- **kube-apiserver:** Must support KMS v2 API (Kubernetes 1.27+)
- **library-go:** Operators must use same library-go version
- **KMS plugin:** Controllers don't interact directly (operators do)

No special handling required.

## Operational Aspects of API Extensions

**Monitoring:**
- Operator conditions: `EncryptionControllerDegraded`, `EncryptionMigrationControllerProgressing`, `KMSPluginDegraded`
- Metrics: `apiserver_storage_transformation_operations_total`, `apiserver_storage_transformation_duration_seconds`

**Impact:**
- API latency: +10-50ms per operation (KMS call required, mitigated by DEK caching)
- API throughput: <5% reduction under normal conditions

### Failure Modes

**KMS Plugin Unavailable:**
- New resource creation fails
- Existing resources readable (if DEKs remain cached in API server memory; cache clears on restart)
- Detection: `KMSPluginDegraded=True`
- Recovery: Plugin restart (automatic or manual)

**Invalid KMS Configuration:**
- Plugin fails to start
- Detection: Plugin container crash loops
- Recovery: Fix APIServer configuration

**Key Rotation Stuck:**
- Migration unable to complete
- Detection: `EncryptionMigrationControllerProgressing=True` for extended period
- Recovery: Check migration controller logs, verify KMS health

## Support Procedures

### Detecting KMS Rotation Issues
```bash
# Check encryption key secrets
oc get secrets -n openshift-config-managed -l encryption.apiserver.operator.openshift.io/component=encryption-key

# Check controller logs
oc logs -n openshift-kube-apiserver-operator deployment/kube-apiserver-operator | grep -i kms
```

### Disabling KMS Encryption

1. Update APIServer: `spec.encryption.type: "aescbc"`
2. Wait for migration to complete
3. KMS plugin pods are removed by operators

**etcd Backup/Restore:**
- Before backup: Document KMS configuration, verify key availability
- Before restore: Verify KMS key accessible, credentials valid
- Critical: Deleting the KMS key makes backups unrestorable

## Alternatives (Not Implemented)

### Alternative: Separate KMS-Specific Controllers

Instead of extending existing controllers, create new KMS-only controllers.

**Why not chosen:**
- Code duplication (migration logic, state management)
- User confusion (different controllers for different encryption types)
- More operational burden (additional monitoring, alerts)

## Infrastructure Needed

None - extends existing library-go code.
