Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
260 changes: 260 additions & 0 deletions enhancements/olm/service-account-deprecation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,260 @@
---
title: serviceaccount-field-deprecation
authors:
- "@rashmigottipati"
reviewers:
- "@grokspawn"
- "@trgeiger"
- "@joelanford"
approvers:
- "@joelanford"
api-approvers:
- "@everettraven"

creation-date: 2025-10-08
last-updated: 2025-10-20
status: implementable
---

# Service Account Field Deprecation

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Deprecate the `.spec.serviceAccount` field from the ClusterExtension API in the operator-controller. This field was originally introduced to enforce least privilege by requiring users to provide a ServiceAccount with the necessary RBAC permissions to manage extension content. This proposal removes that requirement and simplifies controller logic and behavior.

## Motivation

The original intent of `.spec.serviceAccount` was to support a least-privilege model by allowing users to provide custom `ServiceAccount` with fine-grained permissions. In practice, this design introduced considerable operational and technical complexity, including:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd understood that it also arose from the need to implement a rollout revision approach (using boxcutter) which required cluster admin permissions anyway, so we couldn't reasonably reduce the privileges of a "package manager" from cluster admin levels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you specifically need to use boxcutter for a revision rollout approach?

What are the limitations in doing revision rollouts, regardless of underlying implementation, that makes it impossible for you to continue to adhere to the least-privilege principle?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary reason for changing this is that is doesn't deliver on our 'secure by default' promise. Any person with write access to the ClusterExtension API would be able to specify any ServiceAccount, which opens the door to privilege escalation.

See "Need" section in RFC: https://docs.google.com/document/d/1qqzPtAHB1gnv00mZB7U0h0Om9ELyPSZMqsIUUNlIZOY/
and EP: #1860

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write access to the ClusterExtension API seems like it should be a cluster-admin level permission. If someone has assigned that permission to someone they do not trust, they have shot themselves in the foot.

Accessing the cluster and being able to write to the ClusterExtension API is a difficult task when configured correctly.

The way I interpret 'secure by default' in the context of OLM is that OLM will not make it easy to leverage what it does on the cluster as an attack vector (i.e automatic upgrades, installing anything specified in a bundle, etc.).

- Token acquisition, rotation, and error handling
- Custom `rest.Config` generation and dynamic HTTP clients
- Token expiration edge cases
Comment on lines +37 to +39
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you actively experiencing issues with being able to handle these scenarios?

From what I remember this logic is all already implemented and is flexed during testing. I'm curious how many times the logic to do these things have been problematic from both an end-user perspective and a maintenance perspective.


Given the limited benefit and high complexity of this approach, we propose simplifying the model:
Copy link
Contributor

@everettraven everettraven Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful if you could clarify what the "limited benefit" here is.

You start this section talking about how the intention was originally for following a least-privilege principle which, on the surface, sounds like it would actually be a pretty big benefit.

After all, looking at the documentation for OLMv1 it seems like "secure by default" is a core tenant tenet: https://github.com/operator-framework/operator-controller/blob/292c0dbc236747e57f77fd914830a865e87fad6b/docs/project/olmv1_design_decisions.md?plain=1#L179-L186

As a reviewer, I'd like to better understand:

  • Why this tenant tenet was originally seen as a win for end users
  • Why this tenant tenet turned out to not be a win for end users


- The operator-controller will **no longer impersonate the provided ServiceAccount**.
- Instead, it will use **its own identity** (via its default `ServiceAccount` assigned to the controller) for all API calls and reconciliation.
- `.spec.serviceAccount` will remain **optional but ignored**, and marked as deprecated with the intent to remove the field entirely in the future.
Comment on lines +43 to +45
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, deprecated fields should still function as-is until they are fully removed.

You may have existing users that are expecting that behavior to function in that manner and changing that behavior out from under them is a breaking change.



### Goals

- Make `.spec.serviceAccount` field **optional** in the `ClusterExtension` API.
- Update the controller to **ignore** the field during reconciliation.
- Log a warning when `.spec.serviceAccount` is set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to adopt an approach similar to how k8s tracks deprecation (see: https://kubernetes.io/blog/2020/09/03/warnings/#metrics). For example, we could implement a clusterextension_requested_deprecated_api metric, or an admission webhook to warn when we try to create a ClusterExtension with a serviceaccount.

- Provide a deprecation and removal plan.

### Non-Goals

- Replacing `serviceAccount` with another RBAC mechanism
- Managing permissions or pre-flight RBAC validation

## Proposal

### API Changes

Update the `ClusterExtension` API to mark `.spec.serviceAccount` as:

- Optional
- Deprecated via struct tags and documentation
- Marked as deprecated in the CRD schema using OpenAPI extensions:
- x-kubernetes-deprecated: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this exists?

As far as I am aware, there is not yet a way to mark a specific field in a CRD schema as deprecated. https://pkg.go.dev/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions#JSONSchemaProps would be your canonical source of things you can set for a particular schema property.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like maybe it hasn't landed yet: kubernetes/kubernetes#131817
So the only option is to comment that these are deprecated (in addition to programmatic capture of "a deprecated thing is being used").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could include a validating admission policy that issues a warning on admission of ClusterExtension resources that have that field set.

- Include appropriate description to warn users the field is ignored

Also, update CRD validation schema accordingly.
- This will be done via OpenAPI `x-kubernetes-deprecated: true` annotation in the CRD.

**Example:**

```go
// Before (current)
type ClusterExtensionSpec struct {
...
ServiceAccount ClusterExtensionServiceAccount `json:"serviceAccount"`
...
}
```

```go
// After (deprecated and optional)
type ClusterExtensionSpec struct {
...
// Deprecated: This field is ignored and will be removed in a future release.
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you care about breaking older clients that are readers of your API that would have expected that the serviceAccount field is always set?

If you want to maintain compatibility with older clients that read your API you'd need to default this field as well to a valid default value.

// +kubebuilder:validation:XValidation:message="serviceAccount is deprecated and ignored"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is a valid marker. I'm pretty certain that a rule is required and if it fails validation will not be a warning but instead will block admission of the resource - which is not something I think you want to do here.

ServiceAccount *ClusterExtensionServiceAccount `json:"serviceAccount,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We considered moving this to a pointer type to be a potentially breaking change we could avoid, so we're using go1.24's omitzero capability to omit this field from the struct if it is set to only 'default' values: https://github.com/operator-framework/operator-controller/pull/2242/files#diff-387659018fbee6e27fcf5ab561b45489e14b62ae73ecf00fed3a0f4c017714d1R75

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beforehand, was an empty string ("") a valid value here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It would've failed the DNS1123 label check.

...
}
```

**YAML Example:**

```yaml
# Before (serviceAccount was required)
apiVersion: olm.operatorframework.io/v1alpha1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a GA'd and stable API right?

Suggested change
apiVersion: olm.operatorframework.io/v1alpha1
apiVersion: olm.operatorframework.io/v1

kind: ClusterExtension
metadata:
name: argocd-extension
spec:
installNamespace: argocd
packageName: argocd-operator
version: 0.6.0
serviceAccount:
name: argocd-installer
```

```yaml
# After (field is deprecated, optional, and therefore ignored and removed)
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
name: argocd-extension
spec:
installNamespace: argocd
packageName: argocd-operator
version: 0.6.0
# serviceAccount is now deprecated and has no effect
```

### Controller Logic Changes

- Remove all logic that:
- Token Acquisition: Eliminate use of TokenRequest API to fetch short-lived tokens for user-provided ServiceAccounts.
- Rest Config Mapping: Remove the ServiceAccountRestConfigMapper, which dynamically generated rest.Config objects for impersonation.
- Synthetic Permissions: Remove conditional logic for SyntheticPermissions that depended on impersonated clients.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do "Synthetic Permissions" play a role in service account based permissions? What are "Synthetic Permissions"?

- The controller now uses a static `rest.Config` created from its own identity (its default ServiceAccount).
- The RestConfigMapper will now be directly set to `ClusterAdminRestConfigMapper(mgr.GetConfig())`, i.e. the controller always uses its own identity (cluster-admin level config) for all reconciliation and watching operations.
This config is passed to the helm.ActionConfigGetter, ensuring that all Helm operations (install, upgrade, uninstall) use the same identity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include the removal of the preflight permissions checks here, or is that not required because those are not GA'd?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the preflight checks as well. We can do that any time after we give OLM cluster-admin and change the various clients to use its permissions instead of the CE's SA's permissions.

- Log a deprecation warning if `.spec.serviceAccount` is set
```
[DEPRECATION] 'spec.serviceAccount' is specified in ClusterExtension 'foo', but is ignored and will be removed in a future release.
```

## Design Details

### Test Plan

To validate the safe deprecation and eventual removal of the `.spec.serviceAccount` field, the following implementation and testing steps will be followed.

#### Unit Tests
- Remove tests that exercised impersonation logic and ServiceAccount specific behavior.
- Add new tests to confirm:
- The `.spec.serviceAccount` field is ignored during reconciliation.
- A warning is logged when the field is set.
- Default behavior uses the controller’s own identity.

#### E2E Tests
- Update E2E tests to:
- Deploy `ClusterExtension` resources with and without the deprecated field.
- Verify that:
- Reconciliation succeeds in both cases.
- Deprecation warnings appear in controller logs.

#### Upgrade Tests
- Add upgrade test scenarios to validate:
- CRD schema updates (required -> optional -> removed) behave safely and correctly across releases.
- Manifests with the deprecated field do not block upgrades to 4.21 or 4.22.
- Manifests with the field are rejected cleanly in 4.23 (removal release).
- No reconciliation failures occur due to field removal when manifests are properly cleaned.

### Graduation Criteria / Deprecation Plan

We will deprecate the .spec.serviceAccount field over the course of three OpenShift releases following the kubernetes' deprecation policy:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where we really need arch guidance. We've been informed that GA'd OCP APIs have functionally never been removed, even though nominally we could -- after a reasonable period -- up-version the API and remove the field, including a conversion webhook to handle upgrades.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what our process here for an up-version of an API is in OpenShift, but if there is one, I imagine it wouldn't be too dissimilar from Kubernetes (https://kubernetes.io/docs/reference/using-api/deprecation-policy/#rest-resources-aka-api-objects).

You cannot remove the API version until it has "aged out" - in which case you are confident that all users have switched over to the newer, non-deprecated API. In the upstream case it looks like release N+9. You also must have a field in the next version of the API to map the old field value to handle round-tripping without data loss.

As far as I know, OpenShift has never removed a stable API version so I would default to not being able to do this.

@JoelSpeed may have more familiarity with whether or not this is something we might have a way to allow.

I know OLM is an upstream project that we sync the APIs downstream, and we generally aren't as restrictive there because we don't entirely control the API surface, but IIUC we consider OLM a core component of the OpenShift payload so I would expect that we end up treating it with a bit more caution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently discussing the process for removing fields in openshift, but only at major version boundaries, I have more questions before I can provide a suggested path here, see my other comments


OpenShift 4.20 (current release):
- No changes. The .spec.serviceAccount field continues to be a required field and function as it currently does.

OpenShift 4.21: (next release - N)
- Mark the field as optional in the API.
- The controller ignores the field entirely.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this change the behaviour for an end user? If they want to leverage the older behaviour still, are they able to do so? Is there a replacement field?

- Log a warning if it is set, to alert users.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs aren't visible to users, use a VAP set to Warn


OpenShift 4.22: (N+1)
- The field remains deprecated and ignored. The controller continues to log a warning if it is used.

OpenShift 4.23: (N+2)
- Remove all internal references and usage of the field.
- Remove the field from the API and CRD definition.
**Note**: Any usage of the field in manifests will cause validation errors.
Comment on lines +184 to +187
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you planning to enforce that the field is not present in any instances of the resource prior to upgrading?


This phased approach, over the course of three releases, provides notice and a clear migration path for users to remove usage of the deprecated field safely. It gives users time to adjust and avoid disruption.

### Upgrade / Downgrade Strategy

**Upgrading to OpenShift 4.21 (deprecation release):**
- The `.spec.serviceAccount` field becomes optional and is ignored by the controller.
- A warning is logged if the field is set.
- The controller will stop using the `.spec.serviceAccount` field, but all other functionality continues to work as before.
- If you relied on impersonation, make sure the controller’s ServiceAccount has the necessary permissions.

**Upgrading to OpenShift 4.23 (removal release):**
- The `.spec.serviceAccount` field will no longer be present in the API or CRD.
- Manifests including this field will be rejected during validation.
- Ensure your manifests no longer include this field before upgrading.

**Downgrading from 4.23 to 4.22 or earlier:**
- Downgrades may fail or cause issues if manifests rely on a CRD that no longer includes the field.
- You must remove all usage of `.spec.serviceAccount` from existing resources before downgrading to a release that expects the field to exist in the schema.

**Downgrading from 4.22 to 4.21 or 4.20:**
- The field still exists in the API and CRD, so downgrades are safe.
- However, any functionality tied to impersonation will not resume unless controller logic is reverted or adjusted accordingly.
Comment on lines +208 to +210
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a downgrade handle the case where the field was previously required but is now optional and as such may not be present?


### Risks and Mitigations

Deprecating and ignoring the `.spec.serviceAccount` field introduces potential risks, particularly for users who have built workflows or assumptions around impersonation-based reconciliation. Below are the primary risks, along with mitigations.

#### Risk: Unexpected Behavior for Users Relying on SA

Some users currently set `.spec.serviceAccount` assuming the controller will impersonate that ServiceAccount during reconciliation. Changing this behavior without notice could break their expectations around RBAC scopes and permissions, for example: restricting access to specific namespaces or resources.

#### Mitigation:
The controller will log a clear deprecation warning whenever a ClusterExtension includes the serviceAccount field, indicating it is now ignored and will be removed in a future release. Additionally, the field will be marked as deprecated in the CRD schema and documentation to make this clear during resource creation and review. Migration instructions will be provided in the release notes to assist users with updating their configurations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs are not visible.

Once it's ignored, what else has the potential to break? Does OLM now just have super user powers?


---

#### Risk: Broader Permissions Required for Controller’s ServiceAccount
By removing per-resource impersonation and falling back to the controller’s default identity, the controller must operate with a broader set of permissions. This potentially violates the principle of least privilege, since the controller’s ServiceAccount may now need to access resources across multiple namespaces or API groups on behalf of all managed ClusterExtensions.

#### Mitigation:
Although this centralizes privileges, the controller’s ServiceAccount is cluster-scoped and managed by cluster administrators. Its permissions can be restricted and audited through standard Kubernetes RBAC policies, providing clearer and simpler management compared to multiple impersonated identities. This approach aligns with common practices used by other Kubernetes controllers.

---

#### Risk: Potential Breaking Change for Existing Users

Some users may have been relying on the controller to impersonate the specified serviceAccount during reconciliation. Eventually removing support for this behavior may lead to unexpected changes in how permissions are applied, especially if users were using the field to restrict access.

#### Mitigation:
To ensure a smooth transition, this change will follow a deprecation process. The field will remain in the API but will be ignored by the controller. A clear warning will be logged when the field is used. After multiple releases, the field will be removed entirely from the API and CRD. This timeline gives users enough time to adapt their configurations and permissions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the risk of an automatic upgrade going through that performs privilege escalation of an installed operator?

If that automatic upgrade happens to upgrade to a version that has some malicious code in it, giving OLM cluster admin means that it will stamp out whatever the installation manifests require and could lead to a direct injection of a malicious actor into a production environment.

## Implementation History
- https://github.com/operator-framework/operator-controller/pull/2242

## Drawbacks
- Users lose fine-grained permission control per extension via separate ServiceAccounts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this something we are ok with?

- The controller’s ServiceAccount needs broader permissions, potentially increasing risk if compromised.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but there are likely other ServiceAccounts on the cluster that already have broad enough permissions to do damage. To do that damage you have to first get access to the cluster.

This could certainly be considered a drawback to increase the permissions associated with the OLM Service Account, but I think a bigger drawback I don't see explicitly mentioned here is the potential for OLM to be the thing that injected a malicious actor into the cluster because it no longer will fail to stamp out permissions it wasn't explicitly told it could.

While it may still be difficult, we've seen in the past malicious actors sneak something into a popular project that is then shipped out en masse. With this change, OLM could be used to get that access to a cluster.

- Cluster administrators must carefully manage the controller’s permissions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do they carefully manage the controller's permissions? You are dictating that it must be cluster admin.

- Possible security concerns when consolidating privileges into a single SA as opposed to separate, per-extension identities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost signal if an operator upgrade requires permissions not provided by the existing, configured serviceaccount (since OLMv1 can do anything, we might perpetrate the upgrade and then have the operator fail).

## Alternatives (Not Implemented)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some alternatives I've not seen mentioned as being considered. Curious if you've thought of them:

  • Building better client tooling that makes managing permission requirements easier for users looking to install cluster extensions.
  • Building a permission requesting system that makes it easier for users to request installation of a cluster extension and OLM creating a "permission request" that an admin can either approve or deny to automatically stamp out the RBAC needed to complete the installation.


1. Keep `.spec.serviceAccount` and Improve Token Handling
Instead of removing the field, we could improve token management (like automatic refresh and better error handling).
- However, this would add more complexity and maintenance work without fixing the main issue: impersonation is complicated and rarely helpful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this, I'm not convinced on the "rarely helpful" part. Please do update this document appropriately to explicitly state why this has been found to be rarely helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

In a world where OLM has super user powers, it can install and change anything on the cluster, that's somewhat expected given the role it plays.

But, what if I, as an end user decide, ok, I'm installing this bundle, but to avoid potential accidents, I want OLM to install this with a more restrictive set of permissions?

It seems removing this field removes that ability for end users, I don't see why that isn't beneficial?


2. Create a New RBAC System for Scoped Permissions
Build a new way to control permissions per extension without using impersonation or ServiceAccounts (for example, by referencing ClusterRoles directly).
- This would make the API more complex, go against common Kubernetes practices, and could confuse users.

3. Do Nothing – Keep Supporting the Field as Is
- The added complexity and risks from impersonation outweigh the benefits, so leaving it unchanged is not a good option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this, I still don't have a solid understanding of how painful this added complexity and risks of impersonation actually are