Skip to content

Conversation

@rashmigottipati
Copy link
Member

This PR proposes the deprecation of the serviceAccount field in the OLMv1 ClusterExtension API.

Key points:

  • Deprecates the serviceAccount field in the ClusterExtension API
  • Emphasizes that only cluster-admins should have write access to the ClusterExtension API
  • Discusses RBAC implications and security considerations

Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
@openshift-ci openshift-ci bot requested review from celebdor and sdodson October 9, 2025 19:37
@rashmigottipati rashmigottipati changed the title Deprecate serviceAccount field in OLMv1 ClusterExtension API OPRUN-4172: Deprecate serviceAccount field in OLMv1 ClusterExtension API Oct 9, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 9, 2025

@rashmigottipati: This pull request references OPRUN-4172 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

This PR proposes the deprecation of the serviceAccount field in the OLMv1 ClusterExtension API.

Key points:

  • Deprecates the serviceAccount field in the ClusterExtension API
  • Emphasizes that only cluster-admins should have write access to the ClusterExtension API
  • Discusses RBAC implications and security considerations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 9, 2025
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
@rashmigottipati
Copy link
Member Author

cc @trgeiger @grokspawn

@everettraven
Copy link
Contributor

This looks like it makes changes to APIs on OpenShift but doesn't include an API reviewer in the set of approvers. Please reach out on the #forum-api-review channel in slack to request an API reviewer be assigned to review this enhancement.

Thanks!

@rashmigottipati
Copy link
Member Author

Thanks @everettraven. I’ve reached out in #forum-api-review on Slack to request an API reviewer for this enhancement. I’ll update the PR’s approvers list once a reviewer is assigned.


Deprecate the `.spec.serviceAccount` field from the ClusterExtension API in the operator-controller. This field was originally introduced to enforce least privilege by requiring users to provide a ServiceAccount with the necessary RBAC permissions to manage extension content. This proposal removes that requirement and simplifies controller logic and behavior.

**Note**: This deprecation is an interim step. While the intent is to eventually remove the `.spec.serviceAccount` field, we will keep it as optional and mark it as deprecated during this transition period spanning multiple releases. This approach allows users time to adapt without disruption before the field is fully removed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the graduation plan is a good place to discuss this, and maybe I'm just imagining that we're going to end up having a lively discussion on the path forward and not wanting to have to update two places.
Maybe it's just me... it seems like a good callout aside from that. 😄

## Motivation

The original intent of `.spec.serviceAccount` was to support a least-privilege model by allowing users to provide custom `ServiceAccount` with fine-grained permissions. In practice, this design introduced considerable operational and technical complexity, including:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd understood that it also arose from the need to implement a rollout revision approach (using boxcutter) which required cluster admin permissions anyway, so we couldn't reasonably reduce the privileges of a "package manager" from cluster admin levels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you specifically need to use boxcutter for a revision rollout approach?

What are the limitations in doing revision rollouts, regardless of underlying implementation, that makes it impossible for you to continue to adhere to the least-privilege principle?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary reason for changing this is that is doesn't deliver on our 'secure by default' promise. Any person with write access to the ClusterExtension API would be able to specify any ServiceAccount, which opens the door to privilege escalation.

See "Need" section in RFC: https://docs.google.com/document/d/1qqzPtAHB1gnv00mZB7U0h0Om9ELyPSZMqsIUUNlIZOY/
and EP: #1860

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write access to the ClusterExtension API seems like it should be a cluster-admin level permission. If someone has assigned that permission to someone they do not trust, they have shot themselves in the foot.

Accessing the cluster and being able to write to the ClusterExtension API is a difficult task when configured correctly.

The way I interpret 'secure by default' in the context of OLM is that OLM will not make it easy to leverage what it does on the cluster as an attack vector (i.e automatic upgrades, installing anything specified in a bundle, etc.).


- Make `.spec.serviceAccount` field **optional** in the `ClusterExtension` API.
- Update the controller to **ignore** the field during reconciliation.
- Log a warning when `.spec.serviceAccount` is set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to adopt an approach similar to how k8s tracks deprecation (see: https://kubernetes.io/blog/2020/09/03/warnings/#metrics). For example, we could implement a clusterextension_requested_deprecated_api metric, or an admission webhook to warn when we try to create a ClusterExtension with a serviceaccount.

// Deprecated: This field is ignored and will be removed in a future release.
// +optional
// +kubebuilder:validation:XValidation:message="serviceAccount is deprecated and ignored"
ServiceAccount *ClusterExtensionServiceAccount `json:"serviceAccount,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We considered moving this to a pointer type to be a potentially breaking change we could avoid, so we're using go1.24's omitzero capability to omit this field from the struct if it is set to only 'default' values: https://github.com/operator-framework/operator-controller/pull/2242/files#diff-387659018fbee6e27fcf5ab561b45489e14b62ae73ecf00fed3a0f4c017714d1R75

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beforehand, was an empty string ("") a valid value here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It would've failed the DNS1123 label check.

- The controller now uses a static `rest.Config` created from its own identity (its default ServiceAccount).
- The RestConfigMapper will now be directly set to `ClusterAdminRestConfigMapper(mgr.GetConfig())`, i.e. the controller always uses its own identity (cluster-admin level config) for all reconciliation and watching operations.
This config is passed to the helm.ActionConfigGetter, ensuring that all Helm operations (install, upgrade, uninstall) use the same identity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include the removal of the preflight permissions checks here, or is that not required because those are not GA'd?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the preflight checks as well. We can do that any time after we give OLM cluster-admin and change the various clients to use its permissions instead of the CE's SA's permissions.


### Graduation Criteria / Deprecation Plan

We will deprecate the .spec.serviceAccount field over the course of three OpenShift releases following the kubernetes' deprecation policy:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's where we really need arch guidance. We've been informed that GA'd OCP APIs have functionally never been removed, even though nominally we could -- after a reasonable period -- up-version the API and remove the field, including a conversion webhook to handle upgrades.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what our process here for an up-version of an API is in OpenShift, but if there is one, I imagine it wouldn't be too dissimilar from Kubernetes (https://kubernetes.io/docs/reference/using-api/deprecation-policy/#rest-resources-aka-api-objects).

You cannot remove the API version until it has "aged out" - in which case you are confident that all users have switched over to the newer, non-deprecated API. In the upstream case it looks like release N+9. You also must have a field in the next version of the API to map the old field value to handle round-tripping without data loss.

As far as I know, OpenShift has never removed a stable API version so I would default to not being able to do this.

@JoelSpeed may have more familiarity with whether or not this is something we might have a way to allow.

I know OLM is an upstream project that we sync the APIs downstream, and we generally aren't as restrictive there because we don't entirely control the API surface, but IIUC we consider OLM a core component of the OpenShift payload so I would expect that we end up treating it with a bit more caution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently discussing the process for removing fields in openshift, but only at major version boundaries, I have more questions before I can provide a suggested path here, see my other comments

- The controller’s ServiceAccount needs broader permissions, potentially increasing risk if compromised.
- Cluster administrators must carefully manage the controller’s permissions.
- Possible security concerns when consolidating privileges into a single SA as opposed to separate, per-extension identities.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lost signal if an operator upgrade requires permissions not provided by the existing, configured serviceaccount (since OLMv1 can do anything, we might perpetrate the upgrade and then have the operator fail).

Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
- Custom `rest.Config` generation and dynamic HTTP clients
- Token expiration edge cases

Given the limited benefit and high complexity of this approach, we propose simplifying the model:
Copy link
Contributor

@everettraven everettraven Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful if you could clarify what the "limited benefit" here is.

You start this section talking about how the intention was originally for following a least-privilege principle which, on the surface, sounds like it would actually be a pretty big benefit.

After all, looking at the documentation for OLMv1 it seems like "secure by default" is a core tenant tenet: https://github.com/operator-framework/operator-controller/blob/292c0dbc236747e57f77fd914830a865e87fad6b/docs/project/olmv1_design_decisions.md?plain=1#L179-L186

As a reviewer, I'd like to better understand:

  • Why this tenant tenet was originally seen as a win for end users
  • Why this tenant tenet turned out to not be a win for end users

## Motivation

The original intent of `.spec.serviceAccount` was to support a least-privilege model by allowing users to provide custom `ServiceAccount` with fine-grained permissions. In practice, this design introduced considerable operational and technical complexity, including:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you specifically need to use boxcutter for a revision rollout approach?

What are the limitations in doing revision rollouts, regardless of underlying implementation, that makes it impossible for you to continue to adhere to the least-privilege principle?

Comment on lines +39 to +41
- Token acquisition, rotation, and error handling
- Custom `rest.Config` generation and dynamic HTTP clients
- Token expiration edge cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you actively experiencing issues with being able to handle these scenarios?

From what I remember this logic is all already implemented and is flexed during testing. I'm curious how many times the logic to do these things have been problematic from both an end-user perspective and a maintenance perspective.

- Optional
- Deprecated via struct tags and documentation
- Marked as deprecated in the CRD schema using OpenAPI extensions:
- x-kubernetes-deprecated: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this exists?

As far as I am aware, there is not yet a way to mark a specific field in a CRD schema as deprecated. https://pkg.go.dev/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions#JSONSchemaProps would be your canonical source of things you can set for a particular schema property.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like maybe it hasn't landed yet: kubernetes/kubernetes#131817
So the only option is to comment that these are deprecated (in addition to programmatic capture of "a deprecated thing is being used").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could include a validating admission policy that issues a warning on admission of ClusterExtension resources that have that field set.

// Deprecated: This field is ignored and will be removed in a future release.
// +optional
// +kubebuilder:validation:XValidation:message="serviceAccount is deprecated and ignored"
ServiceAccount *ClusterExtensionServiceAccount `json:"serviceAccount,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beforehand, was an empty string ("") a valid value here?


## Drawbacks
- Users lose fine-grained permission control per extension via separate ServiceAccounts.
- The controller’s ServiceAccount needs broader permissions, potentially increasing risk if compromised.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but there are likely other ServiceAccounts on the cluster that already have broad enough permissions to do damage. To do that damage you have to first get access to the cluster.

This could certainly be considered a drawback to increase the permissions associated with the OLM Service Account, but I think a bigger drawback I don't see explicitly mentioned here is the potential for OLM to be the thing that injected a malicious actor into the cluster because it no longer will fail to stamp out permissions it wasn't explicitly told it could.

While it may still be difficult, we've seen in the past malicious actors sneak something into a popular project that is then shipped out en masse. With this change, OLM could be used to get that access to a cluster.


1. Keep `.spec.serviceAccount` and Improve Token Handling
Instead of removing the field, we could improve token management (like automatic refresh and better error handling).
- However, this would add more complexity and maintenance work without fixing the main issue: impersonation is complicated and rarely helpful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this, I'm not convinced on the "rarely helpful" part. Please do update this document appropriately to explicitly state why this has been found to be rarely helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

In a world where OLM has super user powers, it can install and change anything on the cluster, that's somewhat expected given the role it plays.

But, what if I, as an end user decide, ok, I'm installing this bundle, but to avoid potential accidents, I want OLM to install this with a more restrictive set of permissions?

It seems removing this field removes that ability for end users, I don't see why that isn't beneficial?

- Cluster administrators must carefully manage the controller’s permissions.
- Possible security concerns when consolidating privileges into a single SA as opposed to separate, per-extension identities.

## Alternatives (Not Implemented)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some alternatives I've not seen mentioned as being considered. Curious if you've thought of them:

  • Building better client tooling that makes managing permission requirements easier for users looking to install cluster extensions.
  • Building a permission requesting system that makes it easier for users to request installation of a cluster extension and OLM creating a "permission request" that an admin can either approve or deny to automatically stamp out the RBAC needed to complete the installation.

- This would make the API more complex, go against common Kubernetes practices, and could confuse users.

3. Do Nothing – Keep Supporting the Field as Is
- The added complexity and risks from impersonation outweigh the benefits, so leaving it unchanged is not a good option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this, I still don't have a solid understanding of how painful this added complexity and risks of impersonation actually are

- Remove all logic that:
- Token Acquisition: Eliminate use of TokenRequest API to fetch short-lived tokens for user-provided ServiceAccounts.
- Rest Config Mapping: Remove the ServiceAccountRestConfigMapper, which dynamically generated rest.Config objects for impersonation.
- Synthetic Permissions: Remove conditional logic for SyntheticPermissions that depended on impersonated clients.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do "Synthetic Permissions" play a role in service account based permissions? What are "Synthetic Permissions"?

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure from a product perspective we can actually proceed with this deprecation/removal. It appears currently like a backwards step in terms of security and I don't think will be welcomed by our users.

I haven't read this too deeply yet, but, I'm confused about the technical issues you are facing with the service accounts? How are you impersonating service accounts today?

Generally the model is that you have an account (super user) that has permission to impersonate another account, this is not an escalation or privileges, but a de-escalation. But given you're talking about having to manage tokens and rotations, it doesn't sound like that's what is happening, perhaps you're trying to get the service account tokens directly and caching those?


OpenShift 4.21: (next release - N)
- Mark the field as optional in the API.
- The controller ignores the field entirely.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this change the behaviour for an end user? If they want to leverage the older behaviour still, are they able to do so? Is there a replacement field?

OpenShift 4.21: (next release - N)
- Mark the field as optional in the API.
- The controller ignores the field entirely.
- Log a warning if it is set, to alert users.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs aren't visible to users, use a VAP set to Warn


### Graduation Criteria / Deprecation Plan

We will deprecate the .spec.serviceAccount field over the course of three OpenShift releases following the kubernetes' deprecation policy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are currently discussing the process for removing fields in openshift, but only at major version boundaries, I have more questions before I can provide a suggested path here, see my other comments

Some users currently set `.spec.serviceAccount` assuming the controller will impersonate that ServiceAccount during reconciliation. Changing this behavior without notice could break their expectations around RBAC scopes and permissions, for example: restricting access to specific namespaces or resources.

#### Mitigation:
The controller will log a clear deprecation warning whenever a ClusterExtension includes the serviceAccount field, indicating it is now ignored and will be removed in a future release. Additionally, the field will be marked as deprecated in the CRD schema and documentation to make this clear during resource creation and review. Migration instructions will be provided in the release notes to assist users with updating their configurations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs are not visible.

Once it's ignored, what else has the potential to break? Does OLM now just have super user powers?

- https://github.com/operator-framework/operator-controller/pull/2242

## Drawbacks
- Users lose fine-grained permission control per extension via separate ServiceAccounts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this something we are ok with?


1. Keep `.spec.serviceAccount` and Improve Token Handling
Instead of removing the field, we could improve token management (like automatic refresh and better error handling).
- However, this would add more complexity and maintenance work without fixing the main issue: impersonation is complicated and rarely helpful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

In a world where OLM has super user powers, it can install and change anything on the cluster, that's somewhat expected given the role it plays.

But, what if I, as an end user decide, ok, I'm installing this bundle, but to avoid potential accidents, I want OLM to install this with a more restrictive set of permissions?

It seems removing this field removes that ability for end users, I don't see why that isn't beneficial?

Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 20, 2025

@rashmigottipati: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint eb5eee9 link true /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@rashmigottipati
Copy link
Member Author

rashmigottipati commented Oct 22, 2025

Closing this enhancement proposal as the plan now is to address this in phases outlined below:

Phase 0: 4.21 Prep / Deprecation

  • Deprecate spec.serviceAccount field and turn it into an optional field. If set by the user, it should function exactly how it works currently
  • Helm-applier and operator workflows fallback to the operator console’s cluster-admin ServiceAccount if spec.serviceAccount is not specified
  • Update documentation to remove references to specifying the ServiceAccount and note the deprecation
  • Guidance: users should rely on op-con’s cluster-admin capabilities until newer approaches are available

Phase 1: Clean-up/Preflight Removal

  • Remove permissions preflight checks
  • Remove synthetic authentication

Phase 2 : Impersonation

  • Switch to using the impersonation API instead of ServiceAccount tokens
  • Reduce reliance on cluster-admin while maintaining multi-namespace access via ClusterRoleBindings

Phase 3 – Future/Boxcutter Integration

  • Revisit the CE ServiceAccount handling after Boxcutter and Helm support is integrated.

Many thanks to the API reviewers @everettraven and @JoelSpeed for your valuable feedback and thoughtful recommendations in the API review call and through your review comments on this EP and the RFC. Your input helped clarify the phased direction and strengthen the overall plan.

@JoelSpeed
Copy link
Contributor

@rashmigottipati As far as I can tell, we should still be following the EP process for this change. Especially given your phase 0 seems to be exactly the same as what the proposal here was suggesting and that is something we have already been discussing as a no go

@grokspawn
Copy link

I think we're substantially agreeing on intent, and it's just the terminology that's an impediment.

Deprecate spec.serviceAccount field and turn it into an optional field. If set by the user, it should function exactly how it works currently

in the first bullet uses deprecate in the dictionary sense as "to mark obsolete and warn against future use", not "to remove". The field will remain functional, but we'll attempt to steer users away from it.
And evidently that's all we can do for OCP major+2.

If all that remains is changing the API field from required to optional, do we need to have an EP?

@JoelSpeed
Copy link
Contributor

In the call on tuesday, we talked about how we will have users who want this level of granularity. I was under the impression at the end that deprecating was not the intent, but allowing users three paths (not specifying at all, specifying but use a namespace bound admin (with some cluster roles), go the full secure least privilege route) was where we were headed? For that, we need to fix the existing issues that exist with the field, namely privilege escalation and UX

I've suggested in the RFC a relatively straightforward and secure approach to solving the privilege escalation issue.

The other issue is the UX, which we still need to discuss how to improve, I was prompting in the RFC for further details on this so that we could continue the conversation.

in the first bullet uses deprecate in the dictionary sense as "to mark obsolete and warn against future use", not "to remove".

I don't think we can do this until we have something new to point users to that replaces this. How far off is that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants