-
Couldn't load subscription status.
- Fork 522
OPRUN-4172: Deprecate serviceAccount field in OLMv1 ClusterExtension API #1860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPRUN-4172: Deprecate serviceAccount field in OLMv1 ClusterExtension API #1860
Conversation
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
|
@rashmigottipati: This pull request references OPRUN-4172 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
|
This looks like it makes changes to APIs on OpenShift but doesn't include an API reviewer in the set of approvers. Please reach out on the #forum-api-review channel in slack to request an API reviewer be assigned to review this enhancement. Thanks! |
|
Thanks @everettraven. I’ve reached out in #forum-api-review on Slack to request an API reviewer for this enhancement. I’ll update the PR’s approvers list once a reviewer is assigned. |
|
|
||
| Deprecate the `.spec.serviceAccount` field from the ClusterExtension API in the operator-controller. This field was originally introduced to enforce least privilege by requiring users to provide a ServiceAccount with the necessary RBAC permissions to manage extension content. This proposal removes that requirement and simplifies controller logic and behavior. | ||
|
|
||
| **Note**: This deprecation is an interim step. While the intent is to eventually remove the `.spec.serviceAccount` field, we will keep it as optional and mark it as deprecated during this transition period spanning multiple releases. This approach allows users time to adapt without disruption before the field is fully removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the graduation plan is a good place to discuss this, and maybe I'm just imagining that we're going to end up having a lively discussion on the path forward and not wanting to have to update two places.
Maybe it's just me... it seems like a good callout aside from that. 😄
| ## Motivation | ||
|
|
||
| The original intent of `.spec.serviceAccount` was to support a least-privilege model by allowing users to provide custom `ServiceAccount` with fine-grained permissions. In practice, this design introduced considerable operational and technical complexity, including: | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd understood that it also arose from the need to implement a rollout revision approach (using boxcutter) which required cluster admin permissions anyway, so we couldn't reasonably reduce the privileges of a "package manager" from cluster admin levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you specifically need to use boxcutter for a revision rollout approach?
What are the limitations in doing revision rollouts, regardless of underlying implementation, that makes it impossible for you to continue to adhere to the least-privilege principle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary reason for changing this is that is doesn't deliver on our 'secure by default' promise. Any person with write access to the ClusterExtension API would be able to specify any ServiceAccount, which opens the door to privilege escalation.
See "Need" section in RFC: https://docs.google.com/document/d/1qqzPtAHB1gnv00mZB7U0h0Om9ELyPSZMqsIUUNlIZOY/
and EP: #1860
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write access to the ClusterExtension API seems like it should be a cluster-admin level permission. If someone has assigned that permission to someone they do not trust, they have shot themselves in the foot.
Accessing the cluster and being able to write to the ClusterExtension API is a difficult task when configured correctly.
The way I interpret 'secure by default' in the context of OLM is that OLM will not make it easy to leverage what it does on the cluster as an attack vector (i.e automatic upgrades, installing anything specified in a bundle, etc.).
|
|
||
| - Make `.spec.serviceAccount` field **optional** in the `ClusterExtension` API. | ||
| - Update the controller to **ignore** the field during reconciliation. | ||
| - Log a warning when `.spec.serviceAccount` is set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to adopt an approach similar to how k8s tracks deprecation (see: https://kubernetes.io/blog/2020/09/03/warnings/#metrics). For example, we could implement a clusterextension_requested_deprecated_api metric, or an admission webhook to warn when we try to create a ClusterExtension with a serviceaccount.
| // Deprecated: This field is ignored and will be removed in a future release. | ||
| // +optional | ||
| // +kubebuilder:validation:XValidation:message="serviceAccount is deprecated and ignored" | ||
| ServiceAccount *ClusterExtensionServiceAccount `json:"serviceAccount,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We considered moving this to a pointer type to be a potentially breaking change we could avoid, so we're using go1.24's omitzero capability to omit this field from the struct if it is set to only 'default' values: https://github.com/operator-framework/operator-controller/pull/2242/files#diff-387659018fbee6e27fcf5ab561b45489e14b62ae73ecf00fed3a0f4c017714d1R75
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beforehand, was an empty string ("") a valid value here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It would've failed the DNS1123 label check.
| - The controller now uses a static `rest.Config` created from its own identity (its default ServiceAccount). | ||
| - The RestConfigMapper will now be directly set to `ClusterAdminRestConfigMapper(mgr.GetConfig())`, i.e. the controller always uses its own identity (cluster-admin level config) for all reconciliation and watching operations. | ||
| This config is passed to the helm.ActionConfigGetter, ensuring that all Helm operations (install, upgrade, uninstall) use the same identity. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to include the removal of the preflight permissions checks here, or is that not required because those are not GA'd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove the preflight checks as well. We can do that any time after we give OLM cluster-admin and change the various clients to use its permissions instead of the CE's SA's permissions.
|
|
||
| ### Graduation Criteria / Deprecation Plan | ||
|
|
||
| We will deprecate the .spec.serviceAccount field over the course of three OpenShift releases following the kubernetes' deprecation policy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's where we really need arch guidance. We've been informed that GA'd OCP APIs have functionally never been removed, even though nominally we could -- after a reasonable period -- up-version the API and remove the field, including a conversion webhook to handle upgrades.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure what our process here for an up-version of an API is in OpenShift, but if there is one, I imagine it wouldn't be too dissimilar from Kubernetes (https://kubernetes.io/docs/reference/using-api/deprecation-policy/#rest-resources-aka-api-objects).
You cannot remove the API version until it has "aged out" - in which case you are confident that all users have switched over to the newer, non-deprecated API. In the upstream case it looks like release N+9. You also must have a field in the next version of the API to map the old field value to handle round-tripping without data loss.
As far as I know, OpenShift has never removed a stable API version so I would default to not being able to do this.
@JoelSpeed may have more familiarity with whether or not this is something we might have a way to allow.
I know OLM is an upstream project that we sync the APIs downstream, and we generally aren't as restrictive there because we don't entirely control the API surface, but IIUC we consider OLM a core component of the OpenShift payload so I would expect that we end up treating it with a bit more caution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are currently discussing the process for removing fields in openshift, but only at major version boundaries, I have more questions before I can provide a suggested path here, see my other comments
| - The controller’s ServiceAccount needs broader permissions, potentially increasing risk if compromised. | ||
| - Cluster administrators must carefully manage the controller’s permissions. | ||
| - Possible security concerns when consolidating privileges into a single SA as opposed to separate, per-extension identities. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lost signal if an operator upgrade requires permissions not provided by the existing, configured serviceaccount (since OLMv1 can do anything, we might perpetrate the upgrade and then have the operator fail).
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
| - Custom `rest.Config` generation and dynamic HTTP clients | ||
| - Token expiration edge cases | ||
|
|
||
| Given the limited benefit and high complexity of this approach, we propose simplifying the model: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful if you could clarify what the "limited benefit" here is.
You start this section talking about how the intention was originally for following a least-privilege principle which, on the surface, sounds like it would actually be a pretty big benefit.
After all, looking at the documentation for OLMv1 it seems like "secure by default" is a core tenant tenet: https://github.com/operator-framework/operator-controller/blob/292c0dbc236747e57f77fd914830a865e87fad6b/docs/project/olmv1_design_decisions.md?plain=1#L179-L186
As a reviewer, I'd like to better understand:
- Why this
tenanttenet was originally seen as a win for end users - Why this
tenanttenet turned out to not be a win for end users
| ## Motivation | ||
|
|
||
| The original intent of `.spec.serviceAccount` was to support a least-privilege model by allowing users to provide custom `ServiceAccount` with fine-grained permissions. In practice, this design introduced considerable operational and technical complexity, including: | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you specifically need to use boxcutter for a revision rollout approach?
What are the limitations in doing revision rollouts, regardless of underlying implementation, that makes it impossible for you to continue to adhere to the least-privilege principle?
| - Token acquisition, rotation, and error handling | ||
| - Custom `rest.Config` generation and dynamic HTTP clients | ||
| - Token expiration edge cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you actively experiencing issues with being able to handle these scenarios?
From what I remember this logic is all already implemented and is flexed during testing. I'm curious how many times the logic to do these things have been problematic from both an end-user perspective and a maintenance perspective.
| - Optional | ||
| - Deprecated via struct tags and documentation | ||
| - Marked as deprecated in the CRD schema using OpenAPI extensions: | ||
| - x-kubernetes-deprecated: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this exists?
As far as I am aware, there is not yet a way to mark a specific field in a CRD schema as deprecated. https://pkg.go.dev/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions#JSONSchemaProps would be your canonical source of things you can set for a particular schema property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like maybe it hasn't landed yet: kubernetes/kubernetes#131817
So the only option is to comment that these are deprecated (in addition to programmatic capture of "a deprecated thing is being used").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could include a validating admission policy that issues a warning on admission of ClusterExtension resources that have that field set.
| // Deprecated: This field is ignored and will be removed in a future release. | ||
| // +optional | ||
| // +kubebuilder:validation:XValidation:message="serviceAccount is deprecated and ignored" | ||
| ServiceAccount *ClusterExtensionServiceAccount `json:"serviceAccount,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beforehand, was an empty string ("") a valid value here?
|
|
||
| ## Drawbacks | ||
| - Users lose fine-grained permission control per extension via separate ServiceAccounts. | ||
| - The controller’s ServiceAccount needs broader permissions, potentially increasing risk if compromised. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but there are likely other ServiceAccounts on the cluster that already have broad enough permissions to do damage. To do that damage you have to first get access to the cluster.
This could certainly be considered a drawback to increase the permissions associated with the OLM Service Account, but I think a bigger drawback I don't see explicitly mentioned here is the potential for OLM to be the thing that injected a malicious actor into the cluster because it no longer will fail to stamp out permissions it wasn't explicitly told it could.
While it may still be difficult, we've seen in the past malicious actors sneak something into a popular project that is then shipped out en masse. With this change, OLM could be used to get that access to a cluster.
|
|
||
| 1. Keep `.spec.serviceAccount` and Improve Token Handling | ||
| Instead of removing the field, we could improve token management (like automatic refresh and better error handling). | ||
| - However, this would add more complexity and maintenance work without fixing the main issue: impersonation is complicated and rarely helpful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through this, I'm not convinced on the "rarely helpful" part. Please do update this document appropriately to explicitly state why this has been found to be rarely helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
In a world where OLM has super user powers, it can install and change anything on the cluster, that's somewhat expected given the role it plays.
But, what if I, as an end user decide, ok, I'm installing this bundle, but to avoid potential accidents, I want OLM to install this with a more restrictive set of permissions?
It seems removing this field removes that ability for end users, I don't see why that isn't beneficial?
| - Cluster administrators must carefully manage the controller’s permissions. | ||
| - Possible security concerns when consolidating privileges into a single SA as opposed to separate, per-extension identities. | ||
|
|
||
| ## Alternatives (Not Implemented) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some alternatives I've not seen mentioned as being considered. Curious if you've thought of them:
- Building better client tooling that makes managing permission requirements easier for users looking to install cluster extensions.
- Building a permission requesting system that makes it easier for users to request installation of a cluster extension and OLM creating a "permission request" that an admin can either approve or deny to automatically stamp out the RBAC needed to complete the installation.
| - This would make the API more complex, go against common Kubernetes practices, and could confuse users. | ||
|
|
||
| 3. Do Nothing – Keep Supporting the Field as Is | ||
| - The added complexity and risks from impersonation outweigh the benefits, so leaving it unchanged is not a good option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through this, I still don't have a solid understanding of how painful this added complexity and risks of impersonation actually are
| - Remove all logic that: | ||
| - Token Acquisition: Eliminate use of TokenRequest API to fetch short-lived tokens for user-provided ServiceAccounts. | ||
| - Rest Config Mapping: Remove the ServiceAccountRestConfigMapper, which dynamically generated rest.Config objects for impersonation. | ||
| - Synthetic Permissions: Remove conditional logic for SyntheticPermissions that depended on impersonated clients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do "Synthetic Permissions" play a role in service account based permissions? What are "Synthetic Permissions"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure from a product perspective we can actually proceed with this deprecation/removal. It appears currently like a backwards step in terms of security and I don't think will be welcomed by our users.
I haven't read this too deeply yet, but, I'm confused about the technical issues you are facing with the service accounts? How are you impersonating service accounts today?
Generally the model is that you have an account (super user) that has permission to impersonate another account, this is not an escalation or privileges, but a de-escalation. But given you're talking about having to manage tokens and rotations, it doesn't sound like that's what is happening, perhaps you're trying to get the service account tokens directly and caching those?
|
|
||
| OpenShift 4.21: (next release - N) | ||
| - Mark the field as optional in the API. | ||
| - The controller ignores the field entirely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this change the behaviour for an end user? If they want to leverage the older behaviour still, are they able to do so? Is there a replacement field?
| OpenShift 4.21: (next release - N) | ||
| - Mark the field as optional in the API. | ||
| - The controller ignores the field entirely. | ||
| - Log a warning if it is set, to alert users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs aren't visible to users, use a VAP set to Warn
|
|
||
| ### Graduation Criteria / Deprecation Plan | ||
|
|
||
| We will deprecate the .spec.serviceAccount field over the course of three OpenShift releases following the kubernetes' deprecation policy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are currently discussing the process for removing fields in openshift, but only at major version boundaries, I have more questions before I can provide a suggested path here, see my other comments
| Some users currently set `.spec.serviceAccount` assuming the controller will impersonate that ServiceAccount during reconciliation. Changing this behavior without notice could break their expectations around RBAC scopes and permissions, for example: restricting access to specific namespaces or resources. | ||
|
|
||
| #### Mitigation: | ||
| The controller will log a clear deprecation warning whenever a ClusterExtension includes the serviceAccount field, indicating it is now ignored and will be removed in a future release. Additionally, the field will be marked as deprecated in the CRD schema and documentation to make this clear during resource creation and review. Migration instructions will be provided in the release notes to assist users with updating their configurations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logs are not visible.
Once it's ignored, what else has the potential to break? Does OLM now just have super user powers?
| - https://github.com/operator-framework/operator-controller/pull/2242 | ||
|
|
||
| ## Drawbacks | ||
| - Users lose fine-grained permission control per extension via separate ServiceAccounts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this something we are ok with?
|
|
||
| 1. Keep `.spec.serviceAccount` and Improve Token Handling | ||
| Instead of removing the field, we could improve token management (like automatic refresh and better error handling). | ||
| - However, this would add more complexity and maintenance work without fixing the main issue: impersonation is complicated and rarely helpful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
In a world where OLM has super user powers, it can install and change anything on the cluster, that's somewhat expected given the role it plays.
But, what if I, as an end user decide, ok, I'm installing this bundle, but to avoid potential accidents, I want OLM to install this with a more restrictive set of permissions?
It seems removing this field removes that ability for end users, I don't see why that isn't beneficial?
Signed-off-by: Rashmi Gottipati <rgottipa@redhat.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@rashmigottipati: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Closing this enhancement proposal as the plan now is to address this in phases outlined below: Phase 0: 4.21 Prep / Deprecation
Phase 1: Clean-up/Preflight Removal
Phase 2 : Impersonation
Phase 3 – Future/Boxcutter Integration
Many thanks to the API reviewers @everettraven and @JoelSpeed for your valuable feedback and thoughtful recommendations in the API review call and through your review comments on this EP and the RFC. Your input helped clarify the phased direction and strengthen the overall plan. |
|
@rashmigottipati As far as I can tell, we should still be following the EP process for this change. Especially given your phase 0 seems to be exactly the same as what the proposal here was suggesting and that is something we have already been discussing as a no go |
|
I think we're substantially agreeing on intent, and it's just the terminology that's an impediment.
in the first bullet uses deprecate in the dictionary sense as "to mark obsolete and warn against future use", not "to remove". The field will remain functional, but we'll attempt to steer users away from it. If all that remains is changing the API field from required to optional, do we need to have an EP? |
|
In the call on tuesday, we talked about how we will have users who want this level of granularity. I was under the impression at the end that deprecating was not the intent, but allowing users three paths (not specifying at all, specifying but use a namespace bound admin (with some cluster roles), go the full secure least privilege route) was where we were headed? For that, we need to fix the existing issues that exist with the field, namely privilege escalation and UX I've suggested in the RFC a relatively straightforward and secure approach to solving the privilege escalation issue. The other issue is the UX, which we still need to discuss how to improve, I was prompting in the RFC for further details on this so that we could continue the conversation.
I don't think we can do this until we have something new to point users to that replaces this. How far off is that? |
This PR proposes the deprecation of the
serviceAccountfield in the OLMv1ClusterExtensionAPI.Key points:
serviceAccountfield in theClusterExtensionAPIClusterExtensionAPI