Subscription Injection Operator #389
Conversation
jamescassell left a comment
Seems like way overkill. Why not just have such an operator simply subscribe the hosts? Then librhsm inside the container creates the redhat.repo on demand, based on the certificate and the Container's corresponding RHEL version, without further complication.
Either run a periodic one-off job to ensure there's an up-to-date subscription, and/or run a DaemonSet with rhsmcertd to keep the certificates continually up-to-date.
| As a developer debugging my application I want access to RHEL subscription content in my containers |
| So that I can install debugging tools from Red Hat via `yum install` |
Subscribing the host solves this.
my understanding is there is no such thing as subscribing the host for RHCOS nodes.
No reason not to. Just hasn't been "made a thing" and that's what this operator should do unless there's an easy way without an operator at all.
(I even subscribe my Fedora laptop so I can run RHEL containers.)
> No reason not to.
the reason not to is administrative overhead of managing subscriptions on those machines (and assigning subscriptions to those machines). I'm not sufficiently versed in the business side of things, but i think it's been a deliberate choice not to make admins manage per machine subscriptions for openshift clusters.
i'll let @derekwaynecarr or @eparis weigh in here, but i think that approach was discarded pretty early on in the "how do we enable entitlements for pods/builds in OCP4" discussion
the goal here is to have a single subscription for the cluster which all workloads can leverage as needed.
Then just distribute that single credential across the cluster in the well-known locations on the hosts so that cri-o will pick it up. That's even easier!
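For readers unfamiliar with that mechanism: on RHEL hosts, CRI-O and Podman read a secret-mounts configuration that bind-mounts host directories into every container. A rough illustration of the stock layout is below; the exact paths are an assumption about typical RHEL defaults, not something this proposal defines.

```
# /usr/share/containers/mounts.conf (may be overridden by /etc/containers/mounts.conf)
# Format: host-path[:container-path]; these directories are bind-mounted into all containers.
/usr/share/rhel/secrets:/run/secrets
```

On a subscribed RHEL host, `/usr/share/rhel/secrets` typically contains links to the entitlement certificates, the rhsm configuration, and `redhat.repo`, which is how entitled content appears inside containers without any per-pod configuration.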
doing this at the crio level means end users have no choice about whether or not content is injected into their pods and admins have no control over who is allowed to access those creds (anyone who can launch a pod can do it).
among other things, it makes any sort of hosted multitenant cluster untenable.
That's correct, the idea of subscribing individual hosts has been rejected. Host subscriptions bring no value to the host. Hosts are quite ephemeral in the changing IT landscape and managing such subscriptions does not make sense.
Note too that in #384 I have added multi-cluster administration as a user story. Distributing entitlement credentials needs to scale to potentially thousands of clusters, each with as few as 1 node or as many as 10,000.
| Today, any UBI/RHEL-based container can access subscription content if the `redhat.repo` file is |
| mounted into `/run/secrets`, and the entitlement keys are mounted into |
Nothing requires the redhat.repo file in /run/secrets today. Only the entitlement keys are required. The auto-mounted rhsm config is also honored inside the container, in the case of a Satellite server instead of the RH CDN.
| - Cluster admins need to manually obtain the entitlement keys and `redhat.repo` configurations. |
| ## Alternatives |
| ### Create a privileged DaemonSet that manages the host subscriptions |
| Such a DaemonSet wouldn't need to be fully privileged; it would only need |
| access to `/etc/pki/{entitlement,consumer}` and `/etc/rhsm` mounts on the host. | |
| This would have the benefit of "it just works" without any further | |
| configuration from the end user to use the entitlements on RHEL 6, 7, or 8 | |
| containers. Likewise for UBI 7 and 8 containers. |
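For concreteness, a minimal sketch of what that DaemonSet alternative could look like. The namespace, labels, and image are hypothetical; the image is assumed to ship subscription-manager and run rhsmcertd in the foreground as its entrypoint.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rhsmcertd
  namespace: openshift-rhsm           # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: rhsmcertd
  template:
    metadata:
      labels:
        app: rhsmcertd
    spec:
      containers:
      - name: rhsmcertd
        # Hypothetical image; assumed to carry subscription-manager and start
        # rhsmcertd in the foreground.
        image: registry.example.com/rhsmcertd:latest
        volumeMounts:
        - name: etc-pki
          mountPath: /etc/pki
        - name: etc-rhsm
          mountPath: /etc/rhsm
      # Not fully privileged: only these host paths are exposed to the pod.
      volumes:
      - name: etc-pki
        hostPath:
          path: /etc/pki              # /etc/pki/{entitlement,consumer} live here
          type: Directory
      - name: etc-rhsm
        hostPath:
          path: /etc/rhsm
          type: Directory
```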
I'll add this as an alternative in #384.
With this alternative, how would we handle RHCOS nodes which do not have rhsm installed? What would the DaemonSet need to do when the node is drained and removed from the cluster? Likewise when the node joins the cluster via an auto-scaling mechanism.
i think i understand why you separated them, but i'd be inclined to fold this EP in with #384 so you have a single EP + component that handles/describes:
gabemontero left a comment
a few items after my initial pass
| caBundle provided to the webhook configuration in 4. |
| [1] |
| https://kubernetes.io/blog/2019/03/21/a-guide-to-kubernetes-admission-controllers/#mutating-webhook-configuration |
Use of https://github.com/openshift/generic-admission-server should be considered ... either as the primary choice, or listed in the alternatives section as something that was dismissed
Interesting - will add as an implementation path.
@gabemontero so I took a look at that - it's a deployment option, but I don't see it being adopted by other OpenShift admission webhooks.
Examples of active mutating admission webhooks:
yeah @deads2k would provide a better sales job than I could (assuming it should still be sold) .... I don't recall if https://github.com/openshift/kubernetes-namespace-reservation is used directly or if it served as the basis for equivalent function elsewhere
bfd5e96 to ebd17a5 (force-push)
@bparees this is ready for final review. commits have been squashed.
| 2. Should we restrict entitlement info to live in `openshift-config`? Should we allow entitlements |
| and yum repo info to live in any namespace? |
| 3. Will this work for multi-tenant clusters? Should we consider APIs that are namespace-scoped? |
| 4. Should we inject a subscription bundle into every pod (and all containers within the pod) by |
i think these questions need at least a placeholder answer before we can merge this and expect it to be implementable.
- allowing multiple will likely save you pain down the road. it's not hard to imagine multiple yumRepositories being needed
- restricting to openshift-config seems reasonable until you get to multitenant. probably better to support arbitrary namespaces up front.
- see above, doesn't seem like it works for multitenant clusters (at least not in a way where the tenants can control their own content. It would have to all be managed by a single admin who could access everyone's stuff)
- my inclination is no we should not inject by default, because if you do it by default, how does someone opt-out of it? But it might be reasonable for the subscription bundle to have a field that says "injectAlways" or something, so that an admin could make it so all pods get it, if the admin so chooses.
> allowing multiple will likely save you pain down the road. it's not hard to imagine multiple yumRepositories being needed
Agree - will update the API.
> restricting to openshift-config seems reasonable until you get to multitenant. probably better to support arbitrary namespaces up front.
Agree as well - will update the ClusterBundle to allow Secrets/ConfigMap refs in any namespace.
> see above, doesn't seem like it works for multitenant clusters (at least not in a way where the tenants can control their own content. It would have to all be managed by a single admin who could access everyone's stuff)
Ack on this, too. Will add the Bundle API which in all likelihood can be released first.
> But it might be reasonable for the subscription bundle to have a field that says "injectAlways" or something, so that an admin could make it so all pods get it, if the admin so chooses
IMO this is a configuration option that is applied to the webhook, and should be exposed via an operator API. I don't think we need this right away, but we could certainly consider an enhancement.
My understanding of OLM operators is that we will need a CRD to say "install this webhook", so we can add this option as a future enhancement.
making it a config on the webhook would imply it's set for all bundles or none of them? vs on a per bundle basis?
or maybe i'm not familiar enough w/ the webhook config/operator api
My thought on injecting a cluster bundle everywhere:
- We can inject everywhere via the "webhook" CRD to say "inject this cluster bundle into all pods." Configuration is gated by whoever can edit the webhook's CRD. We can use the API to say that you are only allowed to inject one bundle everywhere.
- We can use a label or API field on `ClusterBundle` to let admins declare "inject this cluster bundle everywhere". For this to work we would need to allow multiple bundles to be injected. If we are able to support multiple entitlement keys and yum repo files, supporting multiple bundles isn't that much of a stretch imo.
My understanding is that you'd prefer something like option 2?
correct, though i understand injecting multiple bundles is likely problematic since it results in potentially multiple mounts to the same path of a given pod.
but i really don't like the idea that this would be a config on the webhook. I'd prefer the webhook not require separate configuration (or any configuration)
so i think the question is, given we also said elsewhere that we want to allow multiple bundles, how do we actually inject multiple bundles into a pod? is there actually a reasonable way to do it? if not we might be back to saying one bundle only per pod.
at which point it becomes a question of how to enforce that when we also have the ability for a given bundle to say "always inject me"
Thoughts on how to inject multiple bundles:
- Allow a comma-separated list of bundles: `subscription.openshift.io/inject-bundle: rhel,jboss`
- Entitlements cannot have overlapping keys in their `Secrets` - if keys overlap, deny admission
- The optional yum repos likewise cannot have overlapping keys in the `ConfigMaps` - if keys overlap, deny admission
We would then use key-to-path mappings to ensure the keys and yum repo files are mounted correctly. For the namespace-scoped Bundle object, we can use a projected volume [1]. I believe rules 2 and 3 would also need to apply to a single Bundle which references multiple entitlements - we would need a sane way of merging these sets.
[1] https://kubernetes.io/docs/concepts/storage/volumes/#projected
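As a rough illustration of the key-to-path idea for the namespace-scoped case, the webhook could generate a projected volume like the one below; the Secret/ConfigMap names, keys, and target paths are hypothetical.

```yaml
# Volume stanza the webhook could add to a pod for a namespace-scoped Bundle.
volumes:
- name: subscription-bundle
  projected:
    sources:
    - secret:
        name: rhel-entitlement              # hypothetical entitlement Secret
        items:
        - key: entitlement.pem
          path: etc-pki-entitlement/entitlement.pem
        - key: entitlement-key.pem
          path: etc-pki-entitlement/entitlement-key.pem
    - configMap:
        name: rhel-yum-repos                # hypothetical yum repo ConfigMap
        items:
        - key: redhat.repo
          path: rhsm/redhat.repo
```

Overlapping keys across sources would collide in the merged tree, which is why rules 2 and 3 above deny admission on overlap.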
sounds reasonable to me. i'd be comfortable only supporting a single bundle in the first pass since the api can easily be expanded to allow comma separated lists in the future.
| cluster-scoped: |
| ```yaml |
| kind: ClusterBundle |
why is this a clusterbundle and not a subscriptionbundle? trying to avoid semi-stuttering?
(could we just call it a bundle? bundle.subscription.openshift.io probably provides sufficient context?)
Multi-tenancy was a concern when I renamed the API to ClusterBundle. Adding a Bundle API that is namespace scoped is something I can add to the proposal, and is easier to implement (won't need the projected resource csi driver, we can use plain old Secret/ConfigMap volume mounts).
hmmmm. i mean we can't drop the "i put the creds in one space and everyone in my cluster gets them" functionality.
in my head multitenant/multinamespace means:
"I can put the bundle in a namespace i control, and expose it to the users i choose. Those users can use it in any namespace they can create pods in"
Single namespace is straightforward - I create a bundle in my namespace, with secrets/ConfigMaps in the same namespace.
Sharing across multiple namespaces implies we use a ClusterBundle, but then admins use RBAC to limit who gets what. Ex: ClusterBundle creates a ClusterRole, admins then add cluster roles to users for the desired namespaces.
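Roughly, the RBAC arrangement described here could look like the sketch below; the role and resource names are placeholders for whatever the ClusterBundle controller would generate.

```yaml
# ClusterRole generated for a ClusterBundle named "rhel" (hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: clusterbundle-rhel-reader
rules:
- apiGroups: ["subscription.openshift.io"]
  resources: ["clusterbundles"]
  resourceNames: ["rhel"]
  verbs: ["get"]
---
# Admins then bind the role only in the namespaces where the bundle may be used.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: use-rhel-clusterbundle
  namespace: team-a                   # hypothetical tenant namespace
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: clusterbundle-rhel-reader
  apiGroup: rbac.authorization.k8s.io
```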
| of the following: |
| 1. Verify that the referenced `entitlements` and optional `yumRepositories` resources exist. |
| 2. Create the ProjectedResource CSI driver instances for the `entitlements` and (optional) |
what's the behavior if a ClusterBundle is later updated to point to a different entitlement secret?
what happens to the existing ProjectedResources? (also didn't we rename them Shares?) And what happens to the pods that are already consuming those ProjectedResources?
@gabemontero you may be able to weigh in on what the CSI driver would do. IMO the ClusterBundle should wire the change through to the respective Share object. That in turn takes care of updating the pods/containers mounting in the shared resources.
I agree @adambkaplan
And we update/delete each pod's tmpfs copy of a share's referenced secrets/configmaps as needed.
Ultimately, that will occur if either the secret/configmap or share changes.
| secret and one yum.repo configuration. |
| 4. `ClusterBundle` supporting multiple `entitlements` and `yumRepositories`. |
| 5. Allow multiple `ClusterBundles` to be injected into a workload. |
| 6. Allow auto-injection of `ClusterBundles` into all workloads. |
any intent to make Bundles be autoinjectable also? (within their namespace of course)
Yeah...we should probably have that feature, too.
| ### Open Questions |
| 1. Is allowing an admission webhook which can read all `Secrets` and `ConfigMaps` acceptable? |
| 2. Do we need to perform a subject access review if a service account needs to read a `Bundle` |
so if the SA can't read the Bundle, the "escalation" here is effectively that someone can create a pod that references the bundle (even though they can't read it), which, by virtue of the mutating webhook, exposes the names of the secrets the bundle references (as mounts on the pod). (The SA may or may not actually be able to read those secrets, so the pod may or may not ultimately be admitted).
it also means the user can probe at the names of Bundles which may exist in the namespace.
Overall this doesn't seem like an area of concern:
- it's restricted within a namespace
- it's users who can already create pods in the namespace, which means they are pretty privileged
Assuming i've defined the boundaries of the exploitability/escalation correctly, anyway.
@bparees a few significant updates in light of your comments:
| cluster roles should be propagated to the associated `ClusterRole`. Once completed, non-terminated |
| pods which inject the `ClusterBundle` and whose recorded generation does not match the |
| `ClusterBundle`'s new generation need to be evicted. This will allow new pods to be created with |
| appropriate volume mounts for the cluster bundles. |
i think we need to discuss this some more. That's a pretty hard constraint, so i want to talk about what else we can/can't do. And perhaps, if it's really bad, the better answer is to say "you can't edit cluster bundles, you have to delete and recreate them to change them". (still not sure what that means for a running pod w/ a ref to the clusterbundle in question though).
but in general, let's think about whether we need to put more explicit constraints on expectations, rather than a "it kinda works but only if you do this thing that no one ever wants to do"
For Bundle objects, the only changes to spec that are allowed are adds/removals from the entitlements and yumRepositories lists. Propagating this to pods requires changing the Volume, which can't be done on a running pod.
We can block edits to spec via a separate validating webhook.
For deleted Bundles, a potentially valid alternative is to leave the pods as they are.
With ClusterBundle, we have the added complexity of the cluster aggregation roles. We probably need a similar/same approach to edits - block edits to spec via a webhook. And likewise for deletions, rather than using OwnerRefs we instead track our related objects in status and leave them in place on deletion. The only thing we may do is change the referenced cluster role so that it no longer grants read permission on the removed ClusterBundle object.
on further thinking about this, this may not be as big a deal as i was thinking, since i expect 99% of the usage of this feature to be "launch pod, immediately use entitlements to pull in content".
I suppose someone might try to have a long-lived builder pod that provides some sort of build service where it needs to periodically grab new rpm content, but that seems unlikely.
So it might be reasonable to just assert up front that the CSI driver is based on a "you get the content based on the state at pod launch and there are no dynamic updates to the content" model.
it needs to be stated very clearly and it's possible there will be some pushback, but given how much it simplifies things and the fact that i think it achieves the primary use case here, i'd be ok w/ it personally.
I've attempted to clarify the behavior in a few places:
- In "Non-goals", I've made automatic key rotation and "add an extra entitlement via a new
Secret" out of scope. - Updated the
BundleandClusterBundlebehavior to keep non-terminated pods in place on update and delete. - Add the risk of "Containers run with outdated entitlement keys" - with the mitigation of "update the secret with the original keys".
@bparees I think I've covered all of the outstanding issues in my latest push. PTAL, and if all looks good I'll squash commits.
b9d12b3 to 9166a95 (force-push)
@bparees squashed commits.
bparees left a comment
final nits and lgtm.
probably worth having the auth/sec team review what you're planning to do here (if you haven't already) to ensure they don't see any red flags around the admission/mutation logic or escalation pathways.
| # Deny some bundles in the namespace, but allow others |
| subscription.openshift.io/deny-bundles: "bundle-B,bundle-C" |
| # Deny all bundles in the namespace |
| subscription.openshift.io/deny-bundles: "*" |
so to be clear the default behavior if you don't specify any allow/deny annotation is to allow *, right?
There was a problem hiding this comment.
Yes - I'll add a note.
This enhancement proposal introduces a new operator and components which will allow subscription content access data to be injected into pods. This simplifies the process of installing RHEL content inside a container. Subscription information is declared by a developer or cluster administrator as a "Bundle", which consists of the following:

1. A list of `entitlements`, referencing `Secrets` containing RHEL entitlement keys.
2. A list of `yumRepositories`, referencing `ConfigMaps` containing RHEL yum.repo definitions.

This information can be injected into any "pod-specable" workload via an annotation. A mutating admission webhook injects the entitlement keys and yum.repo definitions into all containers via volume mounts, in the locations expected by RHEL/UBI containers. Two flavors of the bundle will be available:

1. `Bundles`, which are namespace-scoped.
2. `ClusterBundles`, which are cluster-scoped.

The latter `ClusterBundle` will utilize the Projected Resource CSI driver [1] to share the entitlement `Secrets` and yum.repo `ConfigMaps` across namespaces. Both `Bundle` and `ClusterBundle` will support automatic injection of subscription data. Workloads can opt out of automatic injection via separate allow/deny annotations.

[1] https://github.com/openshift/csi-driver-projected-resource
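A hedged sketch of what the namespace-scoped flavor and the opt-in annotation could look like together; the apiVersion and field names are illustrative only, since the API shape is still being settled in this PR.

```yaml
apiVersion: subscription.openshift.io/v1alpha1   # hypothetical version
kind: Bundle
metadata:
  name: rhel
  namespace: team-a
spec:
  entitlements:
  - secretRef:
      name: rhel-entitlement          # Secret with RHEL entitlement key pairs
  yumRepositories:
  - configMapRef:
      name: rhel-yum-repos            # ConfigMap with yum .repo definitions
---
apiVersion: v1
kind: Pod
metadata:
  name: debug-tools
  namespace: team-a
  annotations:
    subscription.openshift.io/inject-bundle: rhel
spec:
  containers:
  - name: tools
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sleep", "infinity"]    # yum install works once the bundle is mounted
```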
c93a01d to 0b8d9b2 (force-push)
@bparees somehow I missed the notification from you and Zvanko. Suggestions merged and commits squashed.
| - All APIs reach `v1` stability |
| - Projected resource CSI driver reaches GA |
| - Security audit for these components and the projected resource CSI driver |
@bparees based on the guidelines for security audits, these work best when there is working code to deal with. We have the security audit here as a GA graduation requirement, though there is no reason why we can't ask for one earlier (say after we reach Dev Preview on the csi driver and injection operator).
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, bparees

The full list of commands accepted by this bot can be found here. The pull request process is described here.
| 1. Create a `Deployment` that runs the webhook in a highly available mode. |
| 2. Create a `Service` to expose the webhook pods, with generated TLS cert/key pairs. |
| 3. Create a `ConfigMap` where the service serving CA can be injected. |
| 4. Create a `MutatingWebhookConfiguration` that fires the webhook for all created pods. |
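A minimal sketch of the webhook configuration step 4 describes; the names, service path, and CA-injection annotation are illustrative assumptions rather than the proposal's final manifest.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: subscription-injector                          # hypothetical name
  annotations:
    service.beta.openshift.io/inject-cabundle: "true"  # service-CA injection (step 3)
webhooks:
- name: inject.subscription.openshift.io
  clientConfig:
    service:
      name: subscription-injector
      namespace: openshift-subscription-injection      # hypothetical namespace
      path: /mutate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  # A namespaceSelector/objectSelector could narrow which pods are mutated,
  # which is the concern raised in the comments that follow.
  failurePolicy: Ignore            # do not block pod creation if the webhook is unavailable
  sideEffects: None
  admissionReviewVersions: ["v1"]
```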
this is a no-go! How will you restrict the mechanism to only the pods that matter? Mutating all pods is not something we want to have, for resilience reasons.
are you aware that's exactly what Istio does to inject its sidecars?
https://istio.io/latest/docs/ops/configuration/mesh/injection-concepts/
I am aware and it is not good for availability. If every other component thinks it is Istio ...
At the very least this needs a risk assessment and discussion of mitigations.