
[FEATURE]: OpenShift support #4243

Open
worldofgeese opened this issue May 12, 2023 · 0 comments
worldofgeese commented May 12, 2023

Feature Request

Background / Motivation

According to Red Hat client data from 2019,

Nearly 50% of the top Fortune 100 companies are using Red Hat OpenShift. Almost 30% of the top Fortune Global 500 companies use OpenShift.

It's a big slice of the up-market. OpenShift adoption hinges on its status as an all-in-one platform that is secure by default and ships with integrated monitoring and GitOps.

What should the user be able to do?

An OpenShift user should be able to easily Build, Run, Test, and Deploy to any OpenShift cluster, whether it's Red Hat OpenShift Local, the Developer Sandbox for Red Hat OpenShift, or an OpenShift cluster running on Azure, AWS, or GCP.

The Developer Sandbox for Red Hat OpenShift offers free remote OpenShift clusters for 30 days, which is of great potential value to users without a remote cluster on hand.

Why do they want to do this? What problem does it solve?

Using Garden with OpenShift is currently fraught with friction, requiring significant manual intervention.

Suggested Implementation(s)

  • Any images Garden deploys should support arbitrary user IDs.
  • When running against an OpenShift cluster, create projects, not namespaces. Direct namespace creation is generally disallowed and even project creation is frequently restricted. It's best to assume developers are assigned a single project, which they declare in their Garden project configuration, with no other rights to list, view, or create other projects (see the sketch after this list).
  • Don't assume listing namespaces is available: OpenShift developers do not have this permission by default.
  • When running against an OpenShift cluster, disable garden-system namespace creation and deploy all garden-system-bound resources to the user's defined project.
  • Deploy the OpenShift-compatible NGINX Ingress Controller Operator when OpenShift is detected. OpenShift ships with its own HAProxy-based Ingress Operator.
  • Garden should avoid running internal kubectl commands with --namespace=default affixed. See the Avoid default namespace subheading below for details.
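
A rough sketch of the project-vs-namespace distinction, using standard oc and kubectl commands; the detection hint at the end (checking for the project.openshift.io API group) is only an assumption about how Garden could recognize an OpenShift cluster, not existing Garden behaviour:

# Direct namespace creation is usually forbidden for developers
# (typically fails with: namespaces is forbidden ... at the cluster scope):
kubectl create namespace my-feature-env

# A project request is the supported path on OpenShift (when allowed at all):
oc new-project my-feature-env

# Hypothetical detection: the project.openshift.io API group only exists on OpenShift
kubectl api-versions | grep -q '^project.openshift.io/' && echo "OpenShift detected"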

Issues observed

For garden deploy, Garden fails to list namespaces because listing namespaces is not allowed for OpenShift developers by default.

  - '- local-kubernetes: resolve provider local-kubernetes failed: Error: Got error from Kubernetes API (listNamespace) - namespaces is forbidden: User "developer" cannot list resource "namespaces" in API group "" at the cluster scope'

To work around this, log in as an administrator user:

oc login -u kubeadmin -p $OPENSHIFT_ADMIN_PASSWORD https://api.crc.testing:6443

and add the cluster-reader role to the developer user:

oc adm policy add-cluster-role-to-user cluster-reader developer

This is not yet enough. Switching back to the developer user and running garden deploy returns a new error

'namespaces is forbidden: User "developer" cannot create resource "namespaces" in API group "" at the cluster scope'

To work around this, ensure your Garden namespace matches your Kubernetes namespace by declaring namespace: ${environment.namespace}, then run garden deploy.
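
For reference, a minimal project.garden.yml sketch of this workaround; the exact field placement is an assumption about the provider schema and may differ between Garden versions:

    environments:
      - name: local
        # Assumption: the developer's single pre-created OpenShift project
        defaultNamespace: user-${local.username}
    providers:
      - name: local-kubernetes
        # Pin the provider namespace to the environment namespace so Garden
        # does not attempt to create any other namespaces.
        namespace: ${environment.namespace}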

This results in another blocker

 'namespaces is forbidden: User "developer" cannot create resource "namespaces" in API group "" at the cluster scope'

To work around this, create the project manually with oc new-project garden-system and try to re-deploy, which surfaces the next error

  message: 'ingressclasses.networking.k8s.io is forbidden: User "developer" cannot list resource "ingressclasses" in API group "networking.k8s.io" at the cluster scope'

Switch back to the kubeadmin user and create a ClusterRole that allows listing ingress classes

cat <<EOF | oc apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ingressclasses-list
rules:
- apiGroups: ["networking.k8s.io"]
  resources: ["ingressclasses"]
  verbs: ["get", "list", "watch"]
EOF

and bind this ClusterRole to the developer user with oc adm policy add-cluster-role-to-user ingressclasses-list developer.

On re-deploy, Garden will attempt to deploy the NGINX Ingress Controller Helm chart to the cluster, which will fail. Logs for the default-backend pod show the permissions error

[emerg] 1#1: mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)
nginx: [emerg] mkdir() "/var/cache/nginx/client_temp" failed (13: Permission denied)

The ingress controller doesn't have the necessary permissions because OpenShift runs containers with random UIDs by default.
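
One possible workaround (untested here) would be to mirror the SCC change used for garden-util further down and grant the anyuid SCC to the service account the ingress controller runs under; both the garden-system project and the default service account name are assumptions:

oc login -u kubeadmin -p $OPENSHIFT_ADMIN_PASSWORD https://api.crc.testing:6443
# Assumption: the controller pods run under the "default" service account in garden-system
oc adm policy add-scc-to-user anyuid -z default -n garden-system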

Oddly, even for actions requiring no access to secrets, such as this hello-world container

kind: Deploy
type: container
name: hello-world
spec: 
  image: paulbouwer/hello-kubernetes:1.10
  ingresses:
    - port: http
      linkUrl: http://localhost:30080
  ports:
    - name: http
      containerPort: 8080
      nodePort: 30080

Garden will fail

Failed processing Deploy type=container name=hello-world. Here is the output:

"secrets is forbidden: User \"developer\" cannot list resource \"secrets\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"secrets"},"code":403}

Switch back to the kubeadmin user and create a Role that allows reading secrets

cat <<EOF | oc apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: secret-reader
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch"]
EOF

then apply with oc adm policy add-role-to-user secret-reader developer -n default.

Avoid default namespace

At this point Garden still runs kubectl commands internally with the default namespace: Command "/home/taohansen/.garden/tools/kubectl/49eb930aa565a80f/kubectl --context=crc-developer --namespace=default apply --output=json -f -" failed with code 1:

which spawns a litany of errors

Error from server (Forbidden): error when creating "STDIN": deployments.apps is forbidden: User "developer" cannot create resource "deployments" in API group "apps" in the namespace "default"
Error from server (Forbidden): error when creating "STDIN": services is forbidden: User "developer" cannot create resource "services" in API group "" in the namespace "default"
Error from server (Forbidden): error when creating "STDIN": ingresses.networking.k8s.io is forbidden: User "developer" cannot create resource "ingresses" in API group "networking.k8s.io" in the namespace "default"

You know the drill. Switch back to the kubeadmin user, then create and apply a new Role allowing these actions in the default namespace

cat <<EOF | oc apply -f -
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: deploy-services-ingresses
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
EOF
oc adm policy add-role-to-user deploy-services-ingresses developer -n default

Now log back in as a developer and re-deploy:

oc login -u developer -p developer https://api.crc.testing:6443
garden deploy

Our first successful deploy!

ℹ deploy.hello-world   → missing
ℹ deploy.hello-world   → Deploying version v-cd9fc29b97...
ℹ deploy.hello-world   → Waiting for resources to be ready...
ℹ deploy.hello-world   → Deployment/hello-world: Successfully assigned default/hello-world-6db85f9f8f-5vks6 to crc-74q6p-master-0
✔ deploy.hello-world   → Done (in 21.8 sec)

In-cluster image builds

OpenShift ships with its own internal registry. Extensive background is available in the OpenShift Local docs, but if you just want to get going, log in to the OpenShift Local image registry, then test that you have access by mirroring an image from Red Hat's image registry to the internal one

oc project user-$USER
oc registry login --insecure=true

Save the registry URL returned: info: Using registry public hostname default-route-openshift-image-registry.apps-crc.testing.

Now mirror an image to your internal registry to prove you're authorized

oc image mirror registry.access.redhat.com/ubi8/ubi:latest=default-route-openshift-image-registry.apps-crc.testing/user-$USER/ubi8:latest --insecure=true --filter-by-os=linux/amd64

We'll declare kaniko as our in-cluster image builder and set the internal image registry as our deploymentRegistry in our project.garden.yml

    buildMode: kaniko
    deploymentRegistry:
      hostname: default-route-openshift-image-registry.apps-crc.testing
      namespace: user-${local.username}
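
For context, these fields sit on the kubernetes/local-kubernetes provider entry. A rough sketch of the whole provider block, assembled from the fragments quoted in this issue (the exact placement is an assumption and may vary by Garden version):

    providers:
      - name: local-kubernetes
        context: crc-developer
        buildMode: kaniko
        deploymentRegistry:
          hostname: default-route-openshift-image-registry.apps-crc.testing
          namespace: user-${local.username}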

Running garden build will hang because the garden-util ReplicaSet has run into the other big security blocker when running on OpenShift: Security Context Constraints (SCCs)

Error creating: pods "garden-util-59f6856d6b-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.runAsUser: Invalid value: 1000: must be in the ranges: [1000660000, 1000669999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "hostpath-provisioner": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

OpenShift is telling us that the garden-util pod wants to run as UID 1000 but OpenShift's default SCCs forbid it. Why?

By default, OpenShift Container Platform runs containers using an arbitrarily assigned user ID. This provides additional security against processes escaping the container due to a container engine vulnerability and thereby achieving escalated permissions on the host node.

To grant our pod privileges to run as UID 1000, we'll need to switch to the kubeadmin user, add the SCC that allows running as any UID, then switch back to our developer user and delete the existing ReplicaSet

oc login -u kubeadmin -p $PASSWORD https://api.crc.testing:6443
oc adm policy add-scc-to-user anyuid -z default -n user-$USER
oc -n user-$USER get rs | grep ^garden-util | awk '{print $1}' | xargs oc -n user-$USER delete rs

If you run oc get rs -n user-$USER, you'll see garden-util is running. Run garden build. If you encounter Deployment/garden-util: 0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod, you may need to increase the CPU and memory of your OpenShift Local cluster

crc config set memory 10240
crc config set cpus 4

You must restart your cluster with crc stop then crc start before the increased limits take effect.

OpenShift's internal registry uses a self-signed certificate. Even if flags are passed to kaniko to ignore TLS verification

    kaniko:
      extraFlags: [ "--insecure", "--skip-tls-verify" ]

Garden continues to error

Failed resolving status for Build type=container name=api. Here is the output:
────────────────────────────────────────────────────────────────────────────────
Unable to query registry for image status: time="2023-05-15T10:40:44Z" level=fatal msg="Error parsing image name \"docker://default-route-openshift-image-registry.apps-crc.testing/user-taohansen/api:v-104fc77578\": pinging container registry default-route-openshift-image-registry.apps-crc.testing: Get \"https://default-route-openshift-image-registry.apps-crc.testing/v2/\": x509: certificate signed by unknown authority"

This is because docker manifest inspect is called here, not kaniko. docker manifest inspect offers an --insecure flag, but Garden only passes it when [isLocalHostname](https://github.com/garden-io/garden/blob/0d98881c69d367ffdf6eab197be97b696dc8ada6/core/src/plugins/kubernetes/container/build/common.ts#L404) matches, i.e. the registry hostname equals localhost or starts with 127; this is otherwise unconfigurable.
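
You can reproduce the check by hand: docker manifest inspect does accept --insecure, Garden just never passes it for non-local hostnames. Assuming your Docker CLI has the manifest command enabled, and reusing the image reference from the error above:

# Succeeds against the self-signed internal registry when --insecure is passed
docker manifest inspect --insecure default-route-openshift-image-registry.apps-crc.testing/user-taohansen/api:v-104fc77578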

Very well, let's try a remote registry like Docker Hub. Start by creating an image pull secret:

oc create secret docker-registry regcred --docker-server=https://index.docker.io/v1/ --docker-username=$DOCKER_USER --docker-password=$DOCKER_PASSWORD

then fill in your project.garden.yml with the relevant entries

...
    deploymentRegistry:
      # hostname: default-route-openshift-image-registry.apps-crc.testing
      # namespace: user-${local.username}
      hostname: docker.io
      namespace: worldofgeese

    imagePullSecrets:
    - name: regcred
      namespace: user-${local.username}

If you now run garden build, you'll run into a new error, a side effect of running containers rootlessly in OpenShift

error building image: error building stage: failed to get filesystem from image: failed to write "security.capability" attribute to "/usr/bin/newgidmap": operation not permitted

To bypass this, I found an issue that advised setting some kaniko flags

    kaniko:
      extraFlags:
        - "--ignore-path=/usr/bin/newuidmap"
        - "--ignore-path=/usr/bin/newgidmap"
        - "--ignore-path=/usr/sbin/suexec"

Try a new garden build and you'll be blocked, though the error won't stream to your CLI. You'll need to tail the ephemeral kaniko pod in your user project

WARN[0061] error uploading layer to cache: failed to push to destination index.docker.io/worldofgeese/api/cache:9fc18a0127045b73b51a3ff8cf0e2e6d3763a2f50f6e6573c582e492f48e0ff6: GET https://auth.docker.io/token?scope=repository%3Aworldofgeese%2Fapi%2Fcache%3Apush%2Cpull&service=registry.docker.io: unexpected status code 400 Bad Request: {"details":"invalid repository name"}

To work around this, I needed to set an additional kaniko flag, --cache=false. Your kaniko configuration should now look like

    kaniko:
      extraFlags:
        - "--ignore-path=/usr/bin/newuidmap"
        - "--ignore-path=/usr/bin/newgidmap"
        - "--ignore-path=/usr/sbin/suexec"
        - "--cache=false"

On a new garden build it should just work. Whew! 😪

For the final, workable, project configuration and Python application, see the accompanying gist.

How important is this feature for you/your team?

🌵 Not having this feature makes using Garden painful
