
Add trusted CA bundle support for OpenShift #1079

Merged: 4 commits merged into master on Jun 15, 2020
Conversation

objectiser (Contributor)

No description provided.

codecov bot commented Jun 2, 2020

Codecov Report

Merging #1079 into master will increase coverage by 0.19%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1079      +/-   ##
==========================================
+ Coverage   87.81%   88.01%   +0.19%     
==========================================
  Files          85       86       +1     
  Lines        5172     5256      +84     
==========================================
+ Hits         4542     4626      +84     
  Misses        466      466              
  Partials      164      164              
Impacted Files Coverage Δ
pkg/config/ca/ca.go 100.00% <100.00%> (ø)
pkg/cronjob/es_index_cleaner.go 100.00% <100.00%> (ø)
pkg/cronjob/es_rollover.go 95.90% <100.00%> (+0.03%) ⬆️
pkg/cronjob/spark_dependencies.go 92.30% <100.00%> (+0.07%) ⬆️
pkg/deployment/agent.go 96.21% <100.00%> (+0.02%) ⬆️
pkg/deployment/all_in_one.go 100.00% <100.00%> (ø)
pkg/deployment/collector.go 96.79% <100.00%> (+0.02%) ⬆️
pkg/deployment/ingester.go 96.26% <100.00%> (+0.02%) ⬆️
pkg/deployment/query.go 100.00% <100.00%> (ø)
pkg/inject/oauth_proxy.go 95.94% <100.00%> (+0.35%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c920e75...6743e63. Read the comment docs.

@objectiser added the Do Not Merge label (The PR should not be merged yet.) on Jun 2, 2020
@objectiser changed the title from Trustedcab to Add trusted CA bundle support for OpenShift on Jun 2, 2020
@objectiser (Contributor, Author)

@kevinearls Would it be possible to kick off a test of this PR on OCP? 4.4 and 4.2 would be good. In addition to any automated tests, could you check that the Jaeger UI is available via the route?

@kevinearls (Contributor)

@objectiser When I ran the CI tests on OCP 4.4, I got two failures.

The operator log contained a number of these messages: https://primetime.bluejeans.com/a2m/live-event/pahjkwua

The tests create a Jaeger instance for each test case, so maybe we need to delete the configmap when the Jaeger instance is deleted.
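
One common way to get that cleanup automatically is to set an owner reference from the configmap to the Jaeger CR, so the Kubernetes garbage collector deletes it together with the instance. A minimal sketch, assuming standard apimachinery types (not necessarily how this PR implements it):

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markOwnedByJaeger is a hypothetical helper: it ties the configmap's
// lifecycle to the Jaeger CR via an owner reference, so deleting the
// Jaeger instance also garbage-collects the configmap.
func markOwnedByJaeger(cm *corev1.ConfigMap, jaeger *v1.Jaeger) {
	controller := true
	cm.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: jaeger.APIVersion,
		Kind:       jaeger.Kind,
		Name:       jaeger.Name,
		UID:        jaeger.UID,
		Controller: &controller,
	}}
}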

@objectiser (Contributor, Author)

@kevinearls I've updated the PR, as it's possible a user has already configured the volumeMount with a trusted CA bundle (e.g. Service Mesh will be doing that).

I tried to manually deploy to reproduce the problem you were experiencing, and the configmap seems to be correctly deleted when the Jaeger instance CR is removed. Would you be able to retest - and if necessary update the tests to wait for the configmap to be undeployed (if that is the issue)?
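
If the issue is the configmap lingering briefly after the CR is deleted, the e2e check could look roughly like this. A sketch against the operator-sdk test framework the suite already uses; the "-trusted-ca" configmap name is an assumption, not the PR's actual naming:

import (
	"context"
	"time"

	framework "github.com/operator-framework/operator-sdk/pkg/test"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForTrustedCAConfigMapRemoval polls until the trusted CA configmap
// created for the given Jaeger instance is gone.
func waitForTrustedCAConfigMapRemoval(f *framework.Framework, namespace, jaegerName string) error {
	return wait.Poll(5*time.Second, 2*time.Minute, func() (bool, error) {
		cm := &corev1.ConfigMap{}
		err := f.Client.Get(context.Background(),
			types.NamespacedName{Namespace: namespace, Name: jaegerName + "-trusted-ca"}, cm)
		if apierrors.IsNotFound(err) {
			return true, nil // configmap was cleaned up
		}
		return false, err
	})
}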

@kevinearls (Contributor)

@objectiser The ES index tests are still failing for me, although I found out that most of the errors in the log occurred immediately after the Jaeger instance was created: https://github.com/jaegertracing/jaeger-operator/blob/master/test/e2e/elasticsearch_test.go#L140-L146

That doesn't make much difference initially, as the smoke test step passes. However, the cron job that is supposed to get created when we enable the index cleaner never appears. https://github.com/jaegertracing/jaeger-operator/blob/master/test/e2e/elasticsearch_test.go#L256-L265. I don't see anything in the operator log about this. I will try to get more info tomorrow if possible.

@objectiser (Contributor, Author)

@kevinearls Would it also be worth trying the same tests against jaegertracing/jaeger-operator:master, to see whether the same tests fail there, or whether the failures really are due to this PR?

@kevinearls (Contributor)

@objectiser It's definitely due to the PR; the tests pass when I run against master.

@jpkrohling (Contributor)

Is this related to #1043?

@objectiser (Contributor, Author)

@jpkrohling No, this relates to TRACING-1208. The first attempt at an implementation just mounted the volume on the oauth-proxy container, which is where the original problem lies, but when that failed in OCP I thought it might need the volume mounted in all components (which is also the solution Service Mesh would be implementing, by adding the volume/mounts in their Jaeger templates).

So we need to find a solution to TRACING-1208 and decide whether the volume/mounts are required across all components or just for oauth-proxy. Ideally we need the problem reproduced in an OCP cluster so we can properly test with a custom CA bundle.
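
For context, the OpenShift mechanism involved works roughly as follows: a configmap carrying the config.openshift.io/inject-trusted-cabundle label gets its ca-bundle.crt key populated by the platform, and each container mounts that key at the standard system trust path. An illustrative sketch (names are examples, not the PR's exact output):

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-jaeger-trusted-ca        # example name
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"
data:
  ca-bundle.crt: ""                 # filled in by the platform

# ...and in the pod spec:
volumes:
- name: my-jaeger-trusted-ca
  configMap:
    name: my-jaeger-trusted-ca
    items:
    - key: ca-bundle.crt
      path: tls-ca-bundle.pem
containers:
- volumeMounts:
  - name: my-jaeger-trusted-ca
    mountPath: /etc/pki/ca-trust/extracted/pem
    readOnly: true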

@jpkrohling (Contributor)

Running this PR locally, I noticed that the volume/volumemount is added to the query container, not to the OAuth container:

spec:
  containers:
  - args:
    - --collector.grpc.tls.cert=/etc/tls-config/tls.crt
    - --collector.grpc.tls.enabled=true
    - --collector.grpc.tls.key=/etc/tls-config/tls.key
    - --query.ui-config=/etc/config/ui.json
    - --reporter.grpc.tls.ca=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    - --reporter.grpc.tls.enabled=true
    - --reporter.grpc.tls.server-name=simplest-collector-headless.default.svc.cluster.local
    - --reporter.type=grpc
    - --sampling.strategies-file=/etc/jaeger/sampling/sampling.json
    env:
    - name: SPAN_STORAGE_TYPE
      value: memory
    - name: COLLECTOR_ZIPKIN_HTTP_PORT
      value: "9411"
    image: jaegertracing/all-in-one:1.18.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 5
      httpGet:
        path: /
        port: 14269
        scheme: HTTP
      initialDelaySeconds: 5
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 1
    name: jaeger
    ports:
    - containerPort: 5775
      name: zk-compact-trft
      protocol: UDP
    - containerPort: 5778
      name: config-rest
      protocol: TCP
    - containerPort: 6831
      name: jg-compact-trft
      protocol: UDP
    - containerPort: 6832
      name: jg-binary-trft
      protocol: UDP
    - containerPort: 9411
      name: zipkin
      protocol: TCP
    - containerPort: 14267
      name: c-tchan-trft
      protocol: TCP
    - containerPort: 14268
      name: c-binary-trft
      protocol: TCP
    - containerPort: 16686
      name: query
      protocol: TCP
    - containerPort: 14269
      name: admin-http
      protocol: TCP
    - containerPort: 14250
      name: grpc
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 14269
        scheme: HTTP
      initialDelaySeconds: 1
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/config
      name: simplest-ui-configuration-volume
      readOnly: true
    - mountPath: /etc/jaeger/sampling
      name: simplest-sampling-configuration-volume
      readOnly: true
    - mountPath: /etc/tls-config
      name: simplest-collector-tls-config-volume
      readOnly: true
    - mountPath: /etc/pki/ca-trust/extracted/pem
      name: simplest-trusted-ca
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: simplest-ui-proxy-token-bkm5j
      readOnly: true
  - args:
    - --cookie-secret=7+6bwAmtQOK6z5otl0X3RQ==
    - --https-address=:8443
    - --openshift-service-account=simplest-ui-proxy
    - --provider=openshift
    - --tls-cert=/etc/tls/private/tls.crt
    - --tls-key=/etc/tls/private/tls.key
    - --upstream=http://localhost:16686
    image: openshift/oauth-proxy:latest
    imagePullPolicy: Always
    name: oauth-proxy
    ports:
    - containerPort: 8443
      name: public
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/tls/private
      name: simplest-ui-oauth-proxy-tls
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: simplest-ui-proxy-token-bkm5j
      readOnly: true

Your description seems to indicate that it should be added to the OAuth Proxy (which is also what makes sense to me). Once @jkandasa and/or @kevinearls are able to reproduce this in a test cluster, I'll update this PR to get the volume/volumemount added to the OAuth sidecar.
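
Presumably the fix is then just to add the same mount to the oauth-proxy container's volumeMounts, mirroring the entry shown above on the jaeger container:

    - mountPath: /etc/pki/ca-trust/extracted/pem
      name: simplest-trusted-ca
      readOnly: true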

objectiser and others added 4 commits on June 9, 2020 16:06:

- Signed-off-by: Gary Brown <gary@brownuk.com>
- …te a Jaeger instance specific trust CA configmap or volume/mount
  Signed-off-by: Gary Brown <gary@brownuk.com>
- Signed-off-by: Gary Brown <gary@brownuk.com>
- Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@objectiser (Contributor, Author)

@jpkrohling Must have made that mistake when changing from oauth-proxy only to all components. However, it might be worth testing this PR yourself on OCP, to see if you get the es-index-cleaner issue found by @kevinearls?

@jpkrohling (Contributor) commented Jun 10, 2020

The latest state of this PR passes the e2e tests on OpenShift 4.4:

 $ KUBERNETES_CONFIG=${KUBECONFIG} NAMESPACE=quay.io/jpkroehling make test
Running unit tests...
?   	github.com/jaegertracing/jaeger-operator/cmd	[no test files]
?   	github.com/jaegertracing/jaeger-operator/cmd/manager	[no test files]
ok  	github.com/jaegertracing/jaeger-operator/pkg/account	0.078s	coverage: 100.0% of statements
?   	github.com/jaegertracing/jaeger-operator/pkg/apis	[no test files]
ok  	github.com/jaegertracing/jaeger-operator/pkg/apis/jaegertracing/v1	0.071s	coverage: 14.0% of statements
?   	github.com/jaegertracing/jaeger-operator/pkg/apis/kafka	[no test files]
ok  	github.com/jaegertracing/jaeger-operator/pkg/autodetect	5.164s	coverage: 85.9% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/clusterrolebinding	0.062s	coverage: 100.0% of statements
?   	github.com/jaegertracing/jaeger-operator/pkg/cmd/generate	[no test files]
?   	github.com/jaegertracing/jaeger-operator/pkg/cmd/start	[no test files]
?   	github.com/jaegertracing/jaeger-operator/pkg/cmd/version	[no test files]
ok  	github.com/jaegertracing/jaeger-operator/pkg/config/ca	0.046s	coverage: 100.0% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/config/otelconfig	0.152s	coverage: 85.7% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/config/sampling	0.091s	coverage: 96.7% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/config/tls	0.084s	coverage: 90.0% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/config/ui	0.113s	coverage: 94.4% of statements
?   	github.com/jaegertracing/jaeger-operator/pkg/controller	[no test files]
?   	github.com/jaegertracing/jaeger-operator/pkg/controller/deployment	[no test files]
ok  	github.com/jaegertracing/jaeger-operator/pkg/controller/jaeger	2.383s	coverage: 68.8% of statements
?   	github.com/jaegertracing/jaeger-operator/pkg/controller/namespace	[no test files]
ok  	github.com/jaegertracing/jaeger-operator/pkg/cronjob	0.119s	coverage: 94.1% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/deployment	0.086s	coverage: 98.3% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/ingress	0.138s	coverage: 84.9% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/inject	0.131s	coverage: 96.2% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/inventory	0.092s	coverage: 100.0% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/kafka	0.101s	coverage: 100.0% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/route	0.065s	coverage: 86.7% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/service	0.040s	coverage: 100.0% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/storage	6.719s	coverage: 93.9% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/strategy	0.061s	coverage: 97.1% of statements
?   	github.com/jaegertracing/jaeger-operator/pkg/tracing	[no test files]
ok  	github.com/jaegertracing/jaeger-operator/pkg/upgrade	0.036s	coverage: 90.6% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/util	0.023s	coverage: 94.0% of statements
ok  	github.com/jaegertracing/jaeger-operator/pkg/version	0.005s	coverage: 57.1% of statements
Formatting code...
Building...
STEP 1: FROM registry.access.redhat.com/ubi8/ubi
STEP 2: ENV OPERATOR=/usr/local/bin/jaeger-operator     USER_UID=1001     USER_NAME=jaeger-operator
--> Using cache 935e0bf5a64a9bde337a055524dfbe0ade13e52cbd4810b2513c0ac8e7d7ee95
STEP 3: RUN INSTALL_PKGS="       openssl       " &&     yum install -y $INSTALL_PKGS &&     rpm -V $INSTALL_PKGS &&     yum clean all &&     mkdir /tmp/_working_dir &&     chmod og+w /tmp/_working_dir
--> Using cache 14e912f6e6bb423c86ff5e17644c6c5cd315bc4bb736d253aa8b10101095310a
STEP 4: COPY scripts/* /scripts/
--> Using cache 23e146deae55de9926c332aee4d2646a56b0ac3b83c29c2eaad6f685927e39d8
STEP 5: COPY build/_output/bin/jaeger-operator ${OPERATOR}
--> fecc9749066
STEP 6: ENTRYPOINT ["/usr/local/bin/jaeger-operator"]
--> b3d4b7b145a
STEP 7: USER ${USER_UID}
STEP 8: COMMIT quay.io/jpkroehling/jaeger-operator:latest
--> 28c40250fd2
28c40250fd2c57ca0410beac580d2e4e1c0cecb8f4a0b3cbf9c34e130763ea21
Pushing image quay.io/jpkroehling/jaeger-operator:latest...
Getting image source signatures
Copying blob 19caec8214c9 done  
Copying blob d777138324bc skipped: already exists  
Copying blob 808aa963fe95 skipped: already exists  
Copying blob 92af68d64d0b skipped: already exists  
Copying blob 92b864bfcfaa skipped: already exists  
Copying config 28c40250fd done  
Writing manifest to image destination
Storing signatures
Running Smoke end-to-end tests...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	541.388s
Creating namespace default
service/cassandra created
statefulset.apps/cassandra created
Running Cassandra end-to-end tests...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	199.835s
statefulset.apps/elasticsearch created
service/elasticsearch created
Running Elasticsearch end-to-end tests...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	386.605s
# Elasticsearch requires labeled nodes. These labels are by default present in OCP 4.2
node/ip-10-0-128-181.ec2.internal not labeled
node/ip-10-0-131-182.ec2.internal not labeled
node/ip-10-0-146-97.ec2.internal not labeled
node/ip-10-0-147-38.ec2.internal not labeled
node/ip-10-0-162-203.ec2.internal not labeled
node/ip-10-0-172-218.ec2.internal not labeled
# This is not required in OCP 4.1. The node tuning operator configures the property automatically
# when label tuned.openshift.io/elasticsearch=true label is present on the ES pod. The label
# is configured by ES operator.
namespace/openshift-logging created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com configured
serviceaccount/elasticsearch-operator created
clusterrole.rbac.authorization.k8s.io/elasticsearch-operator created
clusterrolebinding.rbac.authorization.k8s.io/elasticsearch-operator-rolebinding created
customresourcedefinition.apiextensions.k8s.io/elasticsearches.logging.openshift.io created
deployment.apps/elasticsearch-operator created
deployment.apps/elasticsearch-operator image updated
Running Self provisioned Elasticsearch end-to-end tests...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	373.048s
Creating namespace kafka
namespace/kafka created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-namespaced created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-entity-operator-delegation created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-topic-operator-delegation created
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   647  100   647    0     0   1792      0 --:--:-- --:--:-- --:--:--  1792
100  302k  100  302k    0     0   247k      0  0:00:01  0:00:01 --:--:-- 13.4M

customresourcedefinition.apiextensions.k8s.io/kafkas.kafka.strimzi.io created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-entity-operator-delegation created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-topic-operator-delegation created
customresourcedefinition.apiextensions.k8s.io/kafkausers.kafka.strimzi.io created
clusterrole.rbac.authorization.k8s.io/strimzi-entity-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-global created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-kafka-broker-delegation created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-namespaced created
clusterrole.rbac.authorization.k8s.io/strimzi-topic-operator created
serviceaccount/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-kafka-broker created
customresourcedefinition.apiextensions.k8s.io/kafkatopics.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkabridges.kafka.strimzi.io created
deployment.apps/strimzi-cluster-operator created
customresourcedefinition.apiextensions.k8s.io/kafkaconnectors.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkaconnects2is.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkaconnects.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkamirrormakers.kafka.strimzi.io created
deployment.apps/strimzi-cluster-operator env updated
Creating namespace kafka
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   669  100   669    0     0   2021      0 --:--:-- --:--:-- --:--:--  2021

kafka.kafka.strimzi.io/my-cluster created (dry run)
kafka.kafka.strimzi.io/my-cluster created
Running Streaming end-to-end tests...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	429.750s
Running Example end-to-end tests part 1...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	181.107s
Running Example end-to-end tests part 2...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	204.307s
Running OpenShift Example end-to-end tests...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	121.173s
Running generate end-to-end tests...
ok  	github.com/jaegertracing/jaeger-operator/test/e2e	39.065s

(edit: output changed to show the results for tests on 4.4)


func deployTrustedCA(jaeger *v1.Jaeger) bool {
	for _, vm := range jaeger.Spec.JaegerCommonSpec.VolumeMounts {
		if strings.HasPrefix(vm.MountPath, "/etc/pki/ca-trust/extracted/pem") {
			return false // a trusted CA bundle is already mounted
		}
	}
	return true
}
objectiser (Contributor, Author):

@jpkrohling Is this a reasonable approach to disable adding the custom CA bundle/volumes/mounts if (for example) Service Mesh explicitly adds them into the Jaeger CR? Or do we need a better approach?

jpkrohling (Contributor) commented Jun 10, 2020:

I'll think about it some more, but I think it's good enough. In any case, there's no way to be 100% sure that there won't be a clash, as the path inside the volumes themselves might end up conflicting (/etc/pki with ca-trust inside vs. /etc/pki/ca-trust).

Edit: actually, we could compare the paths, and see if the CA bundle volume mount would clash with an existing volume mount (/etc/pki in the example above).
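
A rough sketch of that comparison, treating two mount paths as clashing when either is a path prefix of the other (helper and constant names are illustrative, not the PR's final logic):

import (
	"path"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// caBundleMountPath is where the operator would mount the trusted CA bundle.
const caBundleMountPath = "/etc/pki/ca-trust/extracted/pem"

// hasConflictingMount reports whether an existing volume mount would clash
// with the CA bundle mount in either direction, e.g. /etc/pki vs.
// /etc/pki/ca-trust/extracted/pem.
func hasConflictingMount(mounts []corev1.VolumeMount) bool {
	for _, vm := range mounts {
		existing := path.Clean(vm.MountPath)
		if strings.HasPrefix(caBundleMountPath+"/", existing+"/") ||
			strings.HasPrefix(existing+"/", caBundleMountPath+"/") {
			return true
		}
	}
	return false
}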

@objectiser removed the Do Not Merge label (The PR should not be merged yet.) on Jun 12, 2020
@objectiser (Contributor, Author)

@jpkrohling If you are happy with the PR as is, can you approve and merge? If any other issues are found in subsequent QE testing they can be fixed in a followup PR.

@jkandasa (Member)

@objectiser @jpkrohling I verified this PR via quay.io/jpkroehling/jaeger-operator:TRACING-1208.

LGTM 👍

viper.Set("platform", "other")
defer viper.Reset()

jaeger := v1.NewJaeger(types.NamespacedName{Name: "TestGetWithoutTrustedCA"})
Contributor:

I think we started using a generic name some time ago. Was it changed recently, to use individual names for the instances in each test? If not, there's no need to change this now.

}},
},
Data: map[string]string{
"ca-bundle.crt": "",
Contributor:

Tests are showing that this is fine, but I wonder if there's a reason to have this empty at first? Also, would the reconciliation phase revert it to empty? If so, OpenShift seems to be refilling it quite fast (possibly with a webhook, before changes are actually applied).

Contributor:

Might be an idea to check with kubectl get -w configmap -o yaml - could be a potential race?
