Fixed deployment of Elasticsearch via its operator #234

Merged
merged 2 commits into from
Feb 27, 2019

Conversation

jpkrohling
Contributor

Fixes #233 by adding the ES type to the controller's reconcile loop and inventory.
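As a rough illustration of the inventory idea (not the code from this PR): the reconcile loop compares the objects that already exist with the ones that are desired and sorts them into create, update and delete buckets. The sketch below uses a hypothetical, simplified `Resource` type instead of the real Elasticsearch custom resource, and assumes that names are unique.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical, simplified stand-in for a Kubernetes object
// such as the Elasticsearch custom resource.
type Resource struct {
	Name string
	Spec string
}

// Inventory groups resources by the action the reconcile loop should take.
type Inventory struct {
	Create []Resource
	Update []Resource
	Delete []Resource
}

// Diff compares what exists with what is desired. Names are assumed unique.
func Diff(existing, desired []Resource) Inventory {
	byName := make(map[string]Resource, len(existing))
	for _, r := range existing {
		byName[r.Name] = r
	}

	var inv Inventory
	for _, d := range desired {
		if _, found := byName[d.Name]; found {
			// Already there: carry the desired state over as an update.
			inv.Update = append(inv.Update, d)
			delete(byName, d.Name)
		} else {
			inv.Create = append(inv.Create, d)
		}
	}

	// Whatever is left exists but is no longer desired.
	for _, r := range byName {
		inv.Delete = append(inv.Delete, r)
	}
	sort.Slice(inv.Delete, func(i, j int) bool {
		return inv.Delete[i].Name < inv.Delete[j].Name
	})
	return inv
}

func main() {
	existing := []Resource{{Name: "elasticsearch", Spec: "nodes=1"}, {Name: "old-es", Spec: "nodes=1"}}
	desired := []Resource{{Name: "elasticsearch", Spec: "nodes=3"}, {Name: "new-es", Spec: "nodes=1"}}

	inv := Diff(existing, desired)
	fmt.Println("create:", inv.Create)
	fmt.Println("update:", inv.Update)
	fmt.Println("delete:", inv.Delete)
}
```

A reconcile loop would then walk the three slices and issue the matching API calls, so an object that already exists is updated rather than created a second time.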

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

@jpkrohling
Contributor Author

Running locally with this PR causes this when deploying deploy/examples/simple-prod-deploy-es.yaml:

ERROR: logging before flag.Parse: E0226 18:01:00.096980 28875 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1alpha1.Elasticsearch: v1.ListOptions is not suitable for converting to "logging.openshift.io/v1alpha1" in scheme "k8s.io/client-go/kubernetes/scheme/register.go:60"

@codecov

codecov bot commented Feb 26, 2019

Codecov Report

Merging #234 into master will decrease coverage by 0.29%.
The diff coverage is 80.48%.


@@            Coverage Diff            @@
##           master     #234     +/-   ##
=========================================
- Coverage   90.48%   90.19%   -0.3%     
=========================================
  Files          59       61      +2     
  Lines        2680     2753     +73     
=========================================
+ Hits         2425     2483     +58     
- Misses        164      172      +8     
- Partials       91       98      +7
Impacted Files                                Coverage Δ
pkg/controller/jaeger/jaeger_controller.go    31.88% <0%> (-0.96%) ⬇️
pkg/strategy/strategy.go                      78.26% <100%> (+1.69%) ⬆️
pkg/controller/jaeger/elasticsearch.go        69.23% <69.23%> (ø)
pkg/strategy/production.go                    75.4% <83.33%> (-0.87%) ⬇️
pkg/inventory/elasticsearch.go                86.66% <86.66%> (ø)
pkg/storage/elasticsearch.go                  80% <92.3%> (+1.56%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed9f1b2...a37fce6.

Member

@pavolloffay pavolloffay left a comment

I have tested this now.
The first time, I got:

$ make run WATCH_NAMESPACE=myproject
customresourcedefinition.apiextensions.k8s.io/jaegers.io.jaegertracing created
INFO[0000] Versions                                      arch=amd64 operator-sdk=v0.4.1 os=linux version=go1.11.1
INFO[0000] Auto-detected the platform                    platform=openshift
INFO[0000] Starting the Cmd.                            
ERRO[0113] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0114] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0115] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0116] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0117] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0118] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0120] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0121] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0122] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0123] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0126] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject
ERRO[0131] failed to apply the changes                   error="no matches for kind \"Elasticsearch\" in version \"logging.openshift.io/v1alpha1\"" instance=simple-prod namespace=myproject

The second time:

INFO[0000] Versions                                      arch=amd64 operator-sdk=v0.4.1 os=linux version=go1.11.1
INFO[0000] Auto-detected the platform                    platform=openshift
INFO[0000] Starting the Cmd.                            
ERRO[0003] failed to apply the changes                   error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0005] failed to apply the changes                   error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0006] failed to apply the changes                   error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0007] failed to apply the changes                   error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
ERRO[0008] failed to apply the changes                   error="CronJob.batch \"simple-prod-es-index-cleaner\" is invalid: spec.jobTemplate.spec.template.spec.containers[0].volumeMounts[0].name: Not found: \"certs\"" instance=simple-prod namespace=myproject
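The second error is Kubernetes validation rejecting a container whose `volumeMounts` entry names a volume that the pod spec does not declare. A minimal sketch of that check, using hypothetical simplified types rather than the real `k8s.io/api` structs:

```go
package main

import "fmt"

// Container and PodSpec are hypothetical, trimmed-down versions of the
// Kubernetes types, holding only the fields needed for this check.
type Container struct {
	Name         string
	VolumeMounts []string // names of the volumes this container mounts
}

type PodSpec struct {
	Volumes    []string // names of the volumes declared on the pod
	Containers []Container
}

// MissingVolumes reports every volume mount that refers to an undeclared volume.
func MissingVolumes(spec PodSpec) []string {
	declared := make(map[string]bool, len(spec.Volumes))
	for _, v := range spec.Volumes {
		declared[v] = true
	}

	var problems []string
	for ci, c := range spec.Containers {
		for mi, m := range c.VolumeMounts {
			if !declared[m] {
				problems = append(problems,
					fmt.Sprintf("containers[%d].volumeMounts[%d].name: Not found: %q", ci, mi, m))
			}
		}
	}
	return problems
}

func main() {
	// The index cleaner mounts "certs", but no such volume is declared.
	spec := PodSpec{
		Containers: []Container{{Name: "es-index-cleaner", VolumeMounts: []string{"certs"}}},
	}
	for _, p := range MissingVolumes(spec) {
		fmt.Println(p)
	}
}
```

Running a check like this in a unit test over the generated CronJob would catch the mismatch before the object is sent to the API server.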

@pavolloffay
Member

To test it, make sure that deploy/examples/simple-prod-deploy-es.yaml works: it should deploy one ES instance, and the Jaeger components should be up and ready. You can also change the index cleaner schedule to check that it works.

@jpkrohling
Contributor Author

jpkrohling commented Feb 27, 2019

I got it working now:

$ kubectl get pods
NAME                                                 READY     STATUS      RESTARTS   AGE
elasticsearch-clientdatamaster-0-1-6c7975dd5-n7g49   1/1       Running     0          1h
simple-prod-collector-76c76f74fb-rqvrs               1/1       Running     0          3m
simple-prod-es-index-cleaner-1551271500-7ftk6        0/1       Completed   0          10s
simple-prod-query-7df889c65b-42s47                   2/2       Running     0          3m


$ kubectl logs simple-prod-collector-76c76f74fb-rqvrs
{"level":"info","ts":1551271304.7249413,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1551271307.447118,"caller":"collector/main.go:139","msg":"Starting jaeger-collector TChannel server","port":14267}
{"level":"info","ts":1551271307.4489896,"caller":"grpcserver/grpc_server.go:64","msg":"Starting jaeger-collector gRPC server","grpc-port":"14250"}
{"level":"info","ts":1551271307.4490638,"caller":"collector/main.go:153","msg":"Registering metrics handler with HTTP server","route":"/metrics"}
{"level":"info","ts":1551271307.449096,"caller":"collector/main.go:162","msg":"Starting jaeger-collector HTTP server","http-port":14268}
{"level":"info","ts":1551271307.4491127,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1551271307.475963,"caller":"collector/main.go:237","msg":"Listening for Zipkin HTTP traffic","zipkin.http-port":9411}

The collector and query were in a failed state for quite some time, because they don't reconnect to Elasticsearch upon failure, so, I had to kill the pods manually. Kubernetes then created a new pod for the deployments, which then made them work.

This feature should be marked as experimental, as it's not really well polished, especially because the collector/query should wait for ES to be ready before they start. If Jaeger could reconnect to ES upon failure like we do with Cassandra, then it wouldn't be a big issue, but right now, we can't do anything else from the Operator's perspective...

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@pavolloffay
Member

The collector and query were in a failed state for quite some time, because they don't reconnect to Elasticsearch upon failure, so, I had to kill the pods manually.

K8s should reschedule pods once they fail. At least, I saw this behavior before the update PR was merged. I didn't have to kill pods manually: they restarted 2-3 times until ES was in the ready state.

Issue #216 is about handling the initialization properly.

@jpkrohling
Contributor Author

At least, I saw this behavior before the update PR was merged

That PR shouldn't change this behavior. The pod was in a "healthy" state, which is why Kubernetes didn't kill it. This is what the logs look like:

$ oc get pods
NAME                                                 READY     STATUS    RESTARTS   AGE
...
simple-prod-collector-76c76f74fb-wv8n8               1/1       Running   0          1h
...


$ oc logs simple-prod-collector-76c76f74fb-wv8n8
{"level":"info","ts":1551266878.3367102,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1551266882.0825386,"caller":"collector/main.go:139","msg":"Starting jaeger-collector TChannel server","port":14267}
{"level":"info","ts":1551266882.0826762,"caller":"grpcserver/grpc_server.go:64","msg":"Starting jaeger-collector gRPC server","grpc-port":"14250"}
{"level":"info","ts":1551266882.082747,"caller":"collector/main.go:153","msg":"Registering metrics handler with HTTP server","route":"/metrics"}
{"level":"info","ts":1551266882.0827823,"caller":"collector/main.go:162","msg":"Starting jaeger-collector HTTP server","http-port":14268}
{"level":"info","ts":1551266882.0827997,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1551266882.1220624,"caller":"collector/main.go:237","msg":"Listening for Zipkin HTTP traffic","zipkin.http-port":9411}
{"level":"error","ts":1551271199.0897613,"caller":"spanstore/writer.go:198","msg":"Failed to create index","trace_id":"265b719bdfc33c79","span_id":"5d238c8980fd5616","error":"no available connection: no Elasticsearch node available","errorVerbose":"no Elasticsearch node available\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.init\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:88\ngithub.com/jaegertracing/jaeger/pkg/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/cmd/env.init\n\t<autogenerated>:1\nmain.init\n\t<autogenerated>:1\nruntime.main\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/proc.go:189\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333\nno available connection\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).next\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1157\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).PerformRequestWithOptions\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1254\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).PerformRequest\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1190\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*IndicesCreateService).Do\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/indices_create.go:112\ngithub.com/jaegertracing/jaeger/pkg/es/wrapper.IndicesCreateServiceWrapper.Do\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/es/wrapper/wrapper.go:105\ngithub.com/j
aegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).createIndex\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:159\ngithub.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).WriteSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:135\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:101\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan-fm\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:88\ngithub.com/jaegertracing/jaeger/cmd/collector/app.ChainedProcessSpan.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/model_consumer.go:34\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).processItemFromQueue\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:127\ngithub.com/jaegertracing/jaeger/cmd/collector/app.NewSpanProcessor.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:56\ngithub.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumers.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go:65\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333","stacktrace":"github.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).logError\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:198\ngithub.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).createIndex\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:168\ngithub.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).WriteSpan\n\t/home/tra
vis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:135\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:101\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan-fm\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:88\ngithub.com/jaegertracing/jaeger/cmd/collector/app.ChainedProcessSpan.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/model_consumer.go:34\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).processItemFromQueue\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:127\ngithub.com/jaegertracing/jaeger/cmd/collector/app.NewSpanProcessor.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:56\ngithub.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumers.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go:65"}
{"level":"error","ts":1551271199.0905058,"caller":"app/span_processor.go:102","msg":"Failed to save span","error":"Failed to create index: no available connection: no Elasticsearch node available","errorVerbose":"no Elasticsearch node available\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.init\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:88\ngithub.com/jaegertracing/jaeger/pkg/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/cmd/env.init\n\t<autogenerated>:1\nmain.init\n\t<autogenerated>:1\nruntime.main\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/proc.go:189\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333\nno available connection\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).next\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1157\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).PerformRequestWithOptions\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1254\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).PerformRequest\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1190\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*IndicesCreateService).Do\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/indices_create.go:112\ngithub.com/jaegertracing/jaeger/pkg/es/wrapper.IndicesCreateServiceWrapper.Do\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/es/wrapper/wrapper.go:105\ngithub.com/jaegertracing/jaeger/plugin/storage/e
s/spanstore.(*SpanWriter).createIndex\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:159\ngithub.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).WriteSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:135\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:101\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan-fm\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:88\ngithub.com/jaegertracing/jaeger/cmd/collector/app.ChainedProcessSpan.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/model_consumer.go:34\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).processItemFromQueue\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:127\ngithub.com/jaegertracing/jaeger/cmd/collector/app.NewSpanProcessor.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:56\ngithub.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumers.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go:65\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333\nFailed to create 
index\ngithub.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).logError\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:199\ngithub.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).createIndex\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:168\ngithub.com/jaegertracing/jaeger/plugin/storage/es/spanstore.(*SpanWriter).WriteSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/spanstore/writer.go:135\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:101\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan-fm\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:88\ngithub.com/jaegertracing/jaeger/cmd/collector/app.ChainedProcessSpan.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/model_consumer.go:34\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).processItemFromQueue\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:127\ngithub.com/jaegertracing/jaeger/cmd/collector/app.NewSpanProcessor.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:56\ngithub.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumers.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go:65\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333","stacktrace":"github.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:102\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).saveSpan-fm\n\t/home/t
ravis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:88\ngithub.com/jaegertracing/jaeger/cmd/collector/app.ChainedProcessSpan.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/model_consumer.go:34\ngithub.com/jaegertracing/jaeger/cmd/collector/app.(*spanProcessor).processItemFromQueue\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:127\ngithub.com/jaegertracing/jaeger/cmd/collector/app.NewSpanProcessor.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/app/span_processor.go:56\ngithub.com/jaegertracing/jaeger/pkg/queue.(*BoundedQueue).StartConsumers.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go:65"}

@pavolloffay
Member

The query and collector exit with 1 if there are no ES nodes available. Maybe you should wait longer?

I have tested on minishift and it worked as before. However, I get the following errors when I edit the Jaeger instance, e.g. with oc edit jaeger simple-prod:

$ make run WATCH_NAMESPACE=myproject
INFO[0000] Versions                                      arch=amd64 operator-sdk=v0.4.1 os=linux version=go1.11.1
INFO[0000] Auto-detected the platform                    platform=openshift
INFO[0000] Starting the Cmd.                            
INFO[0010] Configured Jaeger instance                    instance=simple-prod namespace=myproject
ERRO[0159] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0160] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0161] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0162] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0164] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0165] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0166] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0167] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0168] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0169] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0172] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0177] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject
ERRO[0187] failed to apply the changes                   error="elasticsearches.logging.openshift.io \"elasticsearch\" already exists" instance=simple-prod namespace=myproject

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling
Contributor Author

The query and collector exit with 1 if there are no ES nodes available. Maybe you should wait longer?

Not sure: it does get into a failed state when the initial connection cannot be made, with logs like this:

$ kubectl logs simple-prod-collector-76c76f74fb-55mc2
{"level":"info","ts":1551274615.2542229,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"fatal","ts":1551274620.2866693,"caller":"collector/main.go:103","msg":"Failed to init storage factory","error":"failed to create primary Elasticsearch client: health check timeout: Head https://elasticsearch:9200: dial tcp: lookup elasticsearch on 172.30.0.2:53: server misbehaving: no Elasticsearch node available","errorVerbose":"no Elasticsearch node available\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.init\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:88\ngithub.com/jaegertracing/jaeger/pkg/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage/es.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/plugin/storage.init\n\t<autogenerated>:1\ngithub.com/jaegertracing/jaeger/cmd/env.init\n\t<autogenerated>:1\nmain.init\n\t<autogenerated>:1\nruntime.main\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/proc.go:189\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333\nhealth check timeout: Head https://elasticsearch:9200: dial tcp: lookup elasticsearch on 172.30.0.2:53: server 
misbehaving\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.(*Client).startupHealthcheck\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:1116\ngithub.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic%2ev5.NewClient\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/gopkg.in/olivere/elastic.v5/client.go:244\ngithub.com/jaegertracing/jaeger/pkg/es/config.(*Configuration).NewClient\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/es/config/config.go:97\ngithub.com/jaegertracing/jaeger/plugin/storage/es.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/factory.go:80\ngithub.com/jaegertracing/jaeger/plugin/storage.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/factory.go:90\nmain.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:102\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:203\nruntime.main\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/proc.go:201\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333\nfailed to create primary Elasticsearch 
client\ngithub.com/jaegertracing/jaeger/plugin/storage/es.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/es/factory.go:82\ngithub.com/jaegertracing/jaeger/plugin/storage.(*Factory).Initialize\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/plugin/storage/factory.go:90\nmain.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:102\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:203\nruntime.main\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/proc.go:201\nruntime.goexit\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/asm_amd64.s:1333","stacktrace":"main.main.func1\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/collector/main.go:103\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/home/travis/gopath/src/github.com/j
aegertracing/jaeger/cmd/collector/main.go:203\nruntime.main\n\t/home/travis/.gimme/versions/go1.11.1.linux.amd64/src/runtime/proc.go:201"}

The case I ran into happened later on (spanstore/writer.go instead of collector/main.go). So ES was probably accepting connections, but was not ready yet? Note that the log message was "Failed to create index" and not "no Elasticsearch node available".

@jpkrohling
Contributor Author

By the way, the PR has been updated to fix the error you reported before when doing kubectl edit jaeger simple-prod. With that, the update procedure continues. It looks like #235 was a side effect, since it now seems to work:

$ kubectl get cronjobs
NAME                             SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
simple-prod-es-index-cleaner     */5 * * * *   False     1         2m              4m
simple-prod-spark-dependencies   55 23 * * *   False     0         <none>          4m

@pavolloffay
Member

Thanks @jpkrohling for looking into this! #235 seems to be a side effect.

@jpkrohling jpkrohling merged commit 401ef74 into jaegertracing:master Feb 27, 2019