
Query does not log undiscoverable store #2404

Closed
belm0 opened this issue Apr 9, 2020 · 14 comments

@belm0
Contributor

belm0 commented Apr 9, 2020

Thanos, Prometheus and Golang version used: v0.11.0
Object Storage Provider: GCS

What happened:
Specified the Store and Sidecar storage endpoints on the command line using the DNS resolver. The Sidecar endpoint was incorrect. Query logged the addition of the Store endpoint, but nothing about the Sidecar endpoint. The Query /stores page shows only the Store endpoint.

        --store.sd-dns-resolver=miekgdns
        --store=dnssrv+_grpc._tcp.thanos-store-grpc.default.svc.cluster.local
        --store=dnssrv+_grpc._tcp.thanos-sidecar-grpc.default.svc.cluster.local

What you expected to happen:
Query logs an error regarding DNS resolution of the bad endpoint. The /stores page provides information about the unresolved endpoint.

@bwplotka
Member

bwplotka commented Apr 9, 2020

Agree, good point. 👍 Marking as bug, help wanted to fix it 🤗

@yashrsharma44
Contributor

Shall I have a go at solving this issue?

@bwplotka
Member

bwplotka commented Apr 20, 2020 via email

@yashrsharma44
Contributor

Great! I will hop on! 😛

@yashrsharma44
Contributor

Hey @bwplotka! I was going through the issue, and after reproducing it, I observed the following -

  • The DNS resolution raises an error (here); it seems to log the error and move on to the next DNS resolution.
    I was thinking of passing that error up to this function so that, if an error is detected, it is logged from the Query component (a rough sketch is below).
    What do you think about the approach?
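
Roughly, I am imagining something like this (only a sketch with made-up names, not the actual Thanos code): the provider collects the per-address lookup failures and hands them back, so the Query component can log them with its own logger and expose the broken endpoints, e.g. on the /stores page.

package dns

import (
	"context"
	"fmt"
)

// Resolver stands in for the SRV/A lookup backend (miekgdns or golang).
type Resolver interface {
	Resolve(ctx context.Context, name string, qtype string) ([]string, error)
}

// Provider keeps the last successfully resolved addresses per configured --store flag.
type Provider struct {
	resolver Resolver
	resolved map[string][]string
}

// Resolve updates the resolved addresses and, instead of only logging,
// returns the lookup failures so the caller can surface them.
func (p *Provider) Resolve(ctx context.Context, addrs map[string]string) []error {
	var errs []error
	for name, qtype := range addrs {
		res, err := p.resolver.Resolve(ctx, name, qtype)
		if err != nil {
			// Keep the previously resolved records, but report the failure upwards.
			errs = append(errs, fmt.Errorf("dns resolution of %q failed: %w", name, err))
			continue
		}
		p.resolved[name] = res
	}
	return errs
}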

@bwplotka
Member

Nice, so if it already logs an error... what is the problem we are trying to solve then? (:

@yashrsharma44
Contributor

The error was raised from the resolver.go file, which does the DNS resolution, but somehow that error is not propagated to the query component. So I think we need to propagate the error, as I didn't see any errors raised in the logs of the query component.

@bwplotka
Member

What do you mean it's not propagated? There is literally a level.Error(p.logger).Log("msg", "dns resolution failed", "addr", addr, "err", err) log line 🤔

@yashrsharma44
Contributor

yashrsharma44 commented Apr 25, 2020

Yeah, when I ran the query component on my local machine, it did raise the error, but the same does not happen when I check the logs of the query component in the Kubernetes deployment.

Let me attach the log.

@bwplotka
Member

Nice! Maybe the logger is not passed properly?

@yashrsharma44
Contributor

I am attaching some info about the investigation that I did 😛

Config details passed to thanos query

thanos query \
--grpc-address=0.0.0.0:10901 \
--http-address=0.0.0.0:9090 \
--query.replica-label=prometheus_replica \
--query.replica-label=rule_replica \
--store.sd-dns-resolver=miekgdns \
--store=dnssrv+_grpc._tcp.thanos-store.thanos.svc.cluster.local \
--store=dnssrv+_grpc._tcp.prometheus-3-service.monitoring.svc.cluster.local

Here is the log -


level=info ts=2020-04-25T09:31:55.41307044Z caller=main.go:152 msg="Tracing will be disabled"
level=info ts=2020-04-25T09:31:55.457862986Z caller=options.go:23 protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2020-04-25T09:31:55.458943584Z caller=query.go:401 msg="starting query node"
level=info ts=2020-04-25T09:31:55.45948314Z caller=intrumentation.go:48 msg="changing probe status" status=ready
level=info ts=2020-04-25T09:31:55.460033534Z caller=intrumentation.go:60 msg="changing probe status" status=healthy
level=info ts=2020-04-25T09:31:55.460226236Z caller=http.go:56 service=http/server component=query msg="listening for requests and metrics" address=0.0.0.0:9090
level=info ts=2020-04-25T09:31:55.460227986Z caller=grpc.go:106 service=gRPC/server component=query msg="listening for StoreAPI gRPC" address=0.0.0.0:10901
level=info ts=2020-04-25T09:33:25.58191222Z caller=storeset.go:384 component=storeset msg="adding new storeAPI to query storeset" address=172.17.0.13:10901 extLset=

And here are the pods that I have deployed in minikube -

 yash@kmaster  kube-prome/prome-thanos  sudo kubectl get po --all-namespaces
NAMESPACE              NAME                                         READY   STATUS    RESTARTS   AGE
kube-system            coredns-66bff467f8-8zw8l                     1/1     Running   14         18h
kube-system            coredns-66bff467f8-sh9v9                     1/1     Running   10         18h
kube-system            etcd-kmaster                                 1/1     Running   5          18h
kube-system            kube-apiserver-kmaster                       1/1     Running   6          18h
kube-system            kube-controller-manager-kmaster              1/1     Running   3          7h26m
kube-system            kube-proxy-8zcsp                             1/1     Running   2          18h
kube-system            kube-scheduler-kmaster                       1/1     Running   4          7h26m
kube-system            storage-provisioner                          1/1     Running   3          18h
kubernetes-dashboard   dashboard-metrics-scraper-84bfdf55ff-8268b   1/1     Running   2          18h
kubernetes-dashboard   kubernetes-dashboard-bc446cc64-cqdsn         1/1     Running   7          18h
monitoring             alertmanager-5f7f948969-jvgbb                1/1     Running   1          7h12m
monitoring             minio-2-7d5765f59c-56299                     1/1     Running   1          7h13m
monitoring             prometheus-0                                 2/2     Running   3          7h11m
monitoring             prometheus-1                                 2/2     Running   3          7h8m
monitoring             prometheus-2                                 2/2     Running   3          7h6m
thanos                 minio-85fd55b9fd-6t2wp                       1/1     Running   0          50s
thanos                 thanos-query-77d797f89d-vj4v8                1/1     Running   0          33s
thanos                 thanos-store-0                               0/1     Running   1          30s

As we can see, prometheus-3-service is not present, yet the query somehow skips logging the error. (A quick standalone check of that SRV record is sketched below.)
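
As a sanity check, a standalone lookup of the same SRV record (only a sketch; dnssrv+ means an SRV query, and the record name follows the config above) should fail the same way when run inside the cluster:

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Same record the query was configured with:
	// _grpc._tcp.prometheus-3-service.monitoring.svc.cluster.local
	_, srvs, err := net.DefaultResolver.LookupSRV(ctx, "grpc", "tcp",
		"prometheus-3-service.monitoring.svc.cluster.local")
	if err != nil {
		// Expected here, since the service does not exist.
		fmt.Println("SRV lookup failed:", err)
		return
	}
	for _, s := range srvs {
		fmt.Printf("%s:%d\n", s.Target, s.Port)
	}
}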

@yashrsharma44
Contributor

Maybe logger is not passed properly?

I think that might be the reason. I am reading through the codebase now and will comment with my understanding of the possible issue 😅
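
For reference, a tiny illustration of that suspicion (assuming the go-kit logger that Thanos uses; this is not the actual wiring): if a component ends up with a no-op logger instead of the configured one, its error lines simply disappear.

package main

import (
	"os"

	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
)

func main() {
	configured := log.NewLogfmtLogger(os.Stderr)
	nop := log.NewNopLogger()

	// Shows up in the component's logs.
	level.Error(configured).Log("msg", "dns resolution failed",
		"addr", "dnssrv+_grpc._tcp.prometheus-3-service.monitoring.svc.cluster.local",
		"err", "no such host")

	// Silently dropped - what we would see if the resolver was handed a no-op logger.
	level.Error(nop).Log("msg", "dns resolution failed",
		"addr", "dnssrv+_grpc._tcp.prometheus-3-service.monitoring.svc.cluster.local",
		"err", "no such host")
}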

@stale

stale bot commented May 25, 2020

Hello 👋 Looks like there was no activity on this issue for the last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity for the next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label May 25, 2020
@stale

stale bot commented Jun 1, 2020

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Jun 1, 2020