query: Deduplication latency #2222

Closed
yineima opened this issue Mar 6, 2020 · 10 comments

yineima commented Mar 6, 2020

Thanos version
v0.9.0

Prometheus version
2.14.0

Object Storage Provider:
No object store is used; data is kept on local disk.

What happened:
After enabling query.replica-label for HA, I found that queries take much longer than before.
Deduplication makes the query slow.
This is the same instant query, not a range query:
with deduplication it takes 23s, but without deduplication it takes only 293ms.
This is a huge cost for HA.
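
For readers new to the feature, here is a minimal sketch of what replica-label deduplication means conceptually: series that differ only in the replica label are merged into one. It is an illustration only and not Thanos's actual algorithm, which is more involved. The label name `replica`, the function name `deduplicate`, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def deduplicate(series, replica_label="replica"):
    """Merge series that differ only in `replica_label` (hypothetical helper).

    `series` is a list of (labels, samples) pairs, where `labels` is a dict
    and `samples` is a list of (timestamp, value) tuples. For each timestamp,
    the first replica seen wins.
    """
    merged = defaultdict(dict)
    for labels, samples in series:
        # The identity of a series is its label set without the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return [(dict(key), sorted(points.items())) for key, points in merged.items()]


# Two HA replicas scraping the same target, each missing one sample.
series = [
    ({"job": "import_interface", "product": "Inews", "replica": "a"}, [(0, 1.0), (15, 2.0)]),
    ({"job": "import_interface", "product": "Inews", "replica": "b"}, [(0, 1.0), (30, 3.0)]),
]
result = deduplicate(series)
```

Even in this naive form, the work grows with the number of series and samples, which is why a query touching many series is the interesting case to measure.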

Here is my screenshot:
image

The query is sum(rate(appnews_importable_bf_request_num{job="import_interface",product="Inews"} [1m] offset 24h))

appnews_importable_bf_request_num{job="import_interface",product="Inews"} offset 24h has about 1200 samples

image

Does this make sense?
Or is there something wrong with my usage?

What you expected to happen:
Performance not much worse than before.

I hope you can reply to this issue ASAP.
This issue is driving me crazy.
Thank you.

Member

bwplotka commented Mar 6, 2020

Thanks for this report.

Normally the overhead is unnoticeable, so there might be a bug affecting your case. It would be nice to dive deeper. What I find really suspicious is that appnews_importable_bf_request_num{job="import_interface",product="Inews"} offset 24h takes 322ms WITH deduplication, but sum(rate(appnews_importable_bf_request_num{job="import_interface",product="Inews"} [1m] offset 24h)) is suddenly 24s? That's kind of unlikely.

Let's dive deeper:

  1. You say 1200 samples, but that is for an instant query without the range selector [1m]. Can we run an instant query in the console for appnews_importable_bf_request_num{job="import_interface",product="Inews"}[1m] offset 24h? If you have 1200 series, I expect something like 4x1200 samples.
  2. Can you check the latency for appnews_importable_bf_request_num{job="import_interface",product="Inews"}[1m] offset 24h with and without deduplication?
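
One way to run the comparison from step 2 outside the UI is to time the same instant query against the Thanos Query HTTP API with the `dedup` parameter switched on and off. The sketch below is only a rough aid: the base URL `http://localhost:10902` is an assumed address, and the helper names `build_query_url`, `time_query` and `compare_dedup` are made up for this example.

```python
import time
import urllib.parse
import urllib.request

QUERY = 'appnews_importable_bf_request_num{job="import_interface",product="Inews"}[1m] offset 24h'


def build_query_url(base_url, query, dedup):
    """Build an instant-query URL with deduplication switched on or off."""
    params = urllib.parse.urlencode({"query": query, "dedup": "true" if dedup else "false"})
    return f"{base_url}/api/v1/query?{params}"


def time_query(url, fetch=None):
    """Return the wall-clock seconds taken to fetch `url`."""
    if fetch is None:
        # Default fetcher: plain HTTP GET, reading the whole response body.
        def fetch(u):
            with urllib.request.urlopen(u, timeout=120) as resp:
                resp.read()
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def compare_dedup(base_url, query, fetch=None):
    """Time the query once with and once without deduplication."""
    return {
        dedup: time_query(build_query_url(base_url, query, dedup), fetch)
        for dedup in (True, False)
    }
```

Calling `compare_dedup("http://localhost:10902", QUERY)` returns a dict keyed by the `dedup` setting. A single run is noisy, so repeating it a few times and comparing medians gives a fairer picture.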

Member

bwplotka commented Mar 6, 2020

Also, try without the offset, though I think the issue where we were selecting the whole 24h+1m was already fixed in v0.9.0.
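
To make the 24h+1m remark concrete, here is a small worked example, based on my reading of the comment above, of the time window that a selector with `[1m] offset 24h` needs compared with the span from the start of that window to the evaluation time. The function name `selection_window` and the evaluation timestamp are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def selection_window(eval_time, range_, offset):
    """Return (start, end) of the data needed for `metric[range] offset <offset>`."""
    end = eval_time - offset
    return end - range_, end


# Hypothetical evaluation time.
t = datetime(2020, 3, 6, 12, 0, tzinfo=timezone.utc)
start, end = selection_window(t, timedelta(minutes=1), timedelta(hours=24))

needed = end - start    # only one minute of data is actually required
whole_span = t - start  # 24h + 1m if everything up to `t` were fetched
```

If a store fetched `whole_span` instead of `needed`, it would read roughly 1440 times more data for the same result, so ruling this out by testing without the offset is cheap.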

bwplotka changed the title from "deduplication makes query slow" to "query: Deduplication latency" Mar 6, 2020

Author

yineima commented Mar 9, 2020

Thank you for your quick reply.

  1. With the range selector [1m], each series can have 4 or 5 samples.
    So when I run an instant query in the console for appnews_importable_bf_request_num{job="import_interface",product="Inews"}[1m] offset 24h with deduplication, I get this:
    image

  2. This is the latency comparison for appnews_importable_bf_request_num{job="import_interface",product="Inews"}[1m] offset 24h:
    image

  3. This is the latency comparison for appnews_importable_bf_request_num{job="import_interface",product="Inews"}[1m]:
    image
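
As a quick sanity check on the sample counts discussed above, the arithmetic can be sketched as follows. The 15s scrape interval is an assumption, chosen because it matches the 4 or 5 samples per series seen in a 1m window; the function name `expected_samples` is made up for the example.

```python
def expected_samples(series_count, window_seconds, scrape_interval_seconds):
    """Lower-bound estimate of samples returned by a range selector."""
    # A window can hold one extra sample when a scrape lands on its boundary,
    # which is why a series shows 4 or 5 samples for a 60s window at 15s.
    per_series = window_seconds // scrape_interval_seconds
    return series_count * per_series


total = expected_samples(series_count=1200, window_seconds=60, scrape_interval_seconds=15)
print(total)  # 4800
```

With two replicas, the querier handles about twice that many raw samples before deduplication, which is still a small amount of data.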

Thank you!

Author

yineima commented Mar 10, 2020

Could this be related to query traffic?
I have set --query.timeout 20m --query.max-concurrent 200 for the query component,
but I have not found any concurrency setting for the sidecar.

@bwplotka
Member

I mean, try not to use the sidecar for anything else for the duration of the tests (:

@daixiang0
Member

Do we have a tool or something similar to do pre-deduplication? If not, could it be implemented in the compactor?

Member

bwplotka commented Mar 20, 2020 via email


stale bot commented Apr 19, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

stale bot added the stale label Apr 19, 2020

stale bot commented Apr 26, 2020

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale bot closed this as completed Apr 26, 2020

@PowerSurj

This still seems to be an issue. Deduplication adds significant latency, which is particularly visible when a query touches a large number of series.
