Compact: Offline deduplication #1014

Closed
smalldirector opened this issue Apr 7, 2019 · 43 comments · Fixed by #4239

@smalldirector

Currently Thanos provides a dedup function in the Query component, which can merge data on the fly from Prometheus HA pairs or remote object storage.
However, this on-the-fly dedup is not query-efficient: the replicas' blocks in TSDB/object storage are never actually merged, so the Query API has to load duplicated data on every request.

With the current Prometheus TSDB design, it seems difficult for Thanos to implement a dedup function that merges blocks on each TSDB node.
However, it should be easy to implement for object storage, since that is a single centralized store and we already have a Compactor component running against it.

Offering a dedup function on the object storage side would significantly improve metrics query latency and reduce object storage cost.

Is there any plan to support such a requirement in Thanos? If so, I would like to hear your ideas about this feature.

Thanks.

@bwplotka bwplotka changed the title Compact: Built-in dedup function for Object Storage Compact: Offline deduplication Apr 7, 2019
@bwplotka
Member

bwplotka commented Apr 7, 2019

Hi 👋 thanks for raising this. I renamed the title to make it clearer; let me know if this makes sense.

There has always been an idea like this: allow the compactor to deduplicate blocks offline, to reduce storage and improve query performance a bit. I am happy to do so; however, we need to ask ourselves whether the current deduplication algorithm is the correct one for everybody.

We have seen some reports claiming that our "penalty-based" deduplication algorithm does not work for all edge cases, e.g. #981. That's fair, as our algorithm is very basic. The problem with shipping this feature on top of an imperfect algorithm is that we would deduplicate data irreversibly (unless we back up those blocks somehow); see the sketch at the end of this comment for a rough idea of what the penalty merge does.

Anyway, I think we are happy to add such a feature to the compactor at some point, given we can close the gaps in the current algorithm (and maybe add more tests/test cases).
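
For illustration, here is a minimal sketch of what a penalty-based merge does, on a simplified in-memory form of two replica streams. The `sample` type, the fixed `penalty`, and the exact switching rule are illustrative assumptions, not the actual Thanos implementation (which, roughly speaking, also adapts the penalty and adjusts counter values when switching replicas):

```go
package main

// sample is one timestamped value of a series from a single replica.
type sample struct {
	t int64 // timestamp in milliseconds
	v float64
}

// merge combines two time-sorted streams of the same series scraped by two
// replicas. It sticks with one replica and only switches to the other once
// the active replica has been silent for longer than penalty milliseconds,
// instead of naively interleaving samples from both scrapers.
func merge(a, b []sample, penalty int64) []sample {
	var out []sample
	i, j := 0, 0
	useA := true
	lastT := int64(-1) << 62 // effectively minus infinity
	for i < len(a) || j < len(b) {
		switch {
		case i >= len(a):
			useA = false
		case j >= len(b):
			useA = true
		case useA && a[i].t > lastT+penalty && b[j].t < a[i].t:
			useA = false // active replica has a gap; the other has data sooner
		case !useA && b[j].t > lastT+penalty && a[i].t < b[j].t:
			useA = true
		}
		if useA {
			for i < len(a) && a[i].t <= lastT {
				i++ // skip samples already covered by the emitted stream
			}
			if i < len(a) {
				out = append(out, a[i])
				lastT = a[i].t
				i++
			}
		} else {
			for j < len(b) && b[j].t <= lastT {
				j++
			}
			if j < len(b) {
				out = append(out, b[j])
				lastT = b[j].t
				j++
			}
		}
	}
	return out
}
```

The edge cases mentioned above (e.g. #981) come exactly from such heuristics guessing wrong, which is tolerable online (you can turn dedup off) but not after an irreversible offline merge.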

@smalldirector
Author

Thanks for the reply @bwplotka. I'm glad to hear that you also want to support this feature in the future.
I'm working on eBay's monitoring infrastructure team, and we are leveraging Thanos to build eBay's high-availability monitoring platform. Thanks for providing such a great framework.

For the offline dedup function, we have already started building it inside the Thanos compactor component. If needed, we are more than happy to contribute it back to the Thanos community. Please feel free to let me know your thoughts as well.

@bwplotka
Member

bwplotka commented Apr 7, 2019

Of course! PRs would be welcome, especially if you have something proven from your prod (:

@bwplotka
Member

bwplotka commented Apr 7, 2019

Are you on our slack BTW? (:

@smalldirector
Author

smalldirector commented Apr 7, 2019

Which Slack channel are you pointing to here? I am only able to see the announcements channel, and I'm not able to join the thanos-dev channel mentioned here: https://github.com/improbable-eng/thanos/blob/master/CONTRIBUTING.md. BTW, my display name is smalldirector in the announcements channel.

@bwplotka
Member

bwplotka commented May 7, 2019

I can see you in both the #thanos and #thanos-dev channels (:

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@Reamer

Reamer commented Jan 13, 2020

Remove "stale", that's an interesting feature

@stale

stale bot commented Feb 27, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on its status; otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Feb 27, 2020
@bwplotka
Member

bwplotka commented Feb 27, 2020 via email

@stale

stale bot commented Mar 28, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on its status; otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Mar 28, 2020
@ThoreKr

ThoreKr commented Mar 28, 2020

Can someone add the label for stalebot to ignore this issue?

@bwplotka
Member

bwplotka commented Jul 20, 2020

👋

We are adding new improvements to runtime deduplication so we can use it safely: #2890. This time we are improving the penalty algorithm. There might be one tricky part around counter resets: there is no metric type information in Prometheus blocks yet, so we are discussing how we can add that thanks to the recent metadata changes to blocks (cc @pracucci @brian-brazil). If not, we will need to guess it from the data, which needs more testing.

In the meantime we could potentially allow deduplication with backup, i.e. you still back up blocks to some remote location (another cold-storage bucket) so we can revert things if needed. Without a backup I would not be confident allowing offline dedup for our Thanos users right now, before we handle those missing bits (:

@brian-brazil
Contributor

There are some initial discussions about type metadata for remote write via the WAL; getting it into blocks is a completely different kettle of fish.

@bwplotka
Member

Also, the chunk iterator work is almost done, which will be needed for dedup during compaction.

Yeah, agreed - super unlikely for now.

@stale

stale bot commented Aug 19, 2020

Hello 👋 Looks like there was no activity on this issue for the last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity in the next week, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Aug 19, 2020
@Reamer

Reamer commented Aug 19, 2020

Still important

@stale stale bot removed the stale label Aug 19, 2020
@eightnoteight

Hi @bwplotka, I'm interested in this issue. Is there any way I can contribute to it?

@Antiarchitect
Contributor

Also very much interested. Please update us on the status.

@stale

stale bot commented Nov 27, 2020

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Nov 27, 2020
@bwplotka bwplotka removed the stale label Nov 30, 2020
@Reamer

Reamer commented Nov 30, 2020

Still interested.

@bwplotka
Member

👋🏽

I recently got many questions about offline dedup via DM; please ask them here instead (: Some updates:

We added more docs that hopefully clarify the WHY and HOW: https://thanos.io/tip/components/compact.md/#vertical-compactions

Answering one DM:

I saw in the compactor docs that the compactor supports vertical compaction via a hidden flag. Our use case for wanting to implement that is covered by the use cases listed right within that doc, specifically as the realistic duplication under the "Offline deduplication of series" bullet point. I'm a bit confused though. Is this something we can use now using the flags:
--compact.enable-vertical-compaction
--deduplication.replica-label="prometheus_replica"
(The prometheus_replica external label is what we use to distinguish between each Prometheus within a Prometheus HA pair.)
According to the docs it looks like that should be okay; however, I also found this issue you've been pretty active on:

It's NOT ok, unfortunately. Vertical compaction is implemented for one-to-one duplication, which comes from https://thanos.io/tip/components/receive.md/ deduplication OR from when you, for some reason, have duplicated blocks with exactly the same data.

If you use this against Prometheus replicas it will most likely totally mess up your querying experience, as it just concatenates samples together. The scrape interval in the best case becomes 2x the original; in the worst case it's totally unstable.

The missing part is bringing the deduplication algorithm we use online in the querier to the compaction stage, so we can leverage it there. That algorithm works in 99% of cases, which is fine for the query path, where you can simply switch deduplication off. If that 1% happens during offline dedup, you cannot revert it; that's the problem.

We are exploring different deduplication algorithms that would make this much more reliable. We also need something like this, ideally, for query pushdown: https://docs.google.com/document/d/1ajMPwVJYnedvQ1uJNW2GDBY6Q91rKAw3kgA5fykIRrg/edit# ... so, help wanted (:

We can try to enable the 99% realistic dedup if we want; help wanted for this 🤗 (see an example invocation below)
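
For reference, this is roughly how the flags quoted above would be combined on the compactor (a hypothetical invocation; the data directory and bucket config paths are placeholders):

```bash
# Hypothetical example only: per the caveat above, today this is safe solely
# for one-to-one duplicated blocks (e.g. from Receive replication), NOT for
# penalty-style dedup of Prometheus HA replicas.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --compact.enable-vertical-compaction \
  --deduplication.replica-label="prometheus_replica"
```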

@heyitsmdr

Thank you @bwplotka for answering that question and confirming my suspicions. I'm going to look at the doc and related work and see if this is something I can help out with :) We would greatly benefit from a feature like this to dedupe the duplicated data coming from each Prometheus in the HA pair.

@stale

stale bot commented Feb 14, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Feb 14, 2021
@Reamer

Reamer commented Feb 15, 2021

This issue is still needed.
@bwplotka it would be nice if Thanos implemented the 99% solution.

@stale stale bot removed the stale label Feb 15, 2021
@stale

stale bot commented Apr 18, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Apr 18, 2021
@dpwolfe

dpwolfe commented May 1, 2021

Please keep this open.

@stale stale bot removed the stale label May 1, 2021
@2nick
Contributor

2nick commented May 11, 2021

I took a look at the querier's deduplication and compaction process and have some questions. :)

For example, we have "main" and "replica" instances of Prometheus.

They could start asynchronously, and as a result it's possible to end up with the following blocks in S3:
main: block1[1am - 3am], block2[3am - 5am]
replica: block3[2am - 5am]

And of course, there could be many more such overlapping blocks.

So the first question is: how should the planner and compactor work?

The only option I see is that the planner makes two groups like:

  1. block1+block3
  2. block2+block3

so that only the parts of block3 which complement the "main" blocks are taken.

But such behavior sounds a bit convoluted - maybe there are better options/thoughts about this? :)

@yeya24
Contributor

yeya24 commented May 16, 2021

@2nick I am looking at the same thing. It is a little bit involved, and I think the grouping approach in #1276 is similar to what you mentioned.

IMO the grouping part of offline deduplication can reuse the existing TSDB planner. We don't need two groups in this case; we can do it in two iterations (see the sketch below):

  1. block1 + block3 -> new block [1am - 5am]
  2. new block + block2 -> result [1am - 5am]

WDYT? But I agree this is still not a very good approach: it would cost a lot when the overlap is small, since we still have to iterate over the whole blocks to compact.
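
A minimal sketch of that iterative idea, assuming hypothetical types (`block` and the `merge` callback stand in for a real vertical compaction; none of these names are Thanos APIs): sort blocks by start time, then fold every overlapping block into the running result.

```go
package main

import "sort"

// block is a hypothetical stand-in for a TSDB block's time range.
type block struct {
	minT, maxT int64 // half-open range [minT, maxT) in milliseconds
}

func overlaps(a, b block) bool {
	return a.minT < b.maxT && b.minT < a.maxT
}

// compactOverlapping folds overlapping blocks pairwise, left to right.
// merge stands in for a real compaction that also deduplicates samples.
func compactOverlapping(blocks []block, merge func(a, b block) block) []block {
	sort.Slice(blocks, func(i, j int) bool { return blocks[i].minT < blocks[j].minT })
	var out []block
	for _, b := range blocks {
		if n := len(out); n > 0 && overlaps(out[n-1], b) {
			out[n-1] = merge(out[n-1], b) // e.g. block1+block3, then the result +block2
		} else {
			out = append(out, b)
		}
	}
	return out
}
```

With the example above, sorting by start time gives block1, block3, block2, so the fold performs exactly the two iterations listed: block1+block3 first, then that result with block2.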

@2nick
Contributor

2nick commented May 17, 2021

After some thinking, I've decided that it's a "good enough" approach, as it allows us to move forward. :)

Really great that you are on it! Looking forward to offline dedup! :)
