compact/downsampling: Build chunks directly in object storage (without using disk) #3406

bwplotka · 2020-11-04T15:33:30Z

It's totally doable now. Same with downsampling.

pracucci · 2020-11-04T15:50:39Z

Do you mean reading the source blocks' chunks from the object store, merging them on the fly, and streaming the compacted block's output directly to the object store without touching disk?

bwplotka · 2020-11-05T17:31:42Z

Correct. Or at least touching some constant amount of disk. Exploring this right now.

bwplotka · 2020-11-05T17:35:26Z

The most complex part is that we need to extend our object storage API with a part upload operation to do such streaming (UploadPart in S3): https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html

Or we could even use a part copy operation (UploadPartCopy in S3) to avoid touching those bytes directly.

bwplotka · 2020-11-05T17:39:43Z

Another S3 limit: each part must be at least 5 MB in size, except the last part. There is no minimum size limit on the last part of a multipart upload.

For GCS we could just use resumable uploads (https://cloud.google.com/storage/docs/resumable-uploads), because https://cloud.google.com/storage/docs/json_api/v1/how-tos/multipart-upload is meant for something else (uploading object data together with its metadata in one request). On top of that, there is the rewrite API: https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite

bwplotka · 2020-11-05T17:45:08Z

Azure is very flexible for this: https://docs.microsoft.com/en-us/rest/api/storageservices/#blob-service

Damn it's a lot of work 😢

But the gain is trivial: we don't need a disk (or a lot of disks) to write chunks into chunk files. The disk problem is not that urgent, though.

stale · 2021-01-04T20:00:27Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale · 2021-01-18T22:59:19Z

Closing for now as promised, let us know if you need this to be reopened! 🤗

GiedriusS · 2021-03-27T19:45:17Z

Not sure if I agree with the "trivial" part. In bigger deployments, the size of produced blocks can easily reach hundreds of gigabytes. It would be very nice to have this, hence reopening.

GiedriusS · 2021-04-13T11:59:48Z

I actually had a troll-ish idea, it might be already feasible. What if we mounted, for example, an S3 bucket via s3fs (https://github.com/s3fs-fuse/s3fs-fuse) and then used the filesystem storage type for Thanos Compactor? In the compaction case, we download whole files so it's not an issue that Thanos is unaware of the underlying storage, right? Any thoughts @pracucci @bwplotka ?

bwplotka · 2021-04-15T14:32:29Z

That might work for a start! (: The problem with this approach is that local disk data caching might, in the end, either download things to disk anyway or make additional calls unnecessarily. What we want is full control over those things.

I agree it's not trivial, but it's also not super difficult. Definitely high on the priority list. Maybe a good project for LFX summer? 🤗

stale · 2021-06-16T01:58:38Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale · 2021-06-30T02:38:17Z

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale · 2021-09-03T02:37:40Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale · 2021-09-19T01:33:05Z

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale · 2022-01-09T10:33:51Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

someshkoli · 2022-01-25T21:30:35Z

I was looking for solutions to pull -> transform -> upload very large files to and from object storage (S3 in my case) and thought there would be some implementation here in Thanos :P Looks like we're yet to implement this.
I used part upload for my purpose and it works fine, so I will try to bring the same here. I will do a PoC for S3 and then we can move forward with other services too.

GiedriusS · 2022-01-28T11:16:05Z

Doing this for compaction is a bit trickier since Prometheus heavily relies on mmap. Downsampling is totally doable, though, since everything is under an interface.

vjabrayilov · 2022-04-09T22:55:08Z

Hey, I would like to resolve this issue and enhance compaction/downsampling. I have written a proposal for GSoC. I would appreciate it if you could review it and give feedback. @bwplotka @yeya24 @matej-g
https://drive.google.com/file/d/1oZd3ENSZ7v2hONNf4pDL3yXr_0eHcwsV/view?usp=sharing

stale · 2022-06-12T17:56:05Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

yeya24 · 2022-06-15T21:16:50Z

Still valid

oronsh · 2022-09-12T04:51:42Z

@yeya24 @bwplotka @GiedriusS Doesn't this mean that for 2 blocks with 1M series each, where all series are the same, you'd need 2M S3 copy requests: one copying series A's chunks from block A and one copying series A's chunks from block B into the chunk file of the new block C, and so on for every series?

stale · 2022-11-13T15:04:13Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

bwplotka added the feature request/improvement, difficulty: medium, and help wanted labels Nov 4, 2020

bwplotka mentioned this issue Nov 4, 2020

compact: Redesign compaction planning process for cost efficiency and determinism. #3405

Open

stale bot added the stale label Jan 4, 2021

stale bot closed this as completed Jan 18, 2021

GiedriusS reopened this Mar 27, 2021

stale bot removed the stale label Mar 27, 2021

stale bot added the stale label Jun 16, 2021

stale bot closed this as completed Jun 30, 2021

GiedriusS reopened this Jun 30, 2021

stale bot removed the stale label Jun 30, 2021

stale bot added the stale label Sep 3, 2021

stale bot closed this as completed Sep 19, 2021

GiedriusS reopened this Sep 19, 2021

stale bot removed the stale label Sep 19, 2021

stale bot added the stale label Jan 9, 2022

GiedriusS removed the stale label Jan 10, 2022

stale bot added the stale label Jun 12, 2022

stale bot removed the stale label Jun 15, 2022

stale bot added the stale label Nov 13, 2022
