compact/downsampling: Build chunks directly in object storage (without using disk) #3406
Comments
Do you mean reading source block chunks from the object store, merging them on the fly, and streaming the output of the compacted block directly to the object store without touching disk?
Correct. Or at least touching some constant amount of disk. Exploring this right now.
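A minimal sketch of that idea, assuming the existing `objstore.Bucket` interface in Thanos (whose `Upload` takes an `io.Reader`); `mergeChunks` and `streamCompactedChunks` are hypothetical names used only for illustration:

```go
package compactstream

import (
	"context"
	"io"

	"github.com/thanos-io/thanos/pkg/objstore"
)

// mergeChunks is a hypothetical placeholder for the on-the-fly merge: it would
// read chunk data from the source blocks (e.g. via bkt.Get / bkt.GetRange) and
// write the merged chunk file to w without buffering it on disk.
func mergeChunks(ctx context.Context, bkt objstore.Bucket, srcBlocks []string, w io.Writer) error {
	// ... merge logic ...
	return nil
}

// streamCompactedChunks connects the merge step to the bucket through an
// in-memory pipe, so only a small constant buffer sits between the two sides.
func streamCompactedChunks(ctx context.Context, bkt objstore.Bucket, srcBlocks []string, dst string) error {
	pr, pw := io.Pipe()

	// Producer: merge source chunks and write them into the pipe.
	go func() {
		pw.CloseWithError(mergeChunks(ctx, bkt, srcBlocks, pw))
	}()

	// Consumer: the provider implementation streams the reader to object
	// storage (multipart upload for S3, resumable upload for GCS, ...).
	err := bkt.Upload(ctx, dst, pr)
	// Unblock the producer goroutine if the upload failed early.
	pr.CloseWithError(err)
	return err
}
```

The heavy lifting is in the provider implementations: each one has to turn a reader of unknown total size into whatever streaming upload mechanism the storage service offers.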
The most complex part is that we need to extend the objstore API with a part-upload operation to do such streaming: https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html. Or we could even use UploadPartCopy to avoid touching those bytes directly.
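For illustration only (not Thanos code), the S3 side can already be driven from a plain `io.Reader`: the AWS SDK's `s3manager.Uploader` splits the stream into `UploadPart` calls without knowing the total size up front. Bucket and key names below are made up.

```go
package s3stream

import (
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func uploadStream(r io.Reader) error {
	sess := session.Must(session.NewSession())

	uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
		u.PartSize = 64 << 20 // 64 MiB parts; S3 requires at least 5 MiB per part.
	})

	// Body is a plain io.Reader, so the merged chunk stream can be fed in
	// directly; no temporary file is needed.
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("example-bucket"),          // hypothetical
		Key:    aws.String("01EXAMPLE/chunks/000001"), // hypothetical
		Body:   r,
	})
	return err
}
```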
S3 multipart uploads also come with their own limits. For GCS we could just use resumable uploads (https://cloud.google.com/storage/docs/resumable-uploads), because the multipart upload API (https://cloud.google.com/storage/docs/json_api/v1/how-tos/multipart-upload) is used for something else (metadata upload). On top of that there is the rewrite API: https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite
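Again as a rough sketch rather than actual Thanos code: the official GCS Go client exposes resumable uploads through a streaming writer, so the provider implementation could look roughly like this (bucket and object names are made up).

```go
package gcsstream

import (
	"context"
	"io"

	"cloud.google.com/go/storage"
)

func uploadStream(ctx context.Context, r io.Reader) error {
	client, err := storage.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	// With a non-zero ChunkSize the Writer uses the resumable upload protocol,
	// sending the data in ChunkSize pieces and buffering at most that much in
	// memory at a time.
	w := client.Bucket("example-bucket").Object("01EXAMPLE/chunks/000001").NewWriter(ctx)
	w.ChunkSize = 16 << 20 // 16 MiB upload chunks.

	if _, err := io.Copy(w, r); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}
```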
Azure is very flexible for this: https://docs.microsoft.com/en-us/rest/api/storageservices/#blob-service. Damn, it's a lot of work 😢 But the gain is trivial: we don't need disk (or a lot of disks) to write chunks into chunk files. The disk problem is not that pressing, though.
Hello 👋 Looks like there was no activity on this issue for the last two months.
Closing for now as promised, let us know if you need this to be reopened! 🤗
Not sure if I agree with the "trivial" part. In bigger deployments, the size of produced blocks can easily go into the range of hundreds of gigabytes. It would be very nice to have this, so I'm reopening.
I actually had a troll-ish idea, and it might already be feasible: what if we mounted, for example, an S3 bucket via s3fs (https://github.com/s3fs-fuse/s3fs-fuse) and then used the mount point as the compactor's working directory?
That might work for a start! (: That approach comes with its own problems, though. I agree it's not trivial, but it's also not super difficult. Definitely high on the priority list. Maybe a good project for LFX summer? 🤗
Hello 👋 Looks like there was no activity on this issue for the last two months.
Closing for now as promised, let us know if you need this to be reopened! 🤗
I was looking for solutions to pull -> transform -> upload very large files to and from object storage (S3 in my case) and thought there would be some implementation here in Thanos :P Looks like we're yet to implement this.
Doing this for compaction is a bit trickier since Prometheus heavily relies on mmap. Downsampling is totally doable, though, since everything there is behind an interface.
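To make the "behind an interface" point concrete, here is a hypothetical sketch (not the actual Prometheus/Thanos writer types) of how the chunk output could be abstracted so the same downsampling code can write either to a local file or straight into an object-storage upload. It assumes the existing `objstore.Bucket.Upload(ctx, name, io.Reader)` signature; everything else is made up for illustration.

```go
package downsamplestream

import (
	"context"
	"io"
	"os"

	"github.com/thanos-io/thanos/pkg/objstore"
)

// chunkSink is a hypothetical abstraction over "where downsampled chunk bytes
// go"; the real writers are file-based today.
type chunkSink interface {
	io.WriteCloser
}

// fileSink writes the chunk file to local disk (the current behaviour).
func fileSink(path string) (chunkSink, error) {
	return os.Create(path)
}

// bucketSink streams chunk bytes straight to object storage through a pipe.
func bucketSink(ctx context.Context, bkt objstore.Bucket, name string) chunkSink {
	pr, pw := io.Pipe()
	errc := make(chan error, 1)
	go func() {
		err := bkt.Upload(ctx, name, pr)
		// Unblock any pending Write if the upload fails early.
		pr.CloseWithError(err)
		errc <- err
	}()
	return &pipeSink{pw: pw, errc: errc}
}

type pipeSink struct {
	pw   *io.PipeWriter
	errc chan error
}

func (s *pipeSink) Write(p []byte) (int, error) { return s.pw.Write(p) }

func (s *pipeSink) Close() error {
	if err := s.pw.Close(); err != nil {
		return err
	}
	return <-s.errc // Wait for the upload to finish.
}
```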
Hey, I would like to resolve this issue and enhance compaction/downsampling. I have written a proposal for GSoC. It would be kind of you to review it and give feedback. @bwplotka @yeya24 @matej-g
Hello 👋 Looks like there was no activity on this issue for the last two months.
Still valid.
@yeya24 @bwplotka @GiedriusS Doesn't this mean that for 2 blocks with 1M series each, where all the series are the same, you'd need 2M S3 requests to copy series A's chunks from block A and from block B into the new block C chunk file?
Hello 👋 Looks like there was no activity on this issue for the last two months.
It's totally doable now. Same with downsampling.