Allow to run MergeRollupTask for specific days and/or segments #14138

Open
saifat29 opened this issue Oct 1, 2024 · 2 comments


saifat29 commented Oct 1, 2024

Currently MergeRollupTask only moves forward in time: segments that have already been processed are not processed again, even after the watermark is modified in ZooKeeper.

A possible use case: when data ingestion is not uniform, some days receive far more events than others. Rollup for a low-volume day might produce, say, 10 segments, while a high-volume day is rolled up into 1000 segments.

Ideally, MergeRollupTask could be run again with a new configuration for just the affected days.

Druid has a really useful feature called Reindex that does this.
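To make the request concrete, here is a rough Python sketch of the behaviour described above. It is a toy model, not Pinot's actual scheduler code: `ForwardOnlyScheduler`, `next_bucket`, and `rerun` are hypothetical names invented for illustration, and the day-sized buckets and one-day buffer are assumptions.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


class ForwardOnlyScheduler:
    """Toy model of a watermark-driven merge scheduler (hypothetical, not Pinot code)."""

    def __init__(self, watermark: date):
        self.watermark = watermark   # start of the next day-bucket to consider
        self.processed = set()       # buckets that have already been merged

    def next_bucket(self, today: date, buffer_days: int = 1):
        """Return the next unprocessed day-bucket, or None if nothing is eligible."""
        cutoff = today - timedelta(days=buffer_days)
        # A bucket is eligible only once it ends at or before the buffer cutoff.
        while self.watermark + ONE_DAY <= cutoff:
            bucket = self.watermark
            self.watermark += ONE_DAY
            if bucket not in self.processed:
                self.processed.add(bucket)
                return bucket
            # Already-processed buckets are skipped, even if the watermark was reset.
        return None

    def rerun(self, days):
        """The requested feature (hypothetical): re-queue specific, already-processed days."""
        selected = sorted(d for d in set(days) if d in self.processed)
        self.processed.difference_update(selected)
        return selected


scheduler = ForwardOnlyScheduler(date(2024, 9, 1))
today = date(2024, 9, 5)
first_pass = [scheduler.next_bucket(today) for _ in range(4)]

# Moving the watermark back does not help: processed days are skipped.
scheduler.watermark = date(2024, 9, 1)
after_reset = scheduler.next_bucket(today)

# An explicit re-run of one affected day makes it eligible again.
requeued = scheduler.rerun([date(2024, 9, 2), date(2024, 9, 10)])
scheduler.watermark = date(2024, 9, 1)
second_pass = scheduler.next_bucket(today)
```

In this sketch, resetting the watermark alone yields nothing, while `rerun` makes only the chosen day eligible again, so it could be merged with a different configuration.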

@Jackie-Jiang
Contributor

cc @swaminathanmanish


saifat29 commented Oct 8, 2024

This feature exists in StarTree Cloud as a Pinot extension called SegmentRefreshTask:

https://dev.startree.ai/docs/manage-data/segment-refresh-task
