Modifying merged segment generated during merged rollup tasks. #13154

Open
piby180 opened this issue May 15, 2024 · 1 comment

Comments

@piby180
Contributor

piby180 commented May 15, 2024

We have a use case where we need to update the data in offline tables for various reasons. With a 1:1 mapping between segments and source Parquet files, this is very easy: we modify the Parquet file under the same filename, create a new segment with the same segment name, and Pinot updates the segment in place.

This is not possible if we enable the merge rollup task for the offline table, because no mapping is maintained between a merged segment and its source Parquet files. And as far as I know, you can only create one segment from one file; there is no way to create one segment from multiple source Parquet files.

Because of this, we are now forced to

  1. disable merge rollup tasks
  2. merge source Parquet files on our own (e.g., one monthly file for the whole month)
  3. create and push merged segment from the merged parquet file
  4. delete all the smaller segments manually (remove all segments for the month and retain only one merged monthly segment)
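To illustrate steps 2 and 4, here is a minimal sketch of how the monthly grouping could be planned. The `<table>_<YYYY-MM-DD>.parquet` naming convention and the function name `plan_monthly_merge` are assumptions for illustration only, not our actual layout and not anything Pinot provides. It only computes which daily files belong to which monthly file; the actual Parquet merge and segment push are not shown.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <table>_<YYYY-MM-DD>.parquet
DAILY_FILE = re.compile(r"^(?P<table>.+)_(?P<month>\d{4}-\d{2})-\d{2}\.parquet$")


def plan_monthly_merge(daily_files):
    """Group daily Parquet file names by (table, month).

    Returns a dict mapping each merged monthly file name to the sorted list
    of daily files that would be combined into it.
    """
    groups = defaultdict(list)
    for name in daily_files:
        match = DAILY_FILE.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        merged_name = f"{match['table']}_{match['month']}.parquet"
        groups[merged_name].append(name)
    return {merged: sorted(files) for merged, files in groups.items()}
```

The resulting dict doubles as the mapping that merge rollup does not keep: when a daily file changes, the key it falls under tells us which monthly file (and therefore which monthly segment) has to be rebuilt, and the values list the smaller segments' sources that the monthly segment replaces.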

I would be curious to know whether there is a way to update the data in merged segments while keeping merge rollup tasks enabled.

@Jackie-Jiang
Contributor

cc @snleee @swaminathanmanish
