[BugFix] fix column overflow when handle too large partial update #49054
Signed-off-by: luohaha <18810541851@163.com>
Does the cloud-native table also need this optimization?
Quality Gate passed
[FE Incremental Coverage Report] ✅ pass: 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass: 44 / 48 (91.67%)
@Mergifyio backport branch-3.3
@Mergifyio backport branch-3.2
✅ Backports have been created
Why I'm doing:
In the current implementation, a partial column update builds the columns to be updated at segment granularity, which can lead to overflow. For example:
Suppose an `ARRAY` column is to be updated. In the beginning, because the data in the table does not contain arrays yet, segment files are sliced at a size of 1GB, so a single segment file may contain a large number of rows. Assume there are 5 million rows in a tablet.
In the `ArrayColumn` struct, array offsets are stored as `uint32_t`, which means an `ArrayColumn` can hold at most 4,294,967,295 items.
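To make the limit concrete, the arithmetic can be sketched as below. This is only an illustration, not StarRocks code: `offsets_overflow` and `max_avg_elements` are hypothetical helpers, and the sketch assumes all 5 million rows end up in a single column being built.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of StarRocks): returns true when the total
// element count of an array column cannot be represented by a uint32_t offset.
inline bool offsets_overflow(uint64_t num_rows, uint64_t avg_elements_per_row) {
    const uint64_t max_offset = std::numeric_limits<uint32_t>::max(); // 4,294,967,295
    if (num_rows == 0) {
        return false;
    }
    // Compare by division so the check itself cannot overflow uint64_t.
    return avg_elements_per_row > max_offset / num_rows;
}

// Hypothetical helper: largest whole per-row average that still fits.
inline uint64_t max_avg_elements(uint64_t num_rows) {
    assert(num_rows > 0);
    return std::numeric_limits<uint32_t>::max() / num_rows;
}
```

Under these assumptions, `max_avg_elements(5000000)` is 858, so arrays averaging 859 or more elements per row would push the offsets past what `uint32_t` can represent.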
What I'm doing:
Process updates involving large amounts of data in batches, with each batch limited by `partial_update_memory_limit_per_worker`.
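The batching idea can be sketched roughly as follows. This is a minimal sketch and not the code in this PR: `split_into_batches` is a hypothetical function, the per-row byte estimates are assumed to be available up front, and `memory_limit_bytes` merely stands in for the role played by `partial_update_memory_limit_per_worker`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch (not the StarRocks implementation).
// Splits rows [0, row_bytes.size()) into consecutive half-open ranges
// [start, end) whose estimated size stays within memory_limit_bytes.
// A single row larger than the limit still gets its own batch, so the
// loop always makes progress.
std::vector<std::pair<size_t, size_t>> split_into_batches(
        const std::vector<uint64_t>& row_bytes, uint64_t memory_limit_bytes) {
    assert(memory_limit_bytes > 0);
    std::vector<std::pair<size_t, size_t>> batches;
    size_t start = 0;
    uint64_t used = 0;
    for (size_t i = 0; i < row_bytes.size(); ++i) {
        // Close the current batch before adding a row that would exceed the limit.
        if (i > start && used + row_bytes[i] > memory_limit_bytes) {
            batches.emplace_back(start, i);
            start = i;
            used = 0;
        }
        used += row_bytes[i];
    }
    // Emit the trailing batch, if any rows remain.
    if (start < row_bytes.size()) {
        batches.emplace_back(start, row_bytes.size());
    }
    return batches;
}
```

Each returned range would then be built and applied on its own, so no single column has to hold the elements of every row in the segment at once.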
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: