[BugFix] fix column overflow when handle too large partial update #49054

Merged: 3 commits, Aug 7, 2024

@luohaha luohaha commented Jul 29, 2024

Why I'm doing:

In the current implementation, when handling a partial column update, we build the column to be updated at segment granularity, which can lead to overflow. For example:

  1. Suppose an ARRAY column is to be updated. Initially, because the data in the table does not contain arrays yet, the segment file is sliced at a size of 1GB, so a single segment file may contain a large number of rows. Assume there are 5 million (500w) rows in a tablet.
  2. In the ArrayColumn struct, array offsets are stored as uint32_t, which means an ArrayColumn can hold at most 4,294,967,295 items.
  3. When each array contains more than about 900 items, the total reaches 5,000,000 × 900 = 4,500,000,000 items, which is larger than 4,294,967,295, so overflow happens.
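The arithmetic in the steps above can be checked with a small standalone sketch. It is not code from this PR: the helper names `total_items`, `overflows_uint32_offset`, and `wrapped_offset` are hypothetical, and the only facts taken from the description are the 5 million rows, the roughly 900 items per array, and the uint32_t offset type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: total number of array elements if every row holds
// `items_per_row` elements. Computed in 64 bits so the true total is visible.
uint64_t total_items(uint64_t rows, uint64_t items_per_row) {
    // Guard against overflowing the 64-bit product itself.
    assert(items_per_row == 0 ||
           rows <= std::numeric_limits<uint64_t>::max() / items_per_row);
    return rows * items_per_row;
}

// Hypothetical helper: true when the total cannot be represented by a
// uint32_t offset, i.e. it exceeds 4,294,967,295.
bool overflows_uint32_offset(uint64_t rows, uint64_t items_per_row) {
    return total_items(rows, items_per_row) >
           std::numeric_limits<uint32_t>::max();
}

// Hypothetical helper: the value a uint32_t offset would hold after the
// total wraps around modulo 2^32.
uint32_t wrapped_offset(uint64_t rows, uint64_t items_per_row) {
    return static_cast<uint32_t>(total_items(rows, items_per_row));
}
```

With 5,000,000 rows, 800 items per row still fits (4,000,000,000), while 900 items per row does not; the break-even point is a little under 859 items per row.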

What I'm doing:

Process updates that involve large amounts of data in batches, with each batch limited by partial_update_memory_limit_per_worker.
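The batching idea can be illustrated with a minimal sketch. This is not the implementation merged in this PR: `RowRange`, `split_into_batches`, and the per-row size estimates are hypothetical, and `memory_limit_bytes` merely stands in for the value of partial_update_memory_limit_per_worker. The sketch assumes rows are processed in order and that a byte-size estimate per row is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical type: half-open row range [begin, end) processed as one batch.
using RowRange = std::pair<size_t, size_t>;

// Hypothetical helper: greedily groups consecutive rows so that each batch's
// estimated size stays at or below `memory_limit_bytes`. A single row larger
// than the limit still gets its own batch, so progress is always made.
std::vector<RowRange> split_into_batches(const std::vector<uint64_t>& row_bytes,
                                         uint64_t memory_limit_bytes) {
    assert(memory_limit_bytes > 0);
    std::vector<RowRange> batches;
    size_t begin = 0;    // first row of the batch being built
    uint64_t used = 0;   // estimated bytes already in that batch
    for (size_t i = 0; i < row_bytes.size(); ++i) {
        // Close the current batch if adding row i would exceed the limit,
        // unless the batch is still empty.
        if (i > begin && used + row_bytes[i] > memory_limit_bytes) {
            batches.emplace_back(begin, i);
            begin = i;
            used = 0;
        }
        used += row_bytes[i];
    }
    if (begin < row_bytes.size()) {
        batches.emplace_back(begin, row_bytes.size());
    }
    return batches;
}
```

Because each batch is bounded by a memory budget rather than by a whole segment, the number of array items materialized at once stays far below the uint32_t offset limit under this assumption.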

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: luohaha <18810541851@163.com>
@wanpengfei-git wanpengfei-git requested a review from a team July 29, 2024 06:15
Signed-off-by: luohaha <18810541851@163.com>
decster previously approved these changes Aug 2, 2024
chaoyli previously approved these changes Aug 5, 2024

@wyb wyb left a comment

Do cloud-native tables also need this optimization?

be/src/storage/rowset/horizontal_update_rowset_writer.cpp (outdated, resolved)
Signed-off-by: luohaha <18810541851@163.com>
@luohaha luohaha dismissed stale reviews from chaoyli and decster via fcae123 August 7, 2024 00:16

sonarqubecloud bot commented Aug 7, 2024

github-actions bot commented Aug 7, 2024

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Aug 7, 2024

[BE Incremental Coverage Report]

pass : 44 / 48 (91.67%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 be/src/storage/rowset/horizontal_update_rowset_writer.cpp | 0 | 4 | 00.00% | [65, 66, 67, 68] |
| 🔵 be/src/storage/rowset_column_update_state.cpp | 39 | 39 | 100.00% | [] |
| 🔵 be/src/storage/rowset_column_update_state.h | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/rowset/rowset_writer.cpp | 1 | 1 | 100.00% | [] |
| 🔵 be/src/storage/rowset/rowset.h | 2 | 2 | 100.00% | [] |
| 🔵 be/src/storage/rowset/rowset_meta.h | 1 | 1 | 100.00% | [] |

@dirtysalt dirtysalt merged commit 3b682c7 into StarRocks:main Aug 7, 2024
54 of 58 checks passed

github-actions bot commented Aug 7, 2024

@Mergifyio backport branch-3.3

@github-actions github-actions bot removed the 3.3 label Aug 7, 2024

github-actions bot commented Aug 7, 2024

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Aug 7, 2024

mergify bot commented Aug 7, 2024

backport branch-3.3

✅ Backports have been created

mergify bot commented Aug 7, 2024

backport branch-3.2

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Aug 7, 2024
…9054)

Signed-off-by: luohaha <18810541851@163.com>
(cherry picked from commit 3b682c7)

# Conflicts:
#	gensrc/proto/olap_file.proto
mergify bot pushed a commit that referenced this pull request Aug 7, 2024
…9054)

Signed-off-by: luohaha <18810541851@163.com>
(cherry picked from commit 3b682c7)

# Conflicts:
#	be/src/storage/rowset/rowset.h
#	be/src/storage/rowset_column_update_state.cpp
#	gensrc/proto/olap_file.proto
wanpengfei-git pushed a commit that referenced this pull request Aug 7, 2024
…ckport #49054) (#49478)

Signed-off-by: Yixin Luo <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>
luohaha added a commit that referenced this pull request Aug 8, 2024
…ckport #49054) (#49477)

Signed-off-by: Yixin Luo <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>
7 participants