Skip to content

[BugFix] Fix MV refresh transaction rollback to preserve last successful state#70152

Open
LiShuMing wants to merge 1 commit intoStarRocks:mainfrom
LiShuMing:fix/main/fix_mv_refresh_bugs
Open

[BugFix] Fix MV refresh transaction rollback to preserve last successful state#70152
LiShuMing wants to merge 1 commit intoStarRocks:mainfrom
LiShuMing:fix/main/fix_mv_refresh_bugs

Conversation

@LiShuMing
Copy link
Contributor

@LiShuMing LiShuMing commented Mar 11, 2026

When an MV refresh fails, it was ending up with 0 rows instead of retaining the last successful refresh state. This happened because partitions were dropped before the insert was executed.

The fix defers partition drops until after the insert succeeds:

  1. Store partitions to be dropped in deferredDropPartitions
  2. Add new partitions during syncAddOrDropPartitions()
  3. After insert succeeds, call dropDeferredPartitions()
  4. If insert fails, original partitions with data are preserved

Changes:

  • MVPCTRefreshPartitioner: Add abstract dropDeferredPartitions() method
  • MVPCTRefreshListPartitioner: Implement deferred partition drop
  • MVPCTRefreshRangePartitioner: Implement deferred partition drop
  • MVPCTRefreshNonPartitioner: Add no-op implementation
  • MVPCTBasedRefreshProcessor: Call dropDeferredPartitions() after insert
  • PCTRefreshListPartitionOlapTest: Add test for the fix

Why I'm doing:

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

…ful state

When an MV refresh fails, it was ending up with 0 rows instead of retaining
the last successful refresh state. This happened because partitions were
dropped before the insert was executed.

The fix defers partition drops until after the insert succeeds:
1. Store partitions to be dropped in deferredDropPartitions
2. Add new partitions during syncAddOrDropPartitions()
3. After insert succeeds, call dropDeferredPartitions()
4. If insert fails, original partitions with data are preserved

Changes:
- MVPCTRefreshPartitioner: Add abstract dropDeferredPartitions() method
- MVPCTRefreshListPartitioner: Implement deferred partition drop
- MVPCTRefreshRangePartitioner: Implement deferred partition drop
- MVPCTRefreshNonPartitioner: Add no-op implementation
- MVPCTBasedRefreshProcessor: Call dropDeferredPartitions() after insert
- PCTRefreshListPartitionOlapTest: Add test for the fix
@CelerData-Reviewer
Copy link

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40092d0328

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

dropPartition(db, mv, mvPartitionName);
}
logger.info("The process of synchronizing materialized view delete partitions range [{}]", deletes);
this.deferredDropPartitions = deletes;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Drop replaced range partitions before adding new ones

This change defers all deletions, but range additions are still executed immediately, so addRangePartitions(...) runs while the old partitions are still present. RangePartitionDiffer.diffRange() computes add/delete entries by partition name, so when a partition boundary changes (common split/merge cases), the add side can target a name/range that still exists and addPartitions will fail due to duplicate/overlapping partition definitions. That turns valid refreshes into failures for range-partitioned MVs.

Useful? React with 👍 / 👎.

Comment on lines +184 to +186
} catch (Exception e) {
logger.warn("Failed to drop deferred partition {} for mv {}, skip",
mvPartitionName, mv.getName(), e);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Propagate deferred-drop failures instead of skipping

Swallowing exceptions here lets the task continue as success even when old partitions were not removed. execProcessExecPlan() always calls updateVersionMeta(...) after dropDeferredPartitions(), so a lock/error during dropPartition can leave stale partitions in place while metadata marks refresh complete, which can produce incorrect query results and prevents retry-based cleanup.

Useful? React with 👍 / 👎.

@github-actions
Copy link
Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Copy link
Contributor

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants