
fix(meta): unregister aborted initial job tables #25604

Open
Li0k wants to merge 2 commits into main from li0k/fix-ddl-abort-hummock-cleanup

@Li0k commented May 9, 2026

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR is scoped to DDL abort cleanup for creating streaming jobs. It does not change per-database dirty-job recovery cleanup; that path is fixed separately in #25592.

Bug:

  • When creating a streaming job fails, create_streaming_job_inner aborts the creating job with is_cancelled = false.
  • DROP/CANCEL of an Initial job also aborts through the same catalog helper.
  • Before this PR, these DDL abort callers could delete the job's catalog rows without running the existing dropped-table cleanup path, so Hummock table ids that had already been registered were never unregistered and remained stale.
  • The real SQL/simulation repro uses per-database recovery only as a way to make the foreground create fail. Once the create has failed, cleaning up the stale ids is the responsibility of the DDL abort path, which is what this PR fixes.
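To make the failure mode concrete, here is a minimal Rust sketch of the pre-fix behavior under a heavily simplified model. MetaState, its two fields and its methods are hypothetical and are not RisingWave's real catalog or Hummock manager APIs; table ids are plain u32 values and "registered with Hummock" is modeled as a set of ids. It only illustrates that deleting catalog rows without the dropped-table cleanup leaves ids registered in Hummock that no catalog row refers to.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, heavily simplified model of meta node state.
/// NOT the real RisingWave types; table ids are plain u32 here.
#[derive(Default)]
struct MetaState {
    /// Catalog rows: job id -> state table ids owned by that job.
    catalog: HashMap<u32, Vec<u32>>,
    /// Table ids currently registered with Hummock.
    hummock_registered: BTreeSet<u32>,
}

impl MetaState {
    /// A creating job registers its state tables with Hummock before the
    /// job finishes, so a later failure has to undo this registration.
    fn register_creating_job(&mut self, job_id: u32, table_ids: Vec<u32>) {
        self.hummock_registered.extend(table_ids.iter().copied());
        self.catalog.insert(job_id, table_ids);
    }

    /// Pre-fix behavior as described in the PR: the abort deletes the
    /// catalog rows but never runs the dropped-table cleanup path.
    fn abort_without_cleanup(&mut self, job_id: u32) {
        self.catalog.remove(&job_id);
    }

    /// Ids still registered in Hummock that no catalog row refers to.
    fn stale_table_ids(&self) -> BTreeSet<u32> {
        let referenced: BTreeSet<u32> =
            self.catalog.values().flatten().copied().collect();
        self.hummock_registered
            .difference(&referenced)
            .copied()
            .collect()
    }
}
```

For example, registering job 1 with tables 10 and 11 and job 2 with table 20, then aborting job 1 through abort_without_cleanup, leaves stale_table_ids() returning {10, 11}.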

Fix:

  • Return an AbortCreatingStreamingJobResult from try_abort_creating_streaming_job, including the aborted job's state table ids.
  • Collect table ids from internal table catalogs, Fragment.state_table_ids, and the MV table catalog. Fragment.state_table_ids is needed because, in the SQL/simulation repro, some table ids are registered through fragment metadata and are not fully covered by get_internal_tables_by_id.
  • Route DDL create failure and DROP/CANCEL of Initial jobs through cleanup_dropped_streaming_jobs, so Hummock unregister and Hummock/compactor notification happen through the existing dropped-table cleanup path.
  • Keep replace-job abort handling and live Background/Creating recovery unchanged.
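The fix can be sketched with the same kind of toy model. This is a hypothetical illustration, not the PR's code: the names try_abort_creating_streaming_job, AbortCreatingStreamingJobResult, cleanup_dropped_streaming_jobs and Fragment.state_table_ids come from the description above, but every field, signature and type below is an assumption. The abort returns the union of the three id sources, and the caller passes that result to the shared cleanup instead of only deleting catalog rows.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-ins; the real RisingWave types are richer and this
/// field layout is an assumption made for illustration only.
struct Fragment {
    state_table_ids: Vec<u32>,
}

struct CreatingJob {
    internal_table_ids: Vec<u32>, // ids from internal table catalogs
    fragments: Vec<Fragment>,     // may carry ids the internal tables miss
    mv_table_id: Option<u32>,     // the MV's own table, if the job is an MV
}

/// Simplified result type: only the aborted job's state table ids are modeled.
struct AbortCreatingStreamingJobResult {
    state_table_ids: BTreeSet<u32>,
}

#[derive(Default)]
struct Meta {
    catalog: HashMap<u32, CreatingJob>,
    hummock_registered: BTreeSet<u32>,
}

impl Meta {
    /// Removes the job's catalog entry and reports every table id it owned.
    /// Returns None if the job is not in the catalog.
    fn try_abort_creating_streaming_job(
        &mut self,
        job_id: u32,
    ) -> Option<AbortCreatingStreamingJobResult> {
        let job = self.catalog.remove(&job_id)?;
        // Union of all three sources; the set deduplicates overlapping ids.
        let mut ids: BTreeSet<u32> = job.internal_table_ids.into_iter().collect();
        for fragment in job.fragments {
            ids.extend(fragment.state_table_ids);
        }
        ids.extend(job.mv_table_id); // Option<u32> iterates over 0 or 1 ids
        Some(AbortCreatingStreamingJobResult { state_table_ids: ids })
    }

    /// Shared dropped-table cleanup, reduced to the Hummock unregister step.
    /// The real path also notifies Hummock and the compactor (not modeled).
    fn cleanup_dropped_streaming_jobs(&mut self, results: &[AbortCreatingStreamingJobResult]) {
        for result in results {
            for id in &result.state_table_ids {
                self.hummock_registered.remove(id);
            }
        }
    }
}
```

Collecting into a set means an id reported by both an internal table catalog and Fragment.state_table_ids is unregistered once. In this sketch, routing every DDL abort through one cleanup function, rather than adding an unregister call at each caller, keeps the Hummock unregister step in a single place.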

Not covered by this PR:

Testing

  • cargo fmt --all -- --check
  • git diff --check
  • cargo check -p risingwave_meta --lib
  • cargo test -p risingwave_meta test_abort_initial_materialized_views_unregister_hummock --lib -- --nocapture
  • ./risedev sit-test test_per_db_recovery_abort_failed_foreground_ddl_unregisters_hummock_tables --no-capture

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.

Release note

Fixed stale Hummock table ids that can remain after aborting a creating streaming job in the DDL path.

@Li0k added the A-meta (Area: Meta node) label May 9, 2026