[test][broker] PIP-475: end-to-end migration tests + transition fixes#25878

Merged: lhotari merged 3 commits into apache:master from merlimat:pip-475-impl-5 on May 27, 2026
@merlimat
Contributor

Summary

Final implementation PR for PIP-475: Regular-to-Scalable Topic Migration. Builds on the synthetic-layout lookup (#25822), the V5 SDK support (#25850), and the migration command (#25875). Adds an end-to-end test of the full operator timeline against a live broker, and fixes three real bugs the test surfaced that prevented a connected V5 client from transitioning across the migration boundary.

Tests (V5MigrationEndToEndTest)

  • produce-through-migration: a V5 producer publishes via the synthetic layout (mod-N routing to the legacy segments); the topic is migrated while only that (marked) V5 producer is attached, so the pre-check passes without --force; the producer transparently follows the layout-change push to the real DAG and range-routes new messages to the active children; a V5 queue consumer then drains every pre- and post-migration message.
  • v4 lockout: after migration the old topic is terminated, so a legacy v4 producer can no longer write to it.
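
To make the two routing modes in the first test concrete, here is a minimal sketch of mod-N routing (synthetic layout) next to range routing (real DAG). It is not taken from the PR: the `Segment` type, the 16-bit hash space, and the method names are assumptions for illustration, and the actual client may hash and partition differently.

```java
import java.util.List;

public class RoutingSketch {
    /** Hypothetical segment that owns an inclusive slice of the hash space. */
    static final class Segment {
        final String name;
        final int start;
        final int end;

        Segment(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** Synthetic layout: pick a legacy partition by key hash modulo N. */
    static int modNRoute(int keyHash, int numPartitions) {
        // floorMod keeps the result non-negative for negative hashes.
        return Math.floorMod(keyHash, numPartitions);
    }

    /** Real DAG: pick the active segment whose hash range contains the key. */
    static Segment rangeRoute(int keyHash, int hashSpace, List<Segment> active) {
        int slot = Math.floorMod(keyHash, hashSpace);
        for (Segment s : active) {
            if (slot >= s.start && slot <= s.end) {
                return s;
            }
        }
        throw new IllegalStateException("No active segment covers hash slot " + slot);
    }

    public static void main(String[] args) {
        // Assumed layout: two active children splitting a 16-bit hash space.
        List<Segment> children = List.of(
                new Segment("child-0", 0x0000, 0x7FFF),
                new Segment("child-1", 0x8000, 0xFFFF));
        int keyHash = 0x9ABC;
        System.out.println("pre-migration partition: " + modNRoute(keyHash, 4));
        System.out.println("post-migration segment: " + rangeRoute(keyHash, 0x10000, children).name);
    }
}
```

The point of the sketch is only that the same key can land on a different target once the layout changes, which is why the producer has to follow the layout-change push rather than cache its routing decision.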

Fixes surfaced by the E2E test

  • DagWatchSession (the transition gap): compute segment:// URIs from the canonical topic:// name, not the session's raw input. A session opened with a persistent:// name (synthetic layout) previously threw "Parent topic must have domain 'topic'" inside buildResponse once the topic was migrated, so the real DAG was never pushed and connected clients never transitioned.
  • Migration pre-check: inspect per-partition stats instead of the aggregate. Aggregated partitioned stats merge publishers by name into fresh stat objects that drop per-connection metadata, which hid the V5-managed marker and made every V5 connection look like a legacy v4 one (so a fully-V5 topic was wrongly rejected).
  • ScalableTopicProducer: the send-retry now also covers per-segment producer creation (not just send()), since a migration terminates the old partition between routing and creation; detect the "segment gone" condition by unwrapping causes (type or message) and give the DAG-watch layout update a larger budget to arrive.
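
As a rough illustration of the ScalableTopicProducer fix, the sketch below walks an exception's cause chain looking for a "segment gone" signal by type or by message, and retries the whole create-and-send step when it finds one. Everything named here is hypothetical: `TopicTerminatedException` is a local stand-in rather than the Pulsar client class, the `"terminated"` message match is an assumed heuristic, and `awaitLayoutUpdate` stands for whatever mechanism waits for the DAG-watch push.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionException;

public class SegmentGoneSketch {
    /** Hypothetical stand-in for the "topic/partition was terminated" error type. */
    static class TopicTerminatedException extends Exception {
        TopicTerminatedException(String message) {
            super(message);
        }
    }

    /** Returns true if any exception in the cause chain signals a terminated segment. */
    static boolean isSegmentGone(Throwable error) {
        int depth = 0;
        // The depth bound guards against cyclic cause chains.
        for (Throwable c = error; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c instanceof TopicTerminatedException) {
                return true;
            }
            String message = c.getMessage();
            if (message != null && message.contains("terminated")) {
                return true;
            }
        }
        return false;
    }

    /** Retries producer creation plus send as one unit while the segment is gone. */
    static <T> T withRetry(Callable<T> createAndSend, Runnable awaitLayoutUpdate, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return createAndSend.call();
            } catch (Exception e) {
                if (!isSegmentGone(e)) {
                    throw e; // unrelated failure: do not mask it
                }
                last = e;
                awaitLayoutUpdate.run(); // give the new layout time to arrive
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (calls[0]++ == 0) {
                // Simulate the old partition being terminated between routing and creation.
                throw new CompletionException(new TopicTerminatedException("partition closed"));
            }
            return "sent";
        }, () -> { }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Wrapping creation and send in the same retried callable is the design choice the fix describes: a retry that only covers `send()` never sees a failure raised while the per-segment producer is still being created.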

Test plan

  • V5MigrationEndToEndTest (2): produce-through-migration, v4 lockout.
  • Regression: ScalableTopicMigrationTest 7/7, DagWatchSessionTest 18/18, ScalableTopicControllerTest 35/35, V5RegularTopicInteropTest 4/4, full pulsar-client-v5 unit suite.

merlimat added 3 commits May 27, 2026 16:31
Adds V5MigrationEndToEndTest covering the full operator timeline against a
live broker, and fixes three real bugs the test surfaced that prevented a
connected V5 client from transitioning across the migration boundary.

Tests (V5MigrationEndToEndTest):
 * produce-through-migration: a V5 producer publishes via the synthetic
   layout, the topic is migrated while only that (marked) V5 producer is
   attached, the producer transparently follows the layout change to the
   real DAG, and a V5 queue consumer drains every pre- and post-migration
   message.
 * v4 lockout: after migration the old topic is terminated, so a legacy v4
   producer can no longer write to it.

Fixes:
 * DagWatchSession: compute segment:// URIs from the canonical topic://
   name, not the session's raw input. A session opened with a persistent://
   name (synthetic layout) previously threw "Parent topic must have domain
   'topic'" inside buildResponse when the topic was migrated, so the real
   DAG was never pushed and connected clients never transitioned.
 * ScalableTopics migration pre-check: inspect per-partition stats instead
   of the aggregate. Aggregated partitioned stats merge publishers by name
   into fresh stat objects that drop per-connection metadata, which hid the
   V5-managed marker and made every V5 connection look like a legacy v4 one.
 * ScalableTopicProducer: the send-retry now also covers per-segment
   producer *creation* (not just send()), since a migration terminates the
   old partition between routing and creation; detect the "segment gone"
   condition by unwrapping causes (type or message) and give the DAG-watch
   layout update a larger budget to arrive.
…tion

Adds TestScalableTopicMigration (tests/integration), which runs
`pulsar-admin scalable-topics migrate` against a real multi-broker
dockerized cluster: seed a partitioned regular topic via a v4 client,
migrate it via the admin CLI, assert the topic is now scalable
(get-metadata returns the segment DAG), and assert a legacy v4 producer
is locked out afterward (the old partitions are terminated). Registered
in the pulsar-messaging integration suite.

Complements the in-process V5MigrationEndToEndTest (which covers the
V5-client transparent transition); this validates the command, CLI
wiring, and termination in a real deployment with a real metadata store,
BookKeeper, and cross-broker bundle ownership.

Split the combined instanceof check into two single-line ifs so neither
line wraps an operator (OperatorWrap). No behaviour change.
Member

@lhotari left a comment

LGTM

@lhotari merged commit 951a426 into apache:master May 27, 2026
43 checks passed