[fix] [broker] local metadata sync topic contains configuration events causing all operations stuck #22695

Open
wants to merge 3 commits into
base: master

Conversation

poorbarcode
Contributor

@poorbarcode poorbarcode commented May 10, 2024

Motivation

Background:

  • PIP-136: Sync Pulsar policies across multiple clouds defines two topics below:

    • metadataSyncEventTopic: monitors local metadata store changes
    • configurationMetadataSyncEventTopic: monitors configuration metadata store changes
  • Local metadata store and Configuration metadata store share the same object in memory when their URLs are the same.

Issue 1

Since the event synchronizer is bound to the metadata store object in memory, it receives every event from both the Local metadata store and the Configuration metadata store when the two stores are the same object in memory. As a result, the data in the two topics gets mixed up.
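
To make Issue 1 concrete, here is a minimal sketch of the mix-up. It is not Pulsar code: `SketchStore`, `SharedStoreMixupSketch`, and the two paths are hypothetical stand-ins, and the only behavior modeled is that a listener registered on a store object sees every write to that object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a metadata store; not Pulsar's MetadataStore API.
final class SketchStore {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void registerListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Every listener registered on this object sees every write to it.
    void put(String path) {
        listeners.forEach(listener -> listener.accept(path));
    }
}

final class SharedStoreMixupSketch {
    // Returns the events captured for [local sync topic, configuration sync topic].
    static List<List<String>> run(boolean sameUrl) {
        SketchStore localStore = new SketchStore();
        // Same URL: the configuration store is the same in-memory object as the local store.
        SketchStore configStore = sameUrl ? localStore : new SketchStore();

        List<String> localTopicEvents = new ArrayList<>();
        List<String> configTopicEvents = new ArrayList<>();
        localStore.registerListener(localTopicEvents::add);
        configStore.registerListener(configTopicEvents::add);

        localStore.put("/local/example");    // a local metadata change (made-up path)
        configStore.put("/config/example");  // a configuration metadata change (made-up path)
        return List.of(localTopicEvents, configTopicEvents);
    }

    public static void main(String[] args) {
        System.out.println("shared object:    " + run(true));
        System.out.println("separate objects: " + run(false));
    }
}
```

With a shared object, both lists end up holding both paths; with separate objects, each list holds only its own path.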

Issue 2
The internal producer of the synchronizer relies on the SyncEventTopic; this topic relies on the namespace local policies; the operation of writing namespace local policies to ZK relies on the internal producer. A deadlock occurs. See the following flow:

  • Try to start the internal producer of the synchronizer.
  • Try to load the topic configured as metadataSyncEventTopic.
  • Try to write data to the Local Metadata Store.
  • Try to send events to metadataSyncEventTopic before writing data to the Local Metadata Store.
  • The internal producer is still starting, so the event cannot be sent.
  • Stuck.
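
The flow above can be modeled as a cycle of futures. This is only a sketch under assumptions: `SyncDeadlockSketch` and its method names are invented for illustration and do not correspond to the broker's real classes. It shows why the startup future can never complete once a metadata write waits on the producer that is itself waiting on that write.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the circular dependency; names are illustrative only.
final class SyncDeadlockSketch {
    // Completed only once the internal producer has finished starting.
    private final CompletableFuture<Void> producerReady = new CompletableFuture<>();

    // A metadata write first publishes a sync event, which needs the producer.
    CompletableFuture<Void> writeLocalMetadata() {
        return producerReady.thenRun(() -> { /* persist to the store */ });
    }

    // Loading the sync event topic writes namespace local policies.
    CompletableFuture<Void> loadSyncEventTopic() {
        return writeLocalMetadata();
    }

    // The producer becomes ready only after its topic has been loaded.
    CompletableFuture<Void> startProducer() {
        return loadSyncEventTopic().thenRun(() -> producerReady.complete(null));
    }

    // Returns true if startup is still pending after the timeout, i.e. it is stuck.
    static boolean isStuck(long timeoutMillis) {
        try {
            new SyncDeadlockSketch().startProducer().get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stuck = " + isStuck(200));
    }
}
```

Nothing outside the cycle ever completes `producerReady`, so the wait always times out, whatever timeout is chosen.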

You can reproduce this issue with the test SyncConfigStore1ZKPerClusterTest.testDynamicEnableConfigurationMetadataSyncEventTopic. This PR fixes the issue where the synchronizer gets stuck because two metadata stores rely on it. I will write a separate PR that skips syncing data that relies on the synchronizer itself.

Modifications

  • Create a separate configuration metadata store if users want to enable Metadata Synchronizer, even if the URL of the configuration metadata store is the same as the local metadata store.
  • Correct the behavior: metadataSyncEventTopic receives only events from the local metadata store, and configurationMetadataSyncEventTopic receives only events from the configuration metadata store.
  • If the Broker has initialized itself with a single metadata store, reject the dynamic config changes.
  • Add an optional setting mayEnableMetadataSynchronizer that lets the Broker initialize itself with a separate configuration metadata store.
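
A rough sketch of the guard described in these modifications, under assumptions: `BrokerStoresSketch` and `validateDynamicSynchronizerChange` are hypothetical names, plain `Object` instances stand in for the stores, and rejecting every dynamic change (rather than only some) is a simplification.

```java
// Hypothetical model of broker startup; plain Objects stand in for the two stores.
final class BrokerStoresSketch {
    private final Object localStore = new Object();
    private final Object configurationStore;

    // separate == true models a broker that created its own configuration store object,
    // even if both stores point at the same URL.
    BrokerStoresSketch(boolean separate) {
        this.configurationStore = separate ? new Object() : localStore;
    }

    boolean hasSeparateStores() {
        return configurationStore != localStore;
    }

    // Assumed validation hook: a dynamic change to the synchronizer settings is
    // rejected when the broker started with one shared store object.
    void validateDynamicSynchronizerChange() {
        if (!hasSeparateStores()) {
            throw new IllegalStateException(
                    "Broker was initialized with a single metadata store; restart to change the synchronizer");
        }
    }

    public static void main(String[] args) {
        new BrokerStoresSketch(true).validateDynamicSynchronizerChange(); // accepted
        try {
            new BrokerStoresSketch(false).validateDynamicSynchronizerChange();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```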

Next PRs

Skip syncing events that rely on the synchronizer itself. For example:

  • metadataSyncEventTopic is public/default/tp
  • (Highlight) Do not sync events related to the topic public/default/tp, because the synchronizer itself relies on this topic 😂. I will start a discussion for this change. See Issue 2 in the Motivation section for details.
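
One possible shape for such a filter, purely as a sketch: the matching rule (skip any metadata path that contains the sync topic's tenant/namespace as consecutive path segments) and the class name are assumptions, not the design that the follow-up PR will use, and the paths in `main` are only illustrative.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical filter; the matching rule is an assumption for illustration.
final class SelfReferentialEventFilterSketch {
    // syncTopic is in the short form used above, e.g. "public/default/tp".
    static boolean shouldSync(String metadataPath, String syncTopic) {
        // "public/default/tp" -> "public/default"
        String namespace = syncTopic.substring(0, syncTopic.lastIndexOf('/'));
        List<String> pathSegments = Arrays.asList(metadataPath.split("/"));
        List<String> namespaceSegments = Arrays.asList(namespace.split("/"));
        // Skip the event when the namespace appears as consecutive whole path segments.
        return Collections.indexOfSubList(pathSegments, namespaceSegments) < 0;
    }

    public static void main(String[] args) {
        String syncTopic = "public/default/tp";
        System.out.println(shouldSync("/admin/policies/public/default", syncTopic));
        System.out.println(shouldSync("/admin/policies/public/other", syncTopic));
    }
}
```

Matching on whole segments keeps a namespace such as public/default2 from being skipped by accident.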

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x


@poorbarcode Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels May 10, 2024
    doc = "If you want to enable or disable the metadata synchronizer dynamically, this value should be true."
            + " Enabled: Pulsar will initialize itself to update the metadata synchronizer dynamically."
    )
    private boolean mayEnableMetadataSynchronizer = false;
Member


Maybe we can name the configuration as forceUseSeparatedConfigurationStore? Then we can use isConfigurationStoreSeparated to check the condition.

    public boolean isConfigurationStoreSeparated() {
        return !Objects.equals(getConfigurationMetadataStoreUrl(), getMetadataStoreUrl()) || forceUseSeparatedConfigurationStore;
    }

Contributor Author


  • Renamed mayEnableMetadataSynchronizer to forceUseSeparatedConfigurationStoreInMemory
  • I did not change the method isConfigurationStoreSeparated, because its original behavior is to report whether the URLs of the Configuration Metadata Store and the Local Metadata Store differ, and we should not change that.
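
A sketch of how the two checks could stay separate, assuming the renamed flag is consulted by a second predicate. `StoreSeparationSketch` and `shouldCreateSeparateStoreObject` are hypothetical names; only `isConfigurationStoreSeparated` and `forceUseSeparatedConfigurationStoreInMemory` come from this thread.

```java
import java.util.Objects;

// Hypothetical holder for the three settings discussed in this thread.
final class StoreSeparationSketch {
    private final String metadataStoreUrl;
    private final String configurationMetadataStoreUrl;
    private final boolean forceUseSeparatedConfigurationStoreInMemory;

    StoreSeparationSketch(String metadataStoreUrl, String configurationMetadataStoreUrl,
                          boolean forceUseSeparatedConfigurationStoreInMemory) {
        this.metadataStoreUrl = metadataStoreUrl;
        this.configurationMetadataStoreUrl = configurationMetadataStoreUrl;
        this.forceUseSeparatedConfigurationStoreInMemory = forceUseSeparatedConfigurationStoreInMemory;
    }

    // Unchanged meaning: true only when the two URLs differ.
    boolean isConfigurationStoreSeparated() {
        return !Objects.equals(configurationMetadataStoreUrl, metadataStoreUrl);
    }

    // Hypothetical helper: whether the broker should build two store objects in memory.
    boolean shouldCreateSeparateStoreObject() {
        return isConfigurationStoreSeparated() || forceUseSeparatedConfigurationStoreInMemory;
    }

    public static void main(String[] args) {
        StoreSeparationSketch conf = new StoreSeparationSketch("zk:2181", "zk:2181", true);
        System.out.println(conf.isConfigurationStoreSeparated());
        System.out.println(conf.shouldCreateSeparateStoreObject());
    }
}
```

Callers that only care about URL equality keep their existing answer, while store creation can consult the flag as well.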

@lhotari
Member

lhotari commented Jul 29, 2024

@poorbarcode please resolve the merge conflict

@poorbarcode poorbarcode force-pushed the improve/one_zk_4_config_sync branch from 0eb3c04 to 8a59c50 Compare July 29, 2024 16:15
@poorbarcode
Contributor Author

@lhotari

@poorbarcode please resolve the merge conflict

Done

Labels
  • doc-not-needed (Your PR changes do not impact docs)
  • ready-to-test
  • release/2.11.5
  • release/3.0.10
  • release/3.3.4
  • release/4.0.1
  • type/bug (The PR fixed a bug or issue reported a bug)

3 participants