-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][meta] Use getChildrenFromStore
to read children data to avoid lost data
#24665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #24665 +/- ##
============================================
+ Coverage 74.25% 74.26% +0.01%
+ Complexity 33174 32843 -331
============================================
Files 1884 1885 +1
Lines 146943 146963 +20
Branches 16882 16929 +47
============================================
+ Hits 109110 109143 +33
+ Misses 29163 29126 -37
- Partials 8670 8694 +24
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
...a/src/main/java/org/apache/pulsar/metadata/bookkeeper/AbstractHierarchicalLedgerManager.java
Outdated
Show resolved
Hide resolved
It's great that we are finally resolving this issue. Workaround for ZooKeeper, by-passing the Pulsar Metadata driver for BookKeeper client and BookKeeper server-side components: In pulsar-helm-chart this is configurable and implemented this way: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More changes are needed to resolve the issue since store.getChildren(path) will use a cache that is updated asynchronously.
Another problem could be the potential performance implications. We might need to make changes on BookKeeper side so that caching would be safe under certain conditions if good performance relies on caching. Many environments have become reliant on the high level of caching, at least this is my understanding of the matter.
getChildrenFromStore
to avoid lost data
getChildrenFromStore
to avoid lost datagetChildrenFromStore
to read children data to avoid lost data
getChildrenFromStore
to read children data to avoid lost datagetChildrenFromStore
to read children data to avoid lost data
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java
Outdated
Show resolved
Hide resolved
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/pulsarbot run-failure-checks |
… lost data (apache#24665) (cherry picked from commit 90a70db) (cherry picked from commit e4431bc)
… lost data (apache#24665) (cherry picked from commit 90a70db)
… lost data (apache#24665) (cherry picked from commit 90a70db) (cherry picked from commit e4431bc)
… lost data (apache#24665) (cherry picked from commit 90a70db)
Motivation
We're facing data loss during an cluster upgrade. The issue manifests as three distinct error types post-upgrade: BadVersion errors, "No Such Ledger" errors, and ledger authentication failures. The root cause appears to be a metadata synchronization issue where the Pulsar metadata store implementation lacks proper sync calls before listing ledgers.
As bk side, there is a
sync
call beforegetChildren
https://github.com/apache/bookkeeper/blob/2789316c18e12cbb6d17fa4a023410dbad6593a0/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ZkUtils.java#L284-L311
Documentation
doc
doc-required
doc-not-needed
doc-complete