[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not appropriate for Geo-Replication #23697
@poorbarcode It's a good idea to use just the ledger ID and entry ID for message deduplication. In this case, we can also remove the deduplication state after the ledger is fully replicated. For example:
@poorbarcode @gaoran10 @Technoboy- @codelipenghui Why could this PR be cherry-picked to branch-3.0 and branch-4.0? It changes the |
…o Deduplication is not appropriate for Geo-Replication (apache#23697) (cherry picked from commit 4ac4f3c) (cherry picked from commit 307b5c9)
…o Deduplication is not appropriate for Geo-Replication (apache#23697) (cherry picked from commit 4ac4f3c) (cherry picked from commit 26a211c)
…ls due to Deduplication is not appropriate for Geo-Replication (apache#23697)" This reverts commit 2607a49.
…ntly fails due to Deduplication is not appropriate for Geo-Replication (apache#23697)"" This reverts commit a8a1c4a.
Motivation
Background
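How does deduplication work?
- The producer keeps messages that are waiting for a send response in a queue, {pendingMessages}.
- The broker responds with the position -1:-1 if the sequence ID published is lower than the previous messages.
- When a message is rejected, the client checks whether the next sequence ID in {pendingMessages} is larger than the one that was rejected.
  - {next} > {rejected}: ignore the error and continue working.
  - {next} < {rejected}: close channels and reconnect.
Conditions under which the issue happens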
- The replicator's producer fills {pendingMessages} with message.sequenceId but ignores message.original-producer-name, so the sequence IDs in {pendingMessages} may not be increasing.
- In that case, handling a -1:-1 send response will fail.
Issue-1: lost messages
- M1(seq: 0), M2(seq: 1)
- M3(seq: 0), M4(seq: 1)
- {pendingMessages}: [0,1]
- {pendingMessages}: [0,1,0,1]
- seq 0, position 0:0
- seq 1, position 0:1
- seq 0, position -1:-1
- seq 1, position -1:-1
- {pendingMessages}: [empty]
0
now).[M1, M2, M1, M2]
[M1, M2]
You can reproduce the issue with the test testDeduplicationNotLostMessage.
Issue-2: frequently fails
- 3:0 with sequence-id 10
- 3:1 with sequence-id 1
- 3:2 with sequence-id 2
-0 Replicator copies messages
- {pendingMessages}: [10, 1, 2]
- 3:0 successfully
- {pendingMessages}: [1, 2]
- 3:0 (a duplicated publishing)
- -1:-1 (new position relates to the latest publishing) for the latest send-response.
- failed-sequenced: 10 > pendingMessages[0].sequenceId: 1
No test for reproducing this issue.
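The failure in Issue-2 follows directly from the comparison rule in the Background section. The Java sketch below only illustrates that rule and is not the Pulsar client code: the class `RejectedSendSketch`, the enum `Action`, and the method `onRejected` are hypothetical names. The description does not say what happens when the two sequence IDs are equal or when the queue is empty, so the handling of those cases here is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of the rule described in the Background section; not Pulsar code. */
final class RejectedSendSketch {

    enum Action { IGNORE, RECONNECT }

    /**
     * Decides how to react to a rejected (-1:-1) send response by comparing the
     * rejected sequence ID with the first sequence ID still pending.
     */
    static Action onRejected(long rejectedSequenceId, Deque<Long> pendingSequenceIds) {
        Long next = pendingSequenceIds.peekFirst();
        if (next == null) {
            // Assumption: with nothing pending there is nothing to compare against.
            return Action.IGNORE;
        }
        // {next} > {rejected}: ignore the error and continue.
        // Otherwise: close channels and reconnect (the equal case is an assumption).
        return next > rejectedSequenceId ? Action.IGNORE : Action.RECONNECT;
    }

    public static void main(String[] args) {
        // State from Issue-2 after 3:0 was acknowledged: pending sequence IDs [1, 2].
        Deque<Long> pending = new ArrayDeque<>(List.of(1L, 2L));
        // The duplicated publish of 3:0 carried sequence ID 10, and 1 is not greater than 10.
        System.out.println(onRejected(10L, pending)); // prints RECONNECT
    }
}
```

With the original producers' sequence IDs mixed in one queue, the rejected ID (10) can be larger than the next pending ID (1), so the reconnect branch is taken even though nothing is actually wrong.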
Modifications
Solution: replicators use a specified sequence ID (ledgerId:entryId of the original topic) instead of using the original producers' sequence IDs.
- 3:0 with sequence-id 10
- 3:1 with sequence-id 1
- 3:2 with sequence-id 2
- 3:0)
- {pendingMessages}: [3:0, 3:1, 3:2]
- 3:0 successfully
- {pendingMessages}: [3:1, 3:2]
- 3:0 (a duplicated publishing)
- -1:-1 (new position relates to the latest publishing) for the latest send-response.
- failed-sequenced(3:0) < pendingMessages[0].sequenceId(3:1)
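To make the ordering argument concrete, here is a minimal Java sketch of comparing ledgerId:entryId pairs in place of producer sequence IDs. It is an illustration under assumptions, not the code from this PR: `PositionSequenceSketch`, `Position`, and `canIgnoreRejected` are hypothetical names, and the sketch assumes positions of the original topic are ordered by ledger ID first and entry ID second.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; the names here are illustrative and are not Pulsar APIs. */
final class PositionSequenceSketch {

    /** A ledgerId:entryId pair of the original topic, ordered by ledger, then entry. */
    static final class Position implements Comparable<Position> {
        final long ledgerId;
        final long entryId;

        Position(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Position other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }

        @Override
        public String toString() {
            return ledgerId + ":" + entryId;
        }
    }

    /**
     * Returns true when a rejected (duplicate) publish can be ignored, that is,
     * when the next pending position is larger than the rejected one.
     * Treating an empty queue as ignorable is an assumption of this sketch.
     */
    static boolean canIgnoreRejected(Position rejected, Deque<Position> pending) {
        Position next = pending.peekFirst();
        return next == null || next.compareTo(rejected) > 0;
    }

    public static void main(String[] args) {
        // Pending queue after 3:0 was acknowledged: [3:1, 3:2].
        Deque<Position> pending = new ArrayDeque<>();
        pending.add(new Position(3, 1));
        pending.add(new Position(3, 2));
        // A duplicated publish of 3:0 is rejected; 3:1 is larger than 3:0, so it is ignored.
        System.out.println(canIgnoreRejected(new Position(3, 0), pending)); // prints true
    }
}
```

Because the replicator reads entries in position order, a duplicated publish always carries a position that is smaller than anything still pending, which is why the ignore branch is taken here.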
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x