Batch acknowledge can lead to message being never acknowledged (until consumer restart) #7683
Description
Describe the bug
The issue can happen when an acknowledge request fails due to a temporary network failure. The root cause is that the isDuplicate method depends on the lastCumulativeAck field, which is updated before the acknowledgment request succeeds (and cumulativeAckFlushRequired is updated before that). On failure, lastCumulativeAck therefore stays at a messageId that was never actually acked, and the unacked messages are rejected as duplicates on redelivery.
To Reproduce
Steps to reproduce the behavior:
Let's say we have 10 messages in a queue:
- Do a batch (cumulative) acknowledge for the last message, to acknowledge all 10
- A temporary network failure happens and the acknowledgment flush fails, while lastCumulativeAck is already set to the 10th message
- Consumer reconnects and all 10 messages are redelivered
- All 10 messages will be rejected as duplicates and won't be consumed again until consumer restart. On consumer restart they will be redelivered and reprocessed.
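The steps above can be modeled with a small stand-alone simulation. This is only a sketch of the state handling described in this issue, not the actual Pulsar client code: the class `CumulativeAckBugSketch`, the `flush(boolean networkUp)` signature, the `redeliver` helper, and the use of plain `long` values as message ids are all assumptions made for illustration. Only the field names `lastCumulativeAck` and `cumulativeAckFlushRequired` and the method name `isDuplicate` come from the report.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the tracker state described in this issue. */
class CumulativeAckBugSketch {
    // Highest message id the client believes is cumulatively acked (-1 = none).
    private long lastCumulativeAck = -1;
    private boolean cumulativeAckFlushRequired = false;

    /** State moves forward immediately, before the broker has confirmed anything. */
    void acknowledgeCumulative(long messageId) {
        if (messageId > lastCumulativeAck) {
            cumulativeAckFlushRequired = true;
            lastCumulativeAck = messageId;
        }
    }

    /** Tries to send the pending ack; returns whether it reached the broker. */
    boolean flush(boolean networkUp) {
        if (!cumulativeAckFlushRequired) {
            return true; // nothing pending
        }
        cumulativeAckFlushRequired = false;
        // On failure nothing is rolled back, so lastCumulativeAck stays ahead of the broker.
        return networkUp;
    }

    boolean isDuplicate(long messageId) {
        return messageId <= lastCumulativeAck;
    }

    /** Returns the redelivered ids that actually reach the application. */
    List<Long> redeliver(long first, long last) {
        List<Long> delivered = new ArrayList<>();
        for (long id = first; id <= last; id++) {
            if (!isDuplicate(id)) {
                delivered.add(id);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        CumulativeAckBugSketch tracker = new CumulativeAckBugSketch();
        tracker.acknowledgeCumulative(10);          // ack all 10 messages
        boolean sent = tracker.flush(false);        // network failure during flush
        System.out.println("ack reached broker: " + sent);                       // false
        System.out.println("delivered after reconnect: " + tracker.redeliver(1, 10)); // []
    }
}
```

In this model the broker never received the ack, yet every redelivered message is filtered out by `isDuplicate`, which matches the behavior reported above.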
Expected behavior
The 10 messages are reconsumed normally or, even better, reacknowledged transparently without being consumed by the application again.
Additional context
Idea:
As Pulsar doesn't support acknowledgment guarantees, this prevents "effectively once" delivery, since the consumer is expected to process a second time any message whose acknowledgment failed. Transparent reacknowledgment of such messages on redelivery (for both normal and cumulative acknowledgment) can fix this issue in most cases.
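One possible shape of this idea, for the cumulative case only, is sketched below. It is not a proposed patch: `TransparentReackSketch`, the `AckSender` interface, and the `requestedAck` / `confirmedAck` fields are hypothetical names, message ids are simplified to `long`, and the ack is flushed synchronously rather than in batches. The sketch keeps what the application asked to ack separate from what the broker confirmed, and retries the ack when an already-processed message is redelivered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: hide already-processed messages and re-send their ack. */
class TransparentReackSketch {
    /** Stand-in for the broker connection (an assumption, not a Pulsar API). */
    interface AckSender {
        boolean sendCumulativeAck(long messageId); // true if the broker confirmed
    }

    private final AckSender sender;
    private long requestedAck = -1; // highest id the application asked to ack
    private long confirmedAck = -1; // highest id the broker confirmed

    TransparentReackSketch(AckSender sender) {
        this.sender = sender;
    }

    void acknowledgeCumulative(long messageId) {
        if (messageId > requestedAck) {
            requestedAck = messageId;
        }
        flush();
    }

    /** Advances confirmedAck only after the broker has accepted the ack. */
    private void flush() {
        if (requestedAck > confirmedAck && sender.sendCumulativeAck(requestedAck)) {
            confirmedAck = requestedAck;
        }
    }

    /** Returns true if the message should be handed to the application. */
    boolean onMessage(long messageId) {
        if (messageId > requestedAck) {
            return true; // never acked by the application: deliver normally
        }
        flush();         // already processed: retry the ack if still unconfirmed
        return false;    // and hide the message from the application
    }

    public static void main(String[] args) {
        List<Long> brokerAcks = new ArrayList<>();
        boolean[] networkUp = {false};
        TransparentReackSketch consumer =
                new TransparentReackSketch(mid -> networkUp[0] && brokerAcks.add(mid));

        consumer.acknowledgeCumulative(10); // fails: network is down
        networkUp[0] = true;                // consumer reconnects
        int delivered = 0;
        for (long id = 1; id <= 10; id++) {
            if (consumer.onMessage(id)) {
                delivered++;
            }
        }
        System.out.println("delivered again: " + delivered);  // 0
        System.out.println("acks at broker: " + brokerAcks);  // [10]
    }
}
```

Individual (non-cumulative) acknowledgments would need a set of pending ids instead of a single high-water mark; that part is left out of the sketch.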